Full Code of haomo-ai/Cam4DOcc for AI

main 542f14a9d9e1 cached

110 files

543.7 KB

143.1k tokens

438 symbols

1 requests

Download .txt

Showing preview only (580K chars total). Download the full file or copy to clipboard to get everything.

Repository: haomo-ai/Cam4DOcc
Branch: main
Commit: 542f14a9d9e1
Files: 110
Total size: 543.7 KB

Directory structure:
gitextract_hgf40bk9/

├── LICENSE
├── README.md
├── data/
│   ├── README.md
│   ├── cam4docc/
│   │   ├── .gitkeep
│   │   ├── GMO/
│   │   │   └── .gitkeep
│   │   ├── GMO_lyft/
│   │   │   └── .gitkeep
│   │   ├── MMO/
│   │   │   └── .gitkeep
│   │   └── MMO_lyft/
│   │       └── .gitkeep
│   └── nuscenes/
│       └── .gitkeep
├── other_baselines/
│   ├── README.md
│   ├── lifted_2d/
│   │   └── eval_lifted_2d.py
│   ├── static_world/
│   │   └── eval_static_world.py
│   └── voxel_pcp/
│       └── eval_voxel_pcp.py
├── projects/
│   ├── __init__.py
│   ├── configs/
│   │   ├── _base_/
│   │   │   ├── datasets/
│   │   │   │   ├── custom_lyft-3d.py
│   │   │   │   ├── custom_nus-3d.py
│   │   │   │   └── custom_waymo-3d.py
│   │   │   ├── default_runtime.py
│   │   │   └── schedules/
│   │   │       ├── cosine.py
│   │   │       ├── cyclic_20e.py
│   │   │       ├── cyclic_40e.py
│   │   │       ├── mmdet_schedule_1x.py
│   │   │       ├── schedule_2x.py
│   │   │       ├── schedule_3x.py
│   │   │       ├── seg_cosine_150e.py
│   │   │       ├── seg_cosine_200e.py
│   │   │       └── seg_cosine_50e.py
│   │   ├── baselines/
│   │   │   ├── OCFNet_in_Cam4DOcc_V1.1.py
│   │   │   ├── OCFNet_in_Cam4DOcc_V1.1_lyft.py
│   │   │   ├── OCFNet_in_Cam4DOcc_V1.2.py
│   │   │   └── OCFNet_in_Cam4DOcc_V1.2_lyft.py
│   │   └── datasets/
│   │       └── custom_nus-3d.py
│   └── occ_plugin/
│       ├── __init__.py
│       ├── core/
│       │   ├── __init__.py
│       │   ├── evaluation/
│       │   │   ├── __init__.py
│       │   │   ├── efficiency_hooks.py
│       │   │   └── eval_hooks.py
│       │   └── visualizer/
│       │       ├── __init__.py
│       │       └── show_occ.py
│       ├── datasets/
│       │   ├── __init__.py
│       │   ├── builder.py
│       │   ├── cam4docc_dataset.py
│       │   ├── cam4docc_lyft_dataset.py
│       │   ├── nuscenes_dataset.py
│       │   ├── pipelines/
│       │   │   ├── __init__.py
│       │   │   ├── formating.py
│       │   │   ├── loading_bevdet.py
│       │   │   ├── loading_instance.py
│       │   │   ├── loading_occupancy.py
│       │   │   └── transform_3d.py
│       │   └── samplers/
│       │       ├── __init__.py
│       │       ├── distributed_sampler.py
│       │       ├── group_sampler.py
│       │       └── sampler.py
│       ├── occupancy/
│       │   ├── __init__.py
│       │   ├── apis/
│       │   │   ├── __init__.py
│       │   │   ├── mmdet_train.py
│       │   │   ├── test.py
│       │   │   └── train.py
│       │   ├── backbones/
│       │   │   ├── __init__.py
│       │   │   ├── pred_block.py
│       │   │   └── resnet3d.py
│       │   ├── dense_heads/
│       │   │   ├── __init__.py
│       │   │   ├── flow_head.py
│       │   │   ├── lovasz_softmax.py
│       │   │   ├── occ_head.py
│       │   │   └── utils.py
│       │   ├── detectors/
│       │   │   ├── __init__.py
│       │   │   ├── bevdepth.py
│       │   │   └── ocfnet.py
│       │   ├── fuser/
│       │   │   ├── __init__.py
│       │   │   ├── addfuse.py
│       │   │   ├── convfuse.py
│       │   │   └── visfuse.py
│       │   ├── image2bev/
│       │   │   ├── ViewTransformerLSSBEVDepth.py
│       │   │   ├── ViewTransformerLSSVoxel.py
│       │   │   └── __init__.py
│       │   ├── necks/
│       │   │   ├── __init__.py
│       │   │   ├── fpn3d.py
│       │   │   └── second_fpn_3d.py
│       │   └── voxel_encoder/
│       │       ├── __init__.py
│       │       └── sparse_lidar_enc.py
│       ├── ops/
│       │   ├── __init__.py
│       │   └── occ_pooling/
│       │       ├── OCC_Pool.py
│       │       ├── __init__.py
│       │       └── src/
│       │           ├── occ_pool.cpp
│       │           └── occ_pool_cuda.cu
│       └── utils/
│           ├── __init__.py
│           ├── coordinate_transform.py
│           ├── formating.py
│           ├── gaussian.py
│           ├── geometry.py
│           ├── metric_util.py
│           ├── nusc_param.py
│           ├── semkitti.py
│           └── voxel_to_points.py
├── run.sh
├── run_eval.sh
├── setup.py
├── tools/
│   ├── dist_test.sh
│   ├── dist_train.sh
│   ├── gen_data/
│   │   └── gen_depth_gt.py
│   ├── misc/
│   │   ├── browse_dataset.py
│   │   ├── fuse_conv_bn.py
│   │   ├── print_config.py
│   │   └── visualize_results.py
│   ├── test.py
│   └── train.py
└── viz/
    ├── viz_gt.py
    └── viz_pred.py

================================================
FILE CONTENTS
================================================

================================================
FILE: LICENSE
================================================
MIT License

Copyright (c) 2023 HAOMO.AI

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.


================================================
FILE: README.md
================================================
# Cam4DOcc

The official code an data for the benchmark with baselines for our paper: [Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications](https://arxiv.org/abs/2311.17663)

This work has been accepted by CVPR 2024 :tada:

[Junyi Ma#](https://github.com/BIT-MJY), [Xieyuanli Chen#](https://github.com/Chen-Xieyuanli), Jiawei Huang, [Jingyi Xu](https://github.com/BIT-XJY), [Zhen Luo](https://github.com/Blurryface0814), Jintao Xu, Weihao Gu, Rui Ai, [Hesheng Wang*](https://scholar.google.com/citations?hl=en&user=q6AY9XsAAAAJ)

<img src="https://github.com/haomo-ai/Cam4DOcc/blob/main/benchmark.png" width="49%"/> <img src="https://github.com/haomo-ai/Cam4DOcc/blob/main/OCFNet.png" width="49%"/>



## Citation
If you use Cam4DOcc in an academic work, please cite our paper:

	@inproceedings{ma2024cvpr,
		author = {Junyi Ma and Xieyuanli Chen and Jiawei Huang and Jingyi Xu and Zhen Luo and Jintao Xu and Weihao Gu and Rui Ai and Hesheng Wang},
		title = {{Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications}},
		booktitle = {Proc.~of the IEEE/CVF Conf.~on Computer Vision and Pattern Recognition (CVPR)},
		year = 2024
	}
 
## Installation

<details>
	
<summary>We follow the installation instructions of our codebase OpenOccupancy, which are also posted here
</summary>

* Create a conda virtual environment and activate it
```bash
conda create -n cam4docc python=3.7 -y
conda activate cam4docc
```
* Install PyTorch and torchvision (tested on torch==1.10.1 & cuda=11.3)
```bash
conda install pytorch==1.10.1 torchvision==0.11.2 torchaudio==0.10.1 cudatoolkit=11.3 -c pytorch -c conda-forge
```
* Install gcc>=5 in conda env
```bash
conda install -c omgarcia gcc-6
```
* Install mmcv, mmdet, and mmseg
```bash
pip install mmcv-full==1.4.0
pip install mmdet==2.14.0
pip install mmsegmentation==0.14.1
```
* Install mmdet3d from the source code
```bash
git clone https://github.com/open-mmlab/mmdetection3d.git
cd mmdetection3d
git checkout v0.17.1 # Other versions may not be compatible.
python setup.py install
```
* Install other dependencies
```bash
pip install timm
pip install open3d-python
pip install PyMCubes
pip install spconv-cu113
pip install fvcore
pip install setuptools==59.5.0

pip install lyft_dataset_sdk # for lyft dataset
```
* Install occupancy pooling
```
git clone git@github.com:haomo-ai/Cam4DOcc.git
cd Cam4DOcc
export PYTHONPATH=“.”
python setup.py develop
```

</details>

## Data Structure

### nuScenes dataset
* Please link your [nuScenes V1.0 full dataset](https://www.nuscenes.org/nuscenes#download) to the data folder. 
* [nuScenes-Occupancy](https://drive.google.com/file/d/1vTbgddMzUN6nLyWSsCZMb9KwihS7nPoH/view?usp=sharing), [nuscenes_occ_infos_train.pkl](https://github.com/JeffWang987/OpenOccupancy/releases/tag/train_pkl), and [nuscenes_occ_infos_val.pkl](https://github.com/JeffWang987/OpenOccupancy/releases/tag/val_pkl) are also provided by the previous work. If you only want to reproduce the forecasting results with "inflated" form, nuScenes dataset and Cam4DOcc are all you need.

### Lyft dataset
* Please link your [Lyft dataset](https://www.kaggle.com/c/3d-object-detection-for-autonomous-vehicles/data) to the data folder.
* The required folders are listed below.

Note that the folders under `cam4docc` will be generated automatically once you first run our training or evaluation scripts.

```bash
Cam4DOcc
├── data/
│   ├── nuscenes/
│   │   ├── maps/
│   │   ├── samples/
│   │   ├── sweeps/
│   │   ├── lidarseg/
│   │   ├── v1.0-test/
│   │   ├── v1.0-trainval/
│   │   ├── nuscenes_occ_infos_train.pkl
│   │   ├── nuscenes_occ_infos_val.pkl
│   ├── nuScenes-Occupancy/
│   ├── lyft/
│   │   ├── maps/
│   │   ├── train_data/
│   │   ├── images/   # from train images, containing xxx.jpeg
│   ├── cam4docc
│   │   ├── GMO/
│   │   │   ├── segmentation/
│   │   │   ├── instance/
│   │   │   ├── flow/
│   │   ├── MMO/
│   │   │   ├── segmentation/
│   │   │   ├── instance/
│   │   │   ├── flow/
│   │   ├── GMO_lyft/
│   │   │   ├── ...
│   │   ├── MMO_lyft/
│   │   │   ├── ...
```
Alternatively, you could manually modify the path parameters in the [config files](https://github.com/haomo-ai/Cam4DOcc/tree/main/projects/configs/baselines) instead of using the default data structure, which are also listed here:
```
occ_path = "./data/nuScenes-Occupancy"
depth_gt_path = './data/depth_gt'
train_ann_file = "./data/nuscenes/nuscenes_occ_infos_train.pkl"
val_ann_file = "./data/nuscenes/nuscenes_occ_infos_val.pkl"
cam4docc_dataset_path = "./data/cam4docc/"
nusc_root = './data/nuscenes/'
```

## Training and Evaluation

We directly integrate the Cam4DOcc dataset generation pipeline into the dataloader, so you can directly run training or evaluate scripts and just wait :smirk:

Optionally, you can set `only_generate_dataset=True` in the [config files](https://github.com/haomo-ai/Cam4DOcc/tree/main/projects/configs/baselines) to only generate the Cam4DOcc data without model training and inference.

### Train OCFNetV1.1 with 8 GPUs

OCFNetV1.1 can forecast inflated GMO and others. In this case, _vehicle_ and _human_ are considered as one unified category.

For the nuScenes dataset, please run

```bash
bash run.sh ./projects/configs/baselines/OCFNet_in_Cam4DOcc_V1.1.py 8
```

For the Lyft dataset, please run

```bash
bash run.sh ./projects/configs/baselines/OCFNet_in_Cam4DOcc_V1.1_lyft.py 8
```
### Train OCFNetV1.2 with 8 GPUs

OCFNetV1.2 can forecast inflated GMO including _bicycle_, _bus_, _car_, _construction_, _motorcycle_, _trailer_, _truck_, _pedestrian_, and others. In this case, _vehicle_ and _human_ are divided into multiple categories for clearer evaluation on forecasting performance.

For the nuScenes dataset, please run

```bash
bash run.sh ./projects/configs/baselines/OCFNet_in_Cam4DOcc_V1.2.py 8
```

For the Lyft dataset, please run

```bash
bash run.sh ./projects/configs/baselines/OCFNet_in_Cam4DOcc_V1.2_lyft.py 8
```

* The training/test process will be accelerated several times after you generate datasets by the first epoch.

### Test OCFNet for different tasks

If you only want to test the performance of occupancy prediction for the present frame (current observation), please set `test_present=True` in the [config files](https://github.com/haomo-ai/Cam4DOcc/tree/main/projects/configs/baselines). Otherwise, forecasting performance on the future interval is evaluated.

```bash
bash run_eval.sh $PATH_TO_CFG $PATH_TO_CKPT $GPU_NUM
# e.g. bash run_eval.sh ./projects/configs/baselines/OCFNet_in_Cam4DOcc_V1.1.py ./work_dirs/OCFNet_in_Cam4DOcc_V1.1/epoch_20.pth  8
```
Please set `save_pred` and `save_path` in the config files once saving prediction results is needed.

`VPQ` evaluation of 3D instance prediction will be refined in the future.

### Visualization

Please install the dependencies as follows:

```bash
sudo apt-get install Xvfb
pip install xvfbwrapper
pip install mayavi
```
where `Xvfb` may be needed for visualization in your server.

**Visualize ground-truth occupancy labels**. Set `show_time_change = True` if you want to show the changing state of occupancy in time intervals. 

```bash
cd viz
python viz_gt.py
```
<img src="https://github.com/haomo-ai/Cam4DOcc/blob/main/viz_occupancy.png" width="100%"/>

**Visualize occupancy forecasting results**. Set `show_time_change = True` if you want to show the changing state of occupancy in time intervals. 

```bash
cd viz
python viz_pred.py
```
<img src="https://github.com/haomo-ai/Cam4DOcc/blob/main/viz_pred.png" width="100%"/>

There is still room for improvement. Camera-only 4D occupancy forecasting remains challenging, especially for predicting over longer time intervals with many moving objects. We envision this benchmark as a valuable evaluation tool, and our OCFNet can serve as a foundational codebase for future research on 4D occupancy forecasting.

## Basic Information

Some basic information as well as key parameters for our current version.

| Type |  Info | Parameter |
| :----: | :----: | :----: |
| train           | 23,930 sequences | train_capacity |
| val             | 5,119 frames | test_capacity |
| voxel size      | 0.2m | voxel_x/y/z |
| range           | [-51.2m, -51.2m, -5m, 51.2m, 51.2m, 3m]| point_cloud_range |
| volume size     | [512, 512, 40]| occ_size |
| classes         | 2 for V1.1 / 9 for V1.2 | num_cls |
| observation frames | 3 | time_receptive_field |
| future frames | 4 | n_future_frames |
| extension frames | 6 | n_future_frames_plus |

Our proposed OCFNet can still perform well while being trained with partial data. Please try to decrease `train_capacity` if you want to explore more details with sparser supervision signals. 

In addition, please make sure that `n_future_frames_plus <= time_receptive_field + n_future_frames` because `n_future_frames_plus` means the real prediction number. We estimate more frames including the past ones rather than only `n_future_frames`.

## Pretrained Models

We will provide our pretrained models of the erratum version. Your patience is appreciated.

**Deprecated:**

~~Please download our pretrained models (for epoch=20) to resume training or reproduce results.~~

| Version | Google Drive <img src="https://ssl.gstatic.com/docs/doclist/images/drive_2022q3_32dp.png" alt="Google Drive" width="18"/> | Baidu Cloud <img src="https://nd-static.bdstatic.com/m-static/v20-main/favicon-main.ico" alt="Baidu Yun" width="18"/> | Config |
| :---: | :---: | :---: | :---: |
| ~~V1.0~~ | ~~link~~ | ~~link~~ | ~~only vehicle~~ |
| V1.1 | [link](https://drive.google.com/file/d/1IXRqOQk3RKpIjGgBBqV9D9vgSt58QDr8/view?usp=sharing) | [link](https://pan.baidu.com/s/18gODsVnBAXEJ4pzv2-LqGA?pwd=m99b) | [OCFNet_in_Cam4DOcc_V1.1.py](https://github.com/haomo-ai/Cam4DOcc/blob/main/projects/configs/baselines/OCFNet_in_Cam4DOcc_V1.1.py) |
| V1.2 | [link](https://drive.google.com/file/d/1q1XnRt0wYE3oq6YBMBnagpGL7h2I46uN/view?usp=sharing) | [link](https://pan.baidu.com/s/1OPc1-a2McOO_0QPX63J7WQ?pwd=adic) | [OCFNet_in_Cam4DOcc_V1.2.py](https://github.com/haomo-ai/Cam4DOcc/blob/main/projects/configs/baselines/OCFNet_in_Cam4DOcc_V1.2.py) |


## Other Baselines

We also provide the evaluation on the forecasting performance of [other baselines](https://github.com/haomo-ai/Cam4DOcc/tree/main/other_baselines) in Cam4DOcc.

## TODO
The tutorial is being updated ...

We will release our pretrained models as soon as possible. OCFNetV1.3 and OCFNetV2 are on their way ...


### Acknowledgement

We thank the fantastic works [OpenOccupancy](https://github.com/JeffWang987/OpenOccupancy), [PowerBEV](https://github.com/EdwardLeeLPZ/PowerBEV), and [FIERY](https://anthonyhu.github.io/fiery) for their pioneer code release, which provide codebase for this benchmark.


================================================
FILE: data/README.md
================================================
### Data Structure

Please link your [nuScenes V1.0 full dataset ](https://www.nuscenes.org/nuscenes#download) to the data folder. 

[nuScenes-Occupancy](https://drive.google.com/file/d/1vTbgddMzUN6nLyWSsCZMb9KwihS7nPoH/view?usp=sharing), [nuscenes_occ_infos_train.pkl](https://github.com/JeffWang987/OpenOccupancy/releases/tag/train_pkl), and [nuscenes_occ_infos_val.pkl](https://github.com/JeffWang987/OpenOccupancy/releases/tag/val_pkl) are also provided by the previous work. If you only want to reproduce the forecasting results with "inflated" form, nuScenes dataset and Cam4DOcc are all you need.

Note that the folders under `cam4docc` will be generated automatically once you first run our training or evaluation scripts.

```bash
Cam4DOcc
├── data/
│   ├── nuscenes/
│   │   ├── maps/
│   │   ├── samples/
│   │   ├── sweeps/
│   │   ├── lidarseg/
│   │   ├── v1.0-test/
│   │   ├── v1.0-trainval/
│   │   ├── nuscenes_occ_infos_train.pkl/
│   │   ├── nuscenes_occ_infos_val.pkl/
│   ├── nuScenes-Occupancy/
│   ├── cam4docc
│   │   ├── GMO/
│   │   │   ├── segmentation/
│   │   │   ├── instance/
│   │   │   ├── flow/
│   │   ├── MMO/
│   │   │   ├── segmentation/
│   │   │   ├── instance/
│   │   │   ├── flow/
```
The GMO folder will contain the data where vehicle and human are considered as one unified category.

The MMO folder will contain the data where vehicle and human are divided into multiple categories for clearer evaluation on forecasting performance.

In near future, we will unify GMO and MMO for easier usage.


================================================
FILE: data/cam4docc/.gitkeep
================================================


================================================
FILE: data/cam4docc/GMO/.gitkeep
================================================


================================================
FILE: data/cam4docc/GMO_lyft/.gitkeep
================================================


================================================
FILE: data/cam4docc/MMO/.gitkeep
================================================


================================================
FILE: data/cam4docc/MMO_lyft/.gitkeep
================================================


================================================
FILE: data/nuscenes/.gitkeep
================================================


================================================
FILE: other_baselines/README.md
================================================
## I. Static World

The static world model is built based on the identity hypothesis.

```bash
cd other_baselines/static_world
python ./eval_static_world.py
```
#### Parameters:
* **test_idx_dir**: Path of test indexes, which is generated by the standard OCFNet evaluation process. 
* **test_results_dir**: Path of occupancy prediction results. Here we simply set it to the path of OCFNet forecasting results and use the present occupancy prediction results for evaluation. You can also replace them with [OpenOccupancy](https://github.com/JeffWang987/OpenOccupancy) estimation results.
* **gt_dir**: Path of ground-truth segmentations.

##  II. Voxelization of PCP

Voxelization of point cloud prediction requires the outputs of [PCPNet](https://github.com/Blurryface0814/PCPNet). Here we use nuScenes-Occupancy as ground-truth since predicted points are limited by sparsity.
```bash
cd other_baselines/voxel_pcp
python ./eval_voxel_pcp.py
```
#### Parameters:
* **test_idx_dir**: Path of test indexes, which is generated by the standard OCFNet evaluation process.
* **occ_path**: Path of nuScenes-Occupancy.
* **test_results_dir**: Path of point cloud prediction results. The data is organized as follows:

```bash
Cam4DOcc
├── data/
│   ├── cam4docc/
│   │   ├── pcpnet_results/
│   │   │   ├── point_clouds/
│   │   │   │   ├── past/
│   │   │   │   │   ├── 000000.ply
│   │   │   │   │   ├── 000001.ply
│   │   │   │   │   ├── 000002.ply
│   │   │   │   │   ├── 000003.ply
│   │   │   │   ├── pred/
│   │   │   │   │   ├── 000000.ply
│   │   │   │   │   ├── ...
│   │   │   ├── saved_labels/
│   │   │   │   ├── past/
│   │   │   │   │   ├── 000000.label
│   │   │   │   │   ├── 000001.label
│   │   │   │   │   ├── 000002.label
│   │   │   │   │   ├── 000003.label
│   │   │   │   ├── pred/
│   │   │   │   │   ├── 000000.ply
│   │   │   │   │   ├── ...
```
We will provide our PCPNet predictions soon and please open an issue [here](https://github.com/Blurryface0814/PCPNet) if you have questions about how PCPNet is implemented for points forecasting.

## III. 2D-3D Lifted Prediction

2D-3D lifted prediction requires the outputs of [PowerBEV](https://github.com/EdwardLeeLPZ/PowerBEV). 

```bash
cd other_baselines/lifted_2d
python ./eval_lifted_2d.py
```
#### Parameters:
* **test_idx_dir**: Path of test indexes, which is generated by the standard OCFNet evaluation process.
* **gt_dir**: Path of ground-truth segmentations.
* **hmin**: minimum height for lifting operation.
* **hmax**: maximum height for lifting operation.
* **test_results_dir**: Path of point cloud prediction results. The data is organized as follows:
```bash
Cam4DOcc
├── data/
│   ├── cam4docc/
│   │   ├── powerbev_results/
│   │   │   ├── {scene_token}_{lidar_token}.npz
│   │   │   ├── ...
```
We have provided our [PowerBEV predictions](https://drive.google.com/file/d/1X_N-GwU2ZB65UI9-EYpeQrb2BzS44VVX/view?usp=sharing) and please open an issue [here](https://github.com/EdwardLeeLPZ/PowerBEV) if you have questions about how PowerBEV is implemented for BEV-based instance prediction.

More refinement strategies for the baselines will be released ... Before that, please simply use the scripts here for fast evaluation.

## Publications

If you use our proposed baselines in your work, please cite as:

* Cam4DOcc
```
@inproceedings{ma2024cvpr,
	author = {Junyi Ma and Xieyuanli Chen and Jiawei Huang and Jingyi Xu and Zhen Luo and Jintao Xu and Weihao Gu and Rui Ai and Hesheng Wang},
	title = {{Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications}},
	booktitle = {Proc.~of the IEEE/CVF Conf.~on Computer Vision and Pattern Recognition (CVPR)},
	year = 2024
}
```

* OpenOccupancy
```
@article{wang2023openoccupancy,
  title={Openoccupancy: A large scale benchmark for surrounding semantic occupancy perception},
  author={Wang, Xiaofeng and Zhu, Zheng and Xu, Wenbo and Zhang, Yunpeng and Wei, Yi and Chi, Xu and Ye, Yun and Du, Dalong and Lu, Jiwen and Wang, Xingang},
  journal={arXiv preprint arXiv:2303.03991},
  year={2023}
}
```

* PCPNet
```
@ARTICLE{10141631,
  author={Luo, Zhen and Ma, Junyi and Zhou, Zijie and Xiong, Guangming},
  journal={IEEE Robotics and Automation Letters}, 
  title={PCPNet: An Efficient and Semantic-Enhanced Transformer Network for Point Cloud Prediction}, 
  year={2023},
  volume={8},
  number={7},
  pages={4267-4274},
  doi={10.1109/LRA.2023.3281937}}
```
* PowerBEV
```
@inproceedings{ijcai2023p120,
  title     = {PowerBEV: A Powerful Yet Lightweight Framework for Instance Prediction in Bird’s-Eye View},
  author    = {Li, Peizheng and Ding, Shuxiao and Chen, Xieyuanli and Hanselmann, Niklas and Cordts, Marius and Gall, Juergen},
  booktitle = {Proceedings of the Thirty-Second International Joint Conference on
               Artificial Intelligence, {IJCAI-23}},
  pages     = {1080--1088},
  year      = {2023},
  month     = {8},
  doi       = {10.24963/ijcai.2023/120},
}
```









================================================
FILE: other_baselines/lifted_2d/eval_lifted_2d.py
================================================
# Developed by Junyi Ma
# Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications
# https://github.com/haomo-ai/Cam4DOcc

from tqdm import trange
import numpy as np
from nuscenes import NuScenes
import os
import torch
import torch.nn.functional as F
import copy
from pyquaternion import Quaternion

# Setups =================================================================================================
test_idx_dir = "../../data/cam4docc/test_ids/"
test_results_dir = "../../data/cam4docc/powerbev_results/"
gt_dir = "../../data/cam4docc/MMO/segmentation/"

test_seqs = os.listdir(test_idx_dir)
test_segmentations = os.listdir(test_results_dir)
dimension = [512, 512, 40]
future_ious = [0, 0, 0, 0]

voxel_size = np.array([0.2,0.2,0.2])
pc_range = np.array([-50, -50, 0, 50, 50, 0])
voxel_size_new = np.array([0.2,0.2,0.2])
pc_range_new = np.array([-51.2, -51.2, -5, 51.2, 51.2, 3])

# 10*0.2=2m 
# You can modify the parameters to show the changes with variable heights for lifting
hmin = -1
hmax = 9

nusc = NuScenes(version='v1.0-trainval', dataroot="../../data/nuscenes", verbose=False)
# ========================================================================================================

def cm_to_ious(cm):
    mean_ious = []
    cls_num = len(cm)
    for i in range(cls_num):
        tp = cm[i, i]
        p = cm[:, i].sum()
        g = cm[i, :].sum()
        union = p + g - tp
        mean_ious.append(tp / union)
    return mean_ious

def fast_hist(pred, label, max_label=18): 
    pred = copy.deepcopy(pred.flatten())
    label = copy.deepcopy(label.flatten())
    bin_count = np.bincount(max_label * label.astype(int) + pred, minlength=max_label ** 2)
    iou_per_pred = (bin_count[-1]/(bin_count[-1]+bin_count[1]+bin_count[2]))
    return bin_count[:max_label ** 2].reshape(max_label, max_label),iou_per_pred

for i in trange(len(test_seqs)):
    segmentation_file = test_results_dir + test_seqs[i]
    instance_seq = np.load(segmentation_file)['arr_0']
    instance_seq = torch.from_numpy(instance_seq)

    test_seqs_idxs = np.load(test_idx_dir+test_seqs[i])["arr_0"]
    gt_segmentation_file = os.path.join(gt_dir, test_seqs[i])
    gt_segmentation_seqs = np.load(gt_segmentation_file, allow_pickle=True)['arr_0']

    for t in range(3, 7):
        scene_token_cur = test_seqs_idxs[t].split("_")[0]
        lidar_token_cur = test_seqs_idxs[t].split("_")[1]

        instance_ = instance_seq[0,(t-1)].unsqueeze(0)  # t-1 -> t
        instance_ = instance_.unsqueeze(0)
        instance_ = F.interpolate(instance_.float(), size=[500, 500], mode='nearest').contiguous() # Note: default PowerBEV has different ranges with OCFNet
        instance_ = instance_.squeeze(0)

        x_grid = torch.linspace(0, 500-1, 500, dtype=torch.float)
        x_grid = x_grid.view(500, 1).expand(500,500)
        y_grid = torch.linspace(0, 500-1,500, dtype=torch.float)
        y_grid = y_grid.view(1, 500).expand(500,500)
        mesh_grid_2d = torch.stack((x_grid, y_grid), -1)
        mesh_grid_2d = mesh_grid_2d.view(-1, 2)
        instance_ = instance_.view(-1, 1)

        semantics_lifted = []
        for ii in range(hmin, hmax): 
            semantics_lifted_ = torch.cat((mesh_grid_2d, ii*torch.ones_like(mesh_grid_2d[:,0:1])),dim=-1)
            semantics_lifted_ = torch.cat((semantics_lifted_, instance_),dim=-1)
            semantics_lifted.append(semantics_lifted_)

        semantics_lifted = np.array(torch.cat(semantics_lifted, dim=0))

        kept = semantics_lifted[:,-1]!=0
        semantics_lifted = semantics_lifted[kept]
        if semantics_lifted.shape[0] == 0:
            semantics_lifted = np.zeros((1,4))

        lidar_sample = nusc.get('sample_data', lidar_token_cur)
        lidar_sample_calib = nusc.get('calibrated_sensor', lidar_sample['calibrated_sensor_token'])
        lidar_sensor_rotation = Quaternion(lidar_sample_calib['rotation'])
        lidar_sensor_translation = np.array(lidar_sample_calib['translation'])[:, None]
        lidar_to_lidarego = np.vstack([
            np.hstack((lidar_sensor_rotation.rotation_matrix, lidar_sensor_translation)),
            np.array([0, 0, 0, 1])
        ])
        lidarego_to_lidar = np.linalg.inv(lidar_to_lidarego)
        points = np.ones_like(semantics_lifted)
        points[:,:3] = semantics_lifted[:,:3]
        points[:,:3] = points[:,:3] * voxel_size[None, :] + pc_range[:3][None, :] 
        points = lidarego_to_lidar @ points.T
        semantics_lifted_transformed = np.ones_like(semantics_lifted)
        semantics_lifted_transformed[:,:3] = (points.T)[:,:3]
        semantics_lifted_transformed[:,-1] = semantics_lifted[:,-1]
        semantics_lifted_transformed[:,:3] = (semantics_lifted_transformed[:,:3] - pc_range_new[:3][None, :]) / voxel_size_new[None, :] 

        pred_segmentation = np.zeros((dimension[0], dimension[1], dimension[2]))
        for j in range(semantics_lifted_transformed.shape[0]):
            cur_ind = semantics_lifted_transformed[j, :3].astype(int)
            cur_label = semantics_lifted_transformed[j, -1]
            if cur_label != 0:
                pred_segmentation[cur_ind[0],cur_ind[1],cur_ind[2]] = 1

        gt_segmentation = np.zeros((dimension[0], dimension[1], dimension[2]))
        gt_segmentation_raw = gt_segmentation_seqs[t].cpu().numpy()
        gt_segmentation[gt_segmentation_raw[:,0].astype(int),gt_segmentation_raw[:,1].astype(int),gt_segmentation_raw[:,2].astype(int)] = gt_segmentation_raw[:, -1]

        hist_cur, iou_per_pred = fast_hist(pred_segmentation.astype(int), gt_segmentation.astype(int), max_label=2)
        
        if t <= 3:
            future_ious[0] = future_ious[0] + hist_cur
        if t <= 4:
            future_ious[1] = future_ious[1] + hist_cur
        if t <= 5:
            future_ious[2] = future_ious[2] + hist_cur
        if t <= 6:
            future_ious[3] = future_ious[3] + hist_cur

for t in range(len(future_ious)):
    print("iou for step "+str(t), cm_to_ious(future_ious[t]))

================================================
FILE: other_baselines/static_world/eval_static_world.py
================================================
# Developed by Junyi Ma
# Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications
# https://github.com/haomo-ai/Cam4DOcc

import numpy as np
import os
import copy
from tqdm import trange

# Setups =================================================================================================
test_idx_dir = "../../data/cam4docc/test_ids/"
test_results_dir = "../../data/cam4docc/results/"
gt_dir = "../../data/cam4docc/MMO/segmentation/"

objects_max_label = 9

test_seqs = os.listdir(test_idx_dir)
test_segmentations = os.listdir(test_results_dir)
dimension = [512, 512, 40]
future_ious = [0, 0, 0, 0]
# ========================================================================================================

def cm_to_ious(cm):
    mean_ious = []
    cls_num = len(cm)
    for i in range(cls_num):
        tp = cm[i, i]
        p = cm[:, i].sum()
        g = cm[i, :].sum()
        union = p + g - tp
        mean_ious.append(tp / union)
    return mean_ious

def fast_hist(pred, label, max_label=18):
    pred = copy.deepcopy(pred.flatten())
    label = copy.deepcopy(label.flatten())
    bin_count = np.bincount(max_label * label.astype(int) + pred, minlength=max_label ** 2)
    iou_per_pred = (bin_count[-1]/(bin_count[-1]+bin_count[1]+bin_count[2]))
    return bin_count[:max_label ** 2].reshape(max_label, max_label),iou_per_pred

for i in trange(len(test_seqs)):
    segmentation_file = test_results_dir + test_seqs[i]
    if not os.path.exists(segmentation_file):
        continue
    segmentation = np.load(segmentation_file,allow_pickle=True)['arr_0']
    test_seqs_idxs = np.load(os.path.join(test_idx_dir, test_seqs[i]))["arr_0"]
    gt_segmentation_file = os.path.join(gt_dir, test_seqs[i])
    gt_segmentation_seqs = np.load(gt_segmentation_file,allow_pickle=True)['arr_0']

    # hard coding for input:3 output:4
    for t in range(3,7):
        # static world using present predictions
        segmentation_t = segmentation[0]
        pred_segmentation = np.zeros((dimension[0], dimension[1], dimension[2]))
        for j in range(segmentation_t.shape[0]):
            cur_ind = segmentation_t[j, :3].astype(int)
            cur_label = segmentation_t[j, -1]
            pred_segmentation[cur_ind[0],cur_ind[1],cur_ind[2]] = cur_label

        gt_segmentation = np.zeros((dimension[0], dimension[1], dimension[2]))
        gt_segmentation_raw = gt_segmentation_seqs[t]
        for k in range(gt_segmentation_raw.shape[0]):
            cur_ind = gt_segmentation_raw[k, :3].astype(int)
            cur_label = gt_segmentation_raw[k, -1]
            gt_segmentation[cur_ind[0],cur_ind[1],cur_ind[2]] = cur_label
        hist_cur, iou_per_pred = fast_hist(pred_segmentation.astype(int), gt_segmentation.astype(int), max_label=objects_max_label)
        if t <= 3:
            future_ious[0] = future_ious[0] + hist_cur
        if t <= 4:
            future_ious[1] = future_ious[1] + hist_cur
        if t <= 5:
            future_ious[2] = future_ious[2] + hist_cur
        if t <= 6:
            future_ious[3] = future_ious[3] + hist_cur

for t in range(len(future_ious)):
    print("iou for step "+str(t), cm_to_ious(future_ious[t]))

================================================
FILE: other_baselines/voxel_pcp/eval_voxel_pcp.py
================================================
# Developed by Junyi Ma
# Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications
# https://github.com/haomo-ai/Cam4DOcc

import numpy as np
import os
import copy
from tqdm import trange
import open3d as o3d
from nuscenes import NuScenes
from pyquaternion import Quaternion

# Setups =================================================================================================
test_idx_dir = "../../data/cam4docc/test_ids/"
test_results_dir = "../../data/cam4docc/pcpnet_results/"
occ_path = "../../data/nuScenes-Occupancy"

test_seqs = os.listdir(test_idx_dir)
test_segmentations = os.listdir(test_results_dir)
pc_range= np.array([-51.2, -51.2, -5.0, 51.2, 51.2, 3.0])
dimension = [512, 512, 40]
grid_size= np.array(dimension)
voxel_size = (pc_range[3:] -pc_range[:3]) / grid_size
future_ious = [0, 0, 0, 0]

nusc = NuScenes(version='v1.0-trainval', dataroot="../../data/nuscenes", verbose=False)
# ========================================================================================================

lidar_token2sample_token = {}
for i in range(len(nusc.sample)):  
    my_sample = nusc.sample[i]
    frame_token = my_sample['token']
    lidar_token = my_sample['data']['LIDAR_TOP']
    lidar_token2sample_token[lidar_token] = frame_token

def voxel2world(voxel):
    """
    voxel: [N, 3]
    """
    return voxel *voxel_size[None, :] + pc_range[:3][None, :]

def world2voxel(world):
    """
    world: [N, 3]
    """
    return (world - pc_range[:3][None, :]) / voxel_size[None, :]

def cm_to_ious(cm):
    mean_ious = []
    cls_num = len(cm)
    for i in range(cls_num):
        tp = cm[i, i]
        p = cm[:, i].sum()
        g = cm[i, :].sum()
        union = p + g - tp
        mean_ious.append(tp / union)
    return mean_ious

def fast_hist(pred, label, max_label=18):
    pred = copy.deepcopy(pred.flatten())
    label = copy.deepcopy(label.flatten())
    bin_count = np.bincount(max_label * label.astype(int) + pred, minlength=max_label ** 2)
    iou_per_pred = (bin_count[-1]/(bin_count[-1]+bin_count[1]+bin_count[2]))
    return bin_count[:max_label ** 2].reshape(max_label, max_label),iou_per_pred

def nb_process_label(processed_label, sorted_label_voxel_pair):
    label_size = 256
    counter = np.zeros((label_size,), dtype=np.uint16)
    counter[sorted_label_voxel_pair[0, 3]] = 1
    cur_sear_ind = sorted_label_voxel_pair[0, :3]
    for i in range(1, sorted_label_voxel_pair.shape[0]):
        cur_ind = sorted_label_voxel_pair[i, :3]
        if not np.all(np.equal(cur_ind, cur_sear_ind)):
            processed_label[cur_sear_ind[0], cur_sear_ind[1], cur_sear_ind[2]] = np.argmax(counter)
            counter = np.zeros((label_size,), dtype=np.uint16)
            cur_sear_ind = cur_ind
        counter[sorted_label_voxel_pair[i, 3]] += 1
    processed_label[cur_sear_ind[0], cur_sear_ind[1], cur_sear_ind[2]] = np.argmax(counter)
    
    return processed_label

def get_ego2lidar_pose(rec):
    lidar_top_data = nusc.get('sample_data', rec['data']['LIDAR_TOP'])
    lidar2ego_translation = nusc.get('calibrated_sensor', lidar_top_data['calibrated_sensor_token'])['translation']
    lidar2ego_rotation =  nusc.get('calibrated_sensor', lidar_top_data['calibrated_sensor_token'])['rotation']
    trans = -np.array(lidar2ego_translation)
    rot = Quaternion(lidar2ego_rotation).inverse
    return trans, rot

def get_lidar_pose(rec):
    current_sample = nusc.get('sample', rec['token'])
    egopose = nusc.get('ego_pose', nusc.get('sample_data', current_sample['data']['LIDAR_TOP'])['ego_pose_token'])
    trans = -np.array(egopose['translation'])
    rot = Quaternion(egopose['rotation']).inverse  
    return trans, rot

for i in trange(len(test_seqs)):
    test_seqs_idxs = np.load(os.path.join(test_idx_dir, test_seqs[i]))['arr_0']
    scene_token_present = test_seqs[i].split("_")[0]
    lidar_token_present = test_seqs[i].split("_")[1][:-4]

    # transform past point clouds to the present frame
    # point cloud prediction baseline is limited by sparsity of laser points, so we aggregate
    # past point clouds to mitigate in this version
    # More reasonable versions will be released
    past_voxels = []
    for t in range(1, 4):
        scene_token_ = test_seqs_idxs[t-1].split("_")[0]
        lidar_token_ = test_seqs_idxs[t-1].split("_")[1]
        point_file = test_results_dir+"point_clouds/"+scene_token_present+"_"+lidar_token_present+"/past/00000"+str(t)+".ply"
        label_file = test_results_dir+"saved_labels/"+scene_token_present+"_"+lidar_token_present+"/past/00000"+str(t)+".label"
        
        pcd_load = o3d.io.read_point_cloud(point_file)
        xyz_load = np.asarray(pcd_load.points)

        sample_token_present = lidar_token2sample_token[lidar_token_present]
        rec_present = nusc.get('sample', sample_token_present)
        translation_present, rotation_present = get_lidar_pose(rec_present)
        ego2lidar_translation_present, ego2lidar_rotation_present = get_ego2lidar_pose(rec_present)

        sample_token_ = lidar_token2sample_token[lidar_token_]
        rec_ = nusc.get('sample', sample_token_)
        translation_, rotation_ = get_lidar_pose(rec_)
        ego2lidar_translation_, ego2lidar_rotation_ = get_ego2lidar_pose(rec_)

        present_global2ego = [translation_present, rotation_present]
        present_ego2lidar = [ego2lidar_translation_present, ego2lidar_rotation_present]
        cur_global2ego = [translation_, rotation_]
        cur_ego2lidar = [ego2lidar_translation_, ego2lidar_rotation_]
        pcd_np_cor = np.dot(cur_ego2lidar[1].inverse.rotation_matrix, xyz_load.T)
        pcd_np_cor = pcd_np_cor.T
        pcd_np_cor = pcd_np_cor - cur_ego2lidar[0]
        pcd_np_cor = np.dot(cur_global2ego[1].inverse.rotation_matrix, pcd_np_cor.T)
        pcd_np_cor = pcd_np_cor.T
        pcd_np_cor = pcd_np_cor - cur_global2ego[0]
        pcd_np_cor = pcd_np_cor + present_global2ego[0]
        pcd_np_cor = np.dot(present_global2ego[1].rotation_matrix, pcd_np_cor.T)
        pcd_np_cor = pcd_np_cor.T
        pcd_np_cor = pcd_np_cor + present_ego2lidar[0]   # trans
        pcd_np_cor = np.dot(present_ego2lidar[1].rotation_matrix, pcd_np_cor.T)
        xyz_load = pcd_np_cor.T   

        xyz_load = world2voxel(xyz_load)
        label = np.fromfile(label_file, dtype=np.uint32)
        label = label.reshape((-1,1))
        segmentation_t = np.concatenate((xyz_load, label), axis=-1)
        kept = (segmentation_t[:,0]>0) & (segmentation_t[:,0]<dimension[0]) & (segmentation_t[:,1]>0) & (segmentation_t[:,1]<dimension[1])  & (segmentation_t[:,2]>0) & (segmentation_t[:,2]<dimension[2])
        segmentation_t = segmentation_t[kept]
        past_voxels.append(segmentation_t)

    past_voxel_aggregated = np.concatenate(past_voxels, axis=0)

    # for future forecasting
    for t in range(3, 7):
        scene_token_ = test_seqs_idxs[t].split("_")[0]
        lidar_token_ = test_seqs_idxs[t].split("_")[1]

        point_file = test_results_dir+"point_clouds/"+scene_token_present+"_"+lidar_token_present+"/pred/00000"+str(t-3)+".ply"
        label_file = test_results_dir+"saved_labels/"+scene_token_present+"_"+lidar_token_present+"/pred/00000"+str(t-3)+".label"

        pcd_load = o3d.io.read_point_cloud(point_file)
        xyz_load = np.asarray(pcd_load.points)
        xyz_load = world2voxel(xyz_load)

        label = np.fromfile(label_file, dtype=np.uint32)
        label = label.reshape((-1,1))

        segmentation_t = np.concatenate((xyz_load, label), axis=-1)
        kept = (segmentation_t[:,0]>0) & (segmentation_t[:,0]<dimension[0]) & (segmentation_t[:,1]>0) & (segmentation_t[:,1]<dimension[1])  & (segmentation_t[:,2]>0) & (segmentation_t[:,2]<dimension[2])
        segmentation_t = segmentation_t[kept]
        segmentation_t = np.concatenate((segmentation_t, past_voxel_aggregated), axis=0)

        pred_segmentation = np.zeros((dimension[0], dimension[1], dimension[2]))
        pred_segmentation[segmentation_t[:, 0].astype(int), segmentation_t[:, 1].astype(int), segmentation_t[:, 2].astype(int)] = segmentation_t[:, -1]

        # eval according to setups
        # hardcoding for classes of interest
        for otheridx in [0,1,8,11,12,13,14,15,16,17,18,255]:
            pred_segmentation[pred_segmentation==otheridx] = 0
        for vehidx in [2,3,4,5,6,7,9,10]:
            pred_segmentation[pred_segmentation==vehidx] = 1

        # load nuScenes-Occupancy
        rel_path = 'scene_{0}/occupancy/{1}.npy'.format(scene_token_, lidar_token_)
        gt_segmentation_file = os.path.join(occ_path, rel_path)
        pcd = np.load(gt_segmentation_file)

        pcd_label = pcd[..., -1:]
        pcd_label[pcd_label==0] = 255
        pcd_np_cor = voxel2world(pcd[..., [2,1,0]] + 0.5)
        pcd_np_cor = world2voxel(pcd_np_cor)

        # make sure the point is in the grid
        pcd_np_cor = np.clip(pcd_np_cor, np.array([0,0,0]), grid_size - 1)
        transformed_occ = copy.deepcopy(pcd_np_cor)
        pcd_np = np.concatenate([pcd_np_cor, pcd_label], axis=-1)

        # 255: noise, 1-16 normal classes, 0 unoccupied
        pcd_np = pcd_np[np.lexsort((pcd_np_cor[:, 0], pcd_np_cor[:, 1], pcd_np_cor[:, 2])), :]
        pcd_np = pcd_np.astype(np.int64)
        processed_label = np.ones(grid_size, dtype=np.uint8) * 0.0
        processed_label = nb_process_label(processed_label, pcd_np)
        # Opt.
        # processed_label[pcd_np[:, 0].astype(int), pcd_np[:, 1].astype(int), pcd_np[:, 2].astype(int)] = pcd_np[:, -1]

        for otheridx in [0,1,8,11,12,13,14,15,16,17,18,255]:
            processed_label[processed_label==otheridx] = 0
        for vehidx in [2,3,4,5,6,7,9,10]:
            processed_label[processed_label==vehidx] = 1

        hist_cur, iou_per_pred = fast_hist(pred_segmentation.astype(int), processed_label.astype(int), max_label=2)

        if t <= 3:
            future_ious[0] = future_ious[0] + hist_cur
        if t <= 4:
            future_ious[1] = future_ious[1] + hist_cur
        if t <= 5:
            future_ious[2] = future_ious[2] + hist_cur
        if t <= 6:
            future_ious[3] = future_ious[3] + hist_cur
            
for t in range(len(future_ious)):
    print("ious for step "+str(t), cm_to_ious(future_ious[t]))

================================================
FILE: projects/__init__.py
================================================


================================================
FILE: projects/configs/_base_/datasets/custom_lyft-3d.py
================================================
# If point cloud range is changed, the models should also change their point
# cloud range accordingly
point_cloud_range = [-80, -80, -5, 80, 80, 3]
# For Lyft we usually do 9-class detection
class_names = [
    'car', 'truck', 'bus', 'emergency_vehicle', 'other_vehicle', 'motorcycle',
    'bicycle', 'pedestrian', 'animal'
]
dataset_type = 'CustomLyftDataset'
data_root = 'data/lyft/'
# Input modality for Lyft dataset, this is consistent with the submission
# format which requires the information in input_modality.
input_modality = dict(
    use_lidar=True,
    use_camera=False,
    use_radar=False,
    use_map=False,
    use_external=True)
file_client_args = dict(backend='disk')
# Uncomment the following if use ceph or other file clients.
# See https://mmcv.readthedocs.io/en/latest/api.html#mmcv.fileio.FileClient
# for more details.
# file_client_args = dict(
#     backend='petrel',
#     path_mapping=dict({
#         './data/lyft/': 's3://lyft/lyft/',
#         'data/lyft/': 's3://lyft/lyft/'
#    }))
train_pipeline = [
    dict(
        type='LoadPointsFromFile',
        coord_type='LIDAR',
        load_dim=5,
        use_dim=5,
        file_client_args=file_client_args),
    dict(
        type='LoadPointsFromMultiSweeps',
        sweeps_num=10,
        file_client_args=file_client_args),
    dict(type='LoadAnnotations3D', with_bbox_3d=True, with_label_3d=True),
    dict(
        type='GlobalRotScaleTrans',
        rot_range=[-0.3925, 0.3925],
        scale_ratio_range=[0.95, 1.05],
        translation_std=[0, 0, 0]),
    dict(type='RandomFlip3D', flip_ratio_bev_horizontal=0.5),
    dict(type='PointsRangeFilter', point_cloud_range=point_cloud_range),
    dict(type='ObjectRangeFilter', point_cloud_range=point_cloud_range),
    dict(type='PointShuffle'),
    dict(type='DefaultFormatBundle3D', class_names=class_names),
    dict(type='Collect3D', keys=['points', 'gt_bboxes_3d', 'gt_labels_3d'])
]
test_pipeline = [
    dict(
        type='LoadPointsFromFile',
        coord_type='LIDAR',
        load_dim=5,
        use_dim=5,
        file_client_args=file_client_args),
    dict(
        type='LoadPointsFromMultiSweeps',
        sweeps_num=10,
        file_client_args=file_client_args),
    dict(
        type='MultiScaleFlipAug3D',
        img_scale=(1333, 800),
        pts_scale_ratio=1,
        flip=False,
        transforms=[
            dict(
                type='GlobalRotScaleTrans',
                rot_range=[0, 0],
                scale_ratio_range=[1., 1.],
                translation_std=[0, 0, 0]),
            dict(type='RandomFlip3D'),
            dict(
                type='PointsRangeFilter', point_cloud_range=point_cloud_range),
            dict(
                type='DefaultFormatBundle3D',
                class_names=class_names,
                with_label=False),
            dict(type='Collect3D', keys=['points'])
        ])
]
# construct a pipeline for data and gt loading in show function
# please keep its loading function consistent with test_pipeline (e.g. client)
eval_pipeline = [
    dict(
        type='LoadPointsFromFile',
        coord_type='LIDAR',
        load_dim=5,
        use_dim=5,
        file_client_args=file_client_args),
    dict(
        type='LoadPointsFromMultiSweeps',
        sweeps_num=10,
        file_client_args=file_client_args),
    dict(
        type='DefaultFormatBundle3D',
        class_names=class_names,
        with_label=False),
    dict(type='Collect3D', keys=['points'])
]

data = dict(
    samples_per_gpu=2,
    workers_per_gpu=2,
    train=dict(
        type=dataset_type,
        data_root=data_root,
        ann_file=data_root + 'lyft_infos_train.pkl',
        pipeline=train_pipeline,
        classes=class_names,
        modality=input_modality,
        test_mode=False),
    val=dict(
        type=dataset_type,
        data_root=data_root,
        ann_file=data_root + 'lyft_infos_val.pkl',
        pipeline=test_pipeline,
        classes=class_names,
        modality=input_modality,
        test_mode=True),
    test=dict(
        type=dataset_type,
        data_root=data_root,
        ann_file=data_root + 'lyft_infos_val.pkl',
        pipeline=test_pipeline,
        classes=class_names,
        modality=input_modality,
        test_mode=True))
# For Lyft dataset, we usually evaluate the model at the end of training.
# Since the models are trained by 24 epochs by default, we set evaluation
# interval to be 24. Please change the interval accordingly if you do not
# use a default schedule.
evaluation = dict(interval=24, pipeline=eval_pipeline)

================================================
FILE: projects/configs/_base_/datasets/custom_nus-3d.py
================================================
# If point cloud range is changed, the models should also change their point
# cloud range accordingly
point_cloud_range = [-50, -50, -5, 50, 50, 3]
# For nuScenes we usually do 10-class detection
class_names = [
    'car', 'truck', 'trailer', 'bus', 'construction_vehicle', 'bicycle',
    'motorcycle', 'pedestrian', 'traffic_cone', 'barrier'
]
dataset_type = 'NuScenesDataset_eval_modified'
data_root = 'data/nuscenes/'
# Input modality for nuScenes dataset, this is consistent with the submission
# format which requires the information in input_modality.
input_modality = dict(
    use_lidar=True,
    use_camera=False,
    use_radar=False,
    use_map=False,
    use_external=False)
file_client_args = dict(backend='disk')
# Uncomment the following if use ceph or other file clients.
# See https://mmcv.readthedocs.io/en/latest/api.html#mmcv.fileio.FileClient
# for more details.
# file_client_args = dict(
#     backend='petrel',
#     path_mapping=dict({
#         './data/nuscenes/': 's3://nuscenes/nuscenes/',
#         'data/nuscenes/': 's3://nuscenes/nuscenes/'
#     }))
train_pipeline = [
    dict(
        type='LoadPointsFromFile',
        coord_type='LIDAR',
        load_dim=5,
        use_dim=5,
        file_client_args=file_client_args),
    dict(
        type='LoadPointsFromMultiSweeps',
        sweeps_num=10,
        file_client_args=file_client_args),
    dict(type='LoadAnnotations3D', with_bbox_3d=True, with_label_3d=True),
    dict(
        type='GlobalRotScaleTrans',
        rot_range=[-0.3925, 0.3925],
        scale_ratio_range=[0.95, 1.05],
        translation_std=[0, 0, 0]),
    dict(type='RandomFlip3D', flip_ratio_bev_horizontal=0.5),
    dict(type='PointsRangeFilter', point_cloud_range=point_cloud_range),
    dict(type='ObjectRangeFilter', point_cloud_range=point_cloud_range),
    dict(type='ObjectNameFilter', classes=class_names),
    dict(type='PointShuffle'),
    dict(type='DefaultFormatBundle3D', class_names=class_names),
    dict(type='Collect3D', keys=['points', 'gt_bboxes_3d', 'gt_labels_3d'])
]
test_pipeline = [
    dict(
        type='LoadPointsFromFile',
        coord_type='LIDAR',
        load_dim=5,
        use_dim=5,
        file_client_args=file_client_args),
    dict(
        type='LoadPointsFromMultiSweeps',
        sweeps_num=10,
        file_client_args=file_client_args),
    dict(
        type='MultiScaleFlipAug3D',
        img_scale=(1333, 800),
        pts_scale_ratio=1,
        flip=False,
        transforms=[
            dict(
                type='GlobalRotScaleTrans',
                rot_range=[0, 0],
                scale_ratio_range=[1., 1.],
                translation_std=[0, 0, 0]),
            dict(type='RandomFlip3D'),
            dict(
                type='PointsRangeFilter', point_cloud_range=point_cloud_range),
            dict(
                type='DefaultFormatBundle3D',
                class_names=class_names,
                with_label=False),
            dict(type='Collect3D', keys=['points'])
        ])
]
# construct a pipeline for data and gt loading in show function
# please keep its loading function consistent with test_pipeline (e.g. client)
eval_pipeline = [
    dict(
        type='LoadPointsFromFile',
        coord_type='LIDAR',
        load_dim=5,
        use_dim=5,
        file_client_args=file_client_args),
    dict(
        type='LoadPointsFromMultiSweeps',
        sweeps_num=10,
        file_client_args=file_client_args),
    dict(
        type='DefaultFormatBundle3D',
        class_names=class_names,
        with_label=False),
    dict(type='Collect3D', keys=['points'])
]

data = dict(
    samples_per_gpu=4,
    workers_per_gpu=4,
    train=dict(
        type=dataset_type,
        data_root=data_root,
        ann_file=data_root + 'nuscenes_infos_train.pkl',
        pipeline=train_pipeline,
        classes=class_names,
        modality=input_modality,
        test_mode=False,
        # we use box_type_3d='LiDAR' in kitti and nuscenes dataset
        # and box_type_3d='Depth' in sunrgbd and scannet dataset.
        box_type_3d='LiDAR'),
    val=dict(
        type=dataset_type,
        ann_file=data_root + 'nuscenes_infos_val.pkl',
        pipeline=test_pipeline,
        classes=class_names,
        modality=input_modality,
        test_mode=True,
        box_type_3d='LiDAR'),
    test=dict(
        type=dataset_type,
        data_root=data_root,
        ann_file=data_root + 'nuscenes_infos_val.pkl',
        pipeline=test_pipeline,
        classes=class_names,
        modality=input_modality,
        test_mode=True,
        box_type_3d='LiDAR'))
# For nuScenes dataset, we usually evaluate the model at the end of training.
# Since the models are trained by 24 epochs by default, we set evaluation
# interval to be 24. Please change the interval accordingly if you do not
# use a default schedule.
evaluation = dict(interval=24, pipeline=eval_pipeline)


================================================
FILE: projects/configs/_base_/datasets/custom_waymo-3d.py
================================================
# dataset settings
# D5 in the config name means the whole dataset is divided into 5 folds
# We only use one fold for efficient experiments
dataset_type = 'CustomWaymoDataset'
data_root = 'data/waymo/kitti_format/'
file_client_args = dict(backend='disk')
# Uncomment the following if use ceph or other file clients.
# See https://mmcv.readthedocs.io/en/latest/api.html#mmcv.fileio.FileClient
# for more details.
# file_client_args = dict(
#     backend='petrel', path_mapping=dict(data='s3://waymo_data/'))

img_norm_cfg = dict(
    mean=[103.530, 116.280, 123.675], std=[1.0, 1.0, 1.0], to_rgb=False)
class_names = ['Car', 'Pedestrian', 'Cyclist']
point_cloud_range = [-74.88, -74.88, -2, 74.88, 74.88, 4]
input_modality = dict(use_lidar=False, use_camera=True)
db_sampler = dict(
    data_root=data_root,
    info_path=data_root + 'waymo_dbinfos_train.pkl',
    rate=1.0,
    prepare=dict(
        filter_by_difficulty=[-1],
        filter_by_min_points=dict(Car=5, Pedestrian=10, Cyclist=10)),
    classes=class_names,
    sample_groups=dict(Car=15, Pedestrian=10, Cyclist=10),
    points_loader=dict(
        type='LoadPointsFromFile',
        coord_type='LIDAR',
        load_dim=5,
        use_dim=[0, 1, 2, 3, 4],
        file_client_args=file_client_args))



train_pipeline = [
    dict(type='LoadMultiViewImageFromFiles', to_float32=True),
    dict(type='PhotoMetricDistortionMultiViewImage'),
    dict(type='LoadAnnotations3D', with_bbox_3d=True, with_label_3d=True, with_attr_label=False),
    dict(type='ObjectRangeFilter', point_cloud_range=point_cloud_range),
    dict(type='ObjectNameFilter', classes=class_names),
    dict(type='NormalizeMultiviewImage', **img_norm_cfg),
    dict(type='PadMultiViewImage', size_divisor=32),
    dict(type='DefaultFormatBundle3D', class_names=class_names),
    dict(type='CustomCollect3D', keys=['gt_bboxes_3d', 'gt_labels_3d', 'img'])
]


test_pipeline = [
    dict(type='LoadMultiViewImageFromFiles', to_float32=True),
    dict(type='NormalizeMultiviewImage', **img_norm_cfg),
    dict(type='PadMultiViewImage', size_divisor=32),
    dict(
        type='MultiScaleFlipAug3D',
        img_scale=(1920, 1280),
        pts_scale_ratio=1,
        flip=False,
        transforms=[
            dict(
                type='DefaultFormatBundle3D',
                class_names=class_names,
                with_label=False),
            dict(type='CustomCollect3D', keys=['img'])
        ])
]


# construct a pipeline for data and gt loading in show function
# please keep its loading function consistent with test_pipeline (e.g. client)

data = dict(
    samples_per_gpu=2,
    workers_per_gpu=4,
    train=dict(
        type='RepeatDataset',
        times=2,
        dataset=dict(
            type=dataset_type,
            data_root=data_root,
            ann_file=data_root + 'waymo_infos_train.pkl',
            split='training',
            pipeline=train_pipeline,
            modality=input_modality,
            classes=class_names,
            test_mode=False,
            # we use box_type_3d='LiDAR' in kitti and nuscenes dataset
            # and box_type_3d='Depth' in sunrgbd and scannet dataset.
            box_type_3d='LiDAR',
            # load one frame every five frames
            load_interval=5)),
    val=dict(
        type=dataset_type,
        data_root=data_root,
        ann_file=data_root + 'waymo_infos_val.pkl',
        split='training',
        pipeline=test_pipeline,
        modality=input_modality,
        classes=class_names,
        test_mode=True,
        box_type_3d='LiDAR'),
    test=dict(
        type=dataset_type,
        data_root=data_root,
        ann_file=data_root + 'waymo_infos_val.pkl',
        split='training',
        pipeline=test_pipeline,
        modality=input_modality,
        classes=class_names,
        test_mode=True,
        box_type_3d='LiDAR'))

evaluation = dict(interval=24, pipeline=test_pipeline)

================================================
FILE: projects/configs/_base_/default_runtime.py
================================================
checkpoint_config = dict(interval=1)
# yapf:disable push
# By default we use textlogger hook and tensorboard
# For more loggers see
# https://mmcv.readthedocs.io/en/latest/api.html#mmcv.runner.LoggerHook
log_config = dict(
    interval=1,
    hooks=[
        dict(type='TextLoggerHook'),
        dict(type='TensorboardLoggerHook')
    ])
# yapf:enable
dist_params = dict(backend='nccl')
log_level = 'INFO'
work_dir = None
load_from = None
resume_from = None
workflow = [('train', 1)]


================================================
FILE: projects/configs/_base_/schedules/cosine.py
================================================
# This schedule is mainly used by models with dynamic voxelization
# optimizer
lr = 0.003  # max learning rate
optimizer = dict(
    type='AdamW',
    lr=lr,
    betas=(0.95, 0.99),  # the momentum is change during training
    weight_decay=0.001)
optimizer_config = dict(grad_clip=dict(max_norm=10, norm_type=2))

lr_config = dict(
    policy='CosineAnnealing',
    warmup='linear',
    warmup_iters=1000,
    warmup_ratio=1.0 / 10,
    min_lr_ratio=1e-5)

momentum_config = None

runner = dict(type='EpochBasedRunner', max_epochs=40)


================================================
FILE: projects/configs/_base_/schedules/cyclic_20e.py
================================================
# For nuScenes dataset, we usually evaluate the model at the end of training.
# Since the models are trained by 24 epochs by default, we set evaluation
# interval to be 20. Please change the interval accordingly if you do not
# use a default schedule.
# optimizer
# This schedule is mainly used by models on nuScenes dataset
optimizer = dict(type='AdamW', lr=1e-4, weight_decay=0.01)
# max_norm=10 is better for SECOND
optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
lr_config = dict(
    policy='cyclic',
    target_ratio=(10, 1e-4),
    cyclic_times=1,
    step_ratio_up=0.4,
)
momentum_config = dict(
    policy='cyclic',
    target_ratio=(0.85 / 0.95, 1),
    cyclic_times=1,
    step_ratio_up=0.4,
)

# runtime settings
runner = dict(type='EpochBasedRunner', max_epochs=20)


================================================
FILE: projects/configs/_base_/schedules/cyclic_40e.py
================================================
# The schedule is usually used by models trained on KITTI dataset

# The learning rate set in the cyclic schedule is the initial learning rate
# rather than the max learning rate. Since the target_ratio is (10, 1e-4),
# the learning rate will change from 0.0018 to 0.018, than go to 0.0018*1e-4
lr = 0.0018
# The optimizer follows the setting in SECOND.Pytorch, but here we use
# the offcial AdamW optimizer implemented by PyTorch.
optimizer = dict(type='AdamW', lr=lr, betas=(0.95, 0.99), weight_decay=0.01)
optimizer_config = dict(grad_clip=dict(max_norm=10, norm_type=2))
# We use cyclic learning rate and momentum schedule following SECOND.Pytorch
# https://github.com/traveller59/second.pytorch/blob/3aba19c9688274f75ebb5e576f65cfe54773c021/torchplus/train/learning_schedules_fastai.py#L69  # noqa
# We implement them in mmcv, for more details, please refer to
# https://github.com/open-mmlab/mmcv/blob/f48241a65aebfe07db122e9db320c31b685dc674/mmcv/runner/hooks/lr_updater.py#L327  # noqa
# https://github.com/open-mmlab/mmcv/blob/f48241a65aebfe07db122e9db320c31b685dc674/mmcv/runner/hooks/momentum_updater.py#L130  # noqa
lr_config = dict(
    policy='cyclic',
    target_ratio=(10, 1e-4),
    cyclic_times=1,
    step_ratio_up=0.4,
)
momentum_config = dict(
    policy='cyclic',
    target_ratio=(0.85 / 0.95, 1),
    cyclic_times=1,
    step_ratio_up=0.4,
)
# Although the max_epochs is 40, this schedule is usually used we
# RepeatDataset with repeat ratio N, thus the actual max epoch
# number could be Nx40
runner = dict(type='EpochBasedRunner', max_epochs=40)


================================================
FILE: projects/configs/_base_/schedules/mmdet_schedule_1x.py
================================================
# optimizer
optimizer = dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001)
optimizer_config = dict(grad_clip=None)
# learning policy
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=500,
    warmup_ratio=0.001,
    step=[8, 11])
runner = dict(type='EpochBasedRunner', max_epochs=12)


================================================
FILE: projects/configs/_base_/schedules/schedule_2x.py
================================================
# optimizer
# This schedule is mainly used by models on nuScenes dataset
optimizer = dict(type='AdamW', lr=0.001, weight_decay=0.01)
# max_norm=10 is better for SECOND
optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=1000,
    warmup_ratio=1.0 / 1000,
    step=[20, 23])
momentum_config = None
# runtime settings
runner = dict(type='EpochBasedRunner', max_epochs=24)


================================================
FILE: projects/configs/_base_/schedules/schedule_3x.py
================================================
# optimizer
# This schedule is mainly used by models on indoor dataset,
# e.g., VoteNet on SUNRGBD and ScanNet
lr = 0.008  # max learning rate
optimizer = dict(type='AdamW', lr=lr, weight_decay=0.01)
optimizer_config = dict(grad_clip=dict(max_norm=10, norm_type=2))
lr_config = dict(policy='step', warmup=None, step=[24, 32])
# runtime settings
runner = dict(type='EpochBasedRunner', max_epochs=36)


================================================
FILE: projects/configs/_base_/schedules/seg_cosine_150e.py
================================================
# optimizer
# This schedule is mainly used on S3DIS dataset in segmentation task
optimizer = dict(type='SGD', lr=0.2, weight_decay=0.0001, momentum=0.9)
optimizer_config = dict(grad_clip=None)
lr_config = dict(policy='CosineAnnealing', warmup=None, min_lr=0.002)
momentum_config = None

# runtime settings
runner = dict(type='EpochBasedRunner', max_epochs=150)


================================================
FILE: projects/configs/_base_/schedules/seg_cosine_200e.py
================================================
# optimizer
# This schedule is mainly used on ScanNet dataset in segmentation task
optimizer = dict(type='Adam', lr=0.001, weight_decay=0.01)
optimizer_config = dict(grad_clip=None)
lr_config = dict(policy='CosineAnnealing', warmup=None, min_lr=1e-5)
momentum_config = None

# runtime settings
runner = dict(type='EpochBasedRunner', max_epochs=200)


================================================
FILE: projects/configs/_base_/schedules/seg_cosine_50e.py
================================================
# optimizer
# This schedule is mainly used on S3DIS dataset in segmentation task
optimizer = dict(type='Adam', lr=0.001, weight_decay=0.001)
optimizer_config = dict(grad_clip=None)
lr_config = dict(policy='CosineAnnealing', warmup=None, min_lr=1e-5)
momentum_config = None

# runtime settings
runner = dict(type='EpochBasedRunner', max_epochs=50)


================================================
FILE: projects/configs/baselines/OCFNet_in_Cam4DOcc_V1.1.py
================================================
# Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications
# https://github.com/haomo-ai/Cam4DOcc
# 2 classes: inflated GMO and others

# Basic params ******************************************
_base_ = [
    '../datasets/custom_nus-3d.py',
    '../_base_/default_runtime.py'
]
find_unused_parameters = True
# whether training and test together with dataset generation
only_generate_dataset = False
# we only consider use_camera in Cam4DOcc in the current version
input_modality = dict(
    use_lidar=False,
    use_camera=True,
    use_radar=False,
    use_map=False,
    use_external=False)
plugin = True
plugin_dir = "projects/occ_plugin/"
occ_path = "./data/nuScenes-Occupancy"
depth_gt_path = './data/depth_gt'
train_ann_file = "./data/nuscenes/nuscenes_occ_infos_train.pkl"
val_ann_file = "./data/nuscenes/nuscenes_occ_infos_val.pkl"
cam4docc_dataset_path = "./data/cam4docc/"
nusc_root = './data/nuscenes/'
# GMO class names
class_names = ['vehicle', 'human']
use_separate_classes = False
use_fine_occ = False

# Forecasting-related params ******************************************
# we use *time_receptive_field* past frames to forecast future *n_future_frames* frames
# for 3D instance prediction, n_future_frames_plus > n_future_frames has to be set
time_receptive_field = 3
n_future_frames = 4
n_future_frames_plus = 6
iou_thresh_for_vpq = 0.2
test_present = False

# Occupancy-related params ******************************************
point_cloud_range = [-51.2, -51.2, -5.0, 51.2, 51.2, 3.0]
occ_size = [512, 512, 40]
lss_downsample = [4, 4, 4]
voxel_x = (point_cloud_range[3] - point_cloud_range[0]) / occ_size[0]
voxel_y = (point_cloud_range[4] - point_cloud_range[1]) / occ_size[1]
voxel_z = (point_cloud_range[5] - point_cloud_range[2]) / occ_size[2]
empty_idx = 0
if use_separate_classes:
    num_cls = len(class_names) + 1
else:
    num_cls = 2
img_norm_cfg = None

# Save params ******************************************
save_pred = False
save_path = "./data/cam4docc/results"

# Data-generation and pipeline params ******************************************
dataset_type = 'Cam4DOccDataset'
file_client_args = dict(backend='disk')
data_config={
    'cams': ['CAM_FRONT_LEFT', 'CAM_FRONT', 'CAM_FRONT_RIGHT',
             'CAM_BACK_LEFT', 'CAM_BACK', 'CAM_BACK_RIGHT'],
    'Ncams': 6,
    'input_size': (896, 1600),
    'src_size': (900, 1600),
    # image-view augmentation
    'resize': (-0.06, 0.11),
    'rot': (-5.4, 5.4),
    'flip': False,
    'crop_h': (0.0, 0.0),
    'resize_test': 0.00,
}

bda_aug_conf = dict(
            rot_lim=(-0, 0),
            scale_lim=(0.95, 1.05),
            flip_dx_ratio=0.5,
            flip_dy_ratio=0.5)

train_capacity = 23930 # default: use all sequences
test_capacity = 5119 # default: use all sequences
train_pipeline = [
    dict(type='LoadInstanceWithFlow', cam4docc_dataset_path=cam4docc_dataset_path, grid_size=occ_size, use_flow=True, background=empty_idx, pc_range=point_cloud_range,
                use_separate_classes=use_separate_classes),
    dict(type='LoadMultiViewImageFromFiles_BEVDet', is_train=True, data_config=data_config,
                sequential=False, aligned=True, trans_only=False, depth_gt_path=depth_gt_path, data_root=nusc_root,
                mmlabnorm=True, load_depth=True, img_norm_cfg=img_norm_cfg),
    dict(type='LoadOccupancy', to_float32=True, occ_path=occ_path, grid_size=occ_size, unoccupied=empty_idx, pc_range=point_cloud_range, use_fine_occ=use_fine_occ, test_mode=False),
    dict(type='OccDefaultFormatBundle3D', class_names=class_names),
    dict(type='Collect3D', keys=['img_inputs_seq', 'gt_occ', 'segmentation', 'instance', 'flow', 'future_egomotion']),
]

test_pipeline = [
    dict(type='LoadInstanceWithFlow', cam4docc_dataset_path=cam4docc_dataset_path, grid_size=occ_size, use_flow=True, background=empty_idx, pc_range=point_cloud_range,
         use_separate_classes=use_separate_classes),
    dict(type='LoadMultiViewImageFromFiles_BEVDet', data_config=data_config, depth_gt_path=depth_gt_path, data_root=nusc_root,
         sequential=False, aligned=True, trans_only=False, mmlabnorm=True, img_norm_cfg=img_norm_cfg, test_mode=True),
    dict(type='LoadOccupancy', to_float32=True, occ_path=occ_path, grid_size=occ_size, unoccupied=empty_idx, pc_range=point_cloud_range, use_fine_occ=use_fine_occ, test_mode=True),
    dict(type='OccDefaultFormatBundle3D', class_names=class_names, with_label=False), 
    dict(type='Collect3D', keys=['img_inputs_seq', 'gt_occ', 'segmentation', 'instance', 'flow', 'future_egomotion'], meta_keys=['pc_range', 'occ_size', 'scene_token', 'lidar_token']),
]

train_config=dict(
        type=dataset_type,
        data_root=nusc_root,
        occ_root=occ_path,
        idx_root=cam4docc_dataset_path,
        ori_data_root=cam4docc_dataset_path,
        ann_file=train_ann_file,
        pipeline=train_pipeline,
        classes=class_names,
        use_separate_classes=use_separate_classes,
        modality=input_modality,
        test_mode=False,
        use_valid_flag=True,
        occ_size=occ_size,
        pc_range=point_cloud_range,
        box_type_3d='LiDAR',
        time_receptive_field=time_receptive_field,
        n_future_frames=n_future_frames,
        train_capacity=train_capacity,
        test_capacity=test_capacity,
        ) 

test_config=dict(
    type=dataset_type,
    occ_root=occ_path,
    data_root=nusc_root,
    idx_root=cam4docc_dataset_path,
    ori_data_root=cam4docc_dataset_path,
    ann_file=val_ann_file,
    pipeline=test_pipeline,
    classes=class_names,
    use_separate_classes=use_separate_classes,
    modality=input_modality,
    occ_size=occ_size,
    pc_range=point_cloud_range,
    time_receptive_field=time_receptive_field, 
    n_future_frames=n_future_frames,
    train_capacity=train_capacity,
    test_capacity=test_capacity,
    ) 

# in our work we use 8 NVIDIA A100 GPUs
data = dict(
    samples_per_gpu=1,
    workers_per_gpu=1,
    train=train_config,
    val=test_config,
    test=test_config,
    shuffler_sampler=dict(type='DistributedGroupSampler'),
    nonshuffler_sampler=dict(type='DistributedSampler'),
)

# Model params ******************************************
grid_config = {
    'xbound': [point_cloud_range[0], point_cloud_range[3], voxel_x*lss_downsample[0]],
    'ybound': [point_cloud_range[1], point_cloud_range[4], voxel_y*lss_downsample[1]],
    'zbound': [point_cloud_range[2], point_cloud_range[5], voxel_z*lss_downsample[2]],
    'dbound': [2.0, 58.0, 0.5],
}

voxel_channels = [32*(time_receptive_field), 32*2*(time_receptive_field), 32*4*(time_receptive_field), 32*8*(time_receptive_field)]
pred_channels = [32, 32*2, 32*4, 32*8]
decoder_channels = [32*(n_future_frames_plus), 32*2*(n_future_frames_plus), 32*4*(n_future_frames_plus), 32*8*(n_future_frames_plus)]

numC_Trans = 64
occ_encoder_input_channel = (numC_Trans+6)*time_receptive_field
voxel_out_channel = 32*(n_future_frames_plus)
flow_out_channel = 32*(n_future_frames_plus)
voxel_out_channel_per_frame = 32
voxel_out_indices = (0, 1, 2, 3)
my_voxel_out_indices = (0, 1, 2, 3)

model = dict(
    type='OCFNet',
    only_generate_dataset=only_generate_dataset,
    loss_norm=False,
    disable_loss_depth=True,
    point_cloud_range=point_cloud_range,
    time_receptive_field=time_receptive_field,
    n_future_frames=n_future_frames,
    n_future_frames_plus=n_future_frames_plus,
    max_label=num_cls,
    iou_thresh_for_vpq=iou_thresh_for_vpq,
    test_present=test_present,
    record_time=False,
    save_pred=save_pred,
    save_path=save_path,
    img_backbone=dict(
        pretrained='torchvision://resnet50',
        type='ResNet',
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=0,
        with_cp=False,
        norm_cfg=dict(type='SyncBN', requires_grad=True),
        norm_eval=False,
        style='pytorch'),
    img_neck=dict(
        type='SECONDFPN',
        in_channels=[256, 512, 1024, 2048],
        upsample_strides=[0.25, 0.5, 1, 2],
        out_channels=[128, 128, 128, 128]),
    img_view_transformer=dict(type='ViewTransformerLiftSplatShootVoxel',
                              norm_cfg=dict(type='SyncBN', requires_grad=True),
                              loss_depth_weight=3.,
                              loss_depth_type='kld',
                              grid_config=grid_config,
                              data_config=data_config,
                              numC_Trans=numC_Trans,
                              vp_megvii=False),
    occ_encoder_backbone=dict(
        type='CustomResNet3D',
        depth=18,
        n_input_channels=occ_encoder_input_channel,
        block_inplanes=voxel_channels,
        out_indices=my_voxel_out_indices,
        norm_cfg=dict(type='SyncBN', requires_grad=True),
    ),
    occ_predictor=dict(
        type='Predictor',
        n_input_channels=pred_channels,
        in_timesteps=time_receptive_field,
        out_timesteps=n_future_frames_plus,
        norm_cfg=dict(type='SyncBN', requires_grad=True),
    ),
    occ_encoder_neck=dict(
        type='FPN3D',
        with_cp=False,
        in_channels=decoder_channels,
        out_channels=voxel_out_channel,
        norm_cfg=dict(type='SyncBN', requires_grad=True),
    ),
    flow_encoder_backbone=dict(
        type='CustomResNet3D',
        depth=18,
        n_input_channels=occ_encoder_input_channel,
        block_inplanes=voxel_channels,
        out_indices=my_voxel_out_indices,
        norm_cfg=dict(type='SyncBN', requires_grad=True),
    ),
    flow_predictor=dict(
        type='Predictor',
        n_input_channels=pred_channels,
        in_timesteps=time_receptive_field,
        out_timesteps=n_future_frames_plus,
        norm_cfg=dict(type='SyncBN', requires_grad=True),
    ),
    flow_encoder_neck=dict(
        type='FPN3D',
        with_cp=False,
        in_channels=decoder_channels,
        out_channels=flow_out_channel,
        norm_cfg=dict(type='SyncBN', requires_grad=True),
    ),
    flow_head=dict(
        type='FlowHead',
        norm_cfg=dict(type='SyncBN', requires_grad=True),
        soft_weights=True, 
        final_occ_size=occ_size,
        fine_topk=15000,
        empty_idx=empty_idx,
        num_level=len(my_voxel_out_indices),
        in_channels=[voxel_out_channel_per_frame] * len(my_voxel_out_indices),
        out_channel=3,  # 3-dim flow
        point_cloud_range=point_cloud_range,
    ),
    pts_bbox_head=dict(
        type='OccHead',
        norm_cfg=dict(type='SyncBN', requires_grad=True),
        soft_weights=True,
        final_occ_size=occ_size,
        fine_topk=15000,
        empty_idx=empty_idx,
        num_level=len(my_voxel_out_indices),
        in_channels=[voxel_out_channel_per_frame] * len(my_voxel_out_indices),
        out_channel=num_cls,
        point_cloud_range=point_cloud_range,
        loss_weight_cfg=dict(
            loss_voxel_ce_weight=1.0,
            loss_voxel_sem_scal_weight=1.0,
            loss_voxel_geo_scal_weight=1.0,
            loss_voxel_lovasz_weight=1.0,
        ),
    ),
    empty_idx=empty_idx,
)

# Learning policy params ******************************************
optimizer = dict(
    type='AdamW',
    lr=3e-4,   
    paramwise_cfg=dict(
        custom_keys={
            'img_backbone': dict(lr_mult=0.1),
        }),
    weight_decay=0.01)

optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
lr_config = dict(
    policy='CosineAnnealing',
    warmup='linear',
    warmup_iters=500,
    warmup_ratio=1.0 / 3,
    min_lr_ratio=1e-3)

runner = dict(type='EpochBasedRunner', max_epochs=24)
evaluation = dict(
    interval=1,
    pipeline=test_pipeline,
    save_best='SSC_mean',
    rule='greater',
)

custom_hooks = [
    dict(type='OccEfficiencyHook'),
]

================================================
FILE: projects/configs/baselines/OCFNet_in_Cam4DOcc_V1.1_lyft.py
================================================
# Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications
# https://github.com/haomo-ai/Cam4DOcc
# 2 classes: inflated GMO and others

# Basic params ******************************************
_base_ = [
    '../datasets/custom_nus-3d.py',
    '../_base_/default_runtime.py'
]
find_unused_parameters = True
# whether training and test together with dataset generation
only_generate_dataset = False
# we only consider use_camera in Cam4DOcc in the current version
input_modality = dict(
    use_lidar=False,
    use_camera=True,
    use_radar=False,
    use_map=False,
    use_external=False)
plugin = True
plugin_dir = "projects/occ_plugin/"

# path unused for lyft
occ_path = " "
depth_gt_path = " "
train_ann_file = " "
val_ann_file = " "

cam4docc_dataset_path = "./data/cam4docc/"
nusc_root = './data/lyft/'
# GMO class names
class_names = ['vehicle', 'human']
use_separate_classes = False
use_fine_occ = False

# Forecasting-related params ******************************************
# we use *time_receptive_field* past frames to forecast future *n_future_frames* frames
# for 3D instance prediction, n_future_frames_plus > n_future_frames has to be set
time_receptive_field = 3
n_future_frames = 4
n_future_frames_plus = 6
iou_thresh_for_vpq = 0.2
test_present = False

# Occupancy-related params ******************************************
point_cloud_range = [-51.2, -51.2, -5.0, 51.2, 51.2, 3.0]
occ_size = [512, 512, 40]
lss_downsample = [4, 4, 4]
voxel_x = (point_cloud_range[3] - point_cloud_range[0]) / occ_size[0]
voxel_y = (point_cloud_range[4] - point_cloud_range[1]) / occ_size[1]
voxel_z = (point_cloud_range[5] - point_cloud_range[2]) / occ_size[2]
empty_idx = 0
if use_separate_classes:
    num_cls = len(class_names) + 1
else:
    num_cls = 2
img_norm_cfg = None

# Save params ******************************************
save_pred = False
save_path = "./data/cam4docc/results"

# Data-generation and pipeline params ******************************************
dataset_type = 'Cam4DOccLyftDataset'
file_client_args = dict(backend='disk')
data_config={
    'cams': ['CAM_FRONT_LEFT', 'CAM_FRONT', 'CAM_FRONT_RIGHT',
             'CAM_BACK_LEFT', 'CAM_BACK', 'CAM_BACK_RIGHT'],
    'Ncams': 6,
    'input_size': (896, 1600),
    'src_size': (900, 1600),
    # image-view augmentation
    'resize': (-0.06, 0.11),
    'rot': (-5.4, 5.4),
    'flip': False,
    'crop_h': (0.0, 0.0),
    'resize_test': 0.00,
}

bda_aug_conf = dict(
            rot_lim=(-0, 0),
            scale_lim=(0.95, 1.05),
            flip_dx_ratio=0.5,
            flip_dy_ratio=0.5)

train_capacity = 15720 # default: use all sequences
test_capacity = 5880 # default: use all sequences
train_pipeline = [
    dict(type='LoadInstanceWithFlow', cam4docc_dataset_path=cam4docc_dataset_path, grid_size=occ_size, use_flow=True, background=empty_idx, pc_range=point_cloud_range,
                use_separate_classes=use_separate_classes, use_lyft=True),
    dict(type='LoadMultiViewImageFromFiles_BEVDet', is_train=True, data_config=data_config,
                sequential=False, aligned=True, trans_only=False, depth_gt_path=depth_gt_path, data_root=nusc_root,
                mmlabnorm=True, load_depth=True, img_norm_cfg=img_norm_cfg, use_lyft=True),
    dict(type='LoadOccupancy', to_float32=True, occ_path=occ_path, grid_size=occ_size, unoccupied=empty_idx, pc_range=point_cloud_range, use_fine_occ=use_fine_occ, test_mode=False),
    dict(type='OccDefaultFormatBundle3D', class_names=class_names),
    dict(type='Collect3D', keys=['img_inputs_seq', 'gt_occ', 'segmentation', 'instance', 'flow', 'future_egomotion']),
]

test_pipeline = [
    dict(type='LoadInstanceWithFlow', cam4docc_dataset_path=cam4docc_dataset_path, grid_size=occ_size, use_flow=True, background=empty_idx, pc_range=point_cloud_range,
         use_separate_classes=use_separate_classes, use_lyft=True),
    dict(type='LoadMultiViewImageFromFiles_BEVDet', data_config=data_config, depth_gt_path=depth_gt_path, data_root=nusc_root,
         sequential=False, aligned=True, trans_only=False, mmlabnorm=True, img_norm_cfg=img_norm_cfg, test_mode=True, use_lyft=True),
    dict(type='LoadOccupancy', to_float32=True, occ_path=occ_path, grid_size=occ_size, unoccupied=empty_idx, pc_range=point_cloud_range, use_fine_occ=use_fine_occ, test_mode=True),
    dict(type='OccDefaultFormatBundle3D', class_names=class_names, with_label=False), 
    dict(type='Collect3D', keys=['img_inputs_seq', 'gt_occ', 'segmentation', 'instance', 'flow', 'future_egomotion'], meta_keys=['pc_range', 'occ_size', 'scene_token', 'lidar_token']),
]

train_config=dict(
        type=dataset_type,
        data_root=nusc_root,
        occ_root=occ_path,
        idx_root=cam4docc_dataset_path,
        ori_data_root=cam4docc_dataset_path,
        ann_file=train_ann_file,
        pipeline=train_pipeline,
        classes=class_names,
        use_separate_classes=use_separate_classes,
        modality=input_modality,
        test_mode=False,
        use_valid_flag=True,
        occ_size=occ_size,
        pc_range=point_cloud_range,
        box_type_3d='LiDAR',
        time_receptive_field=time_receptive_field,
        n_future_frames=n_future_frames,
        train_capacity=train_capacity,
        test_capacity=test_capacity,
        ) 

test_config=dict(
    type=dataset_type,
    occ_root=occ_path,
    data_root=nusc_root,
    idx_root=cam4docc_dataset_path,
    ori_data_root=cam4docc_dataset_path,
    ann_file=val_ann_file,
    pipeline=test_pipeline,
    classes=class_names,
    use_separate_classes=use_separate_classes,
    modality=input_modality,
    occ_size=occ_size,
    pc_range=point_cloud_range,
    time_receptive_field=time_receptive_field, 
    n_future_frames=n_future_frames,
    train_capacity=train_capacity,
    test_capacity=test_capacity,
    ) 

# in our work we use 8 NVIDIA A100 GPUs
data = dict(
    samples_per_gpu=1,
    workers_per_gpu=1,
    train=train_config,
    val=test_config,
    test=test_config,
    shuffler_sampler=dict(type='DistributedGroupSampler'),
    nonshuffler_sampler=dict(type='DistributedSampler'),
)

# Model params ******************************************
grid_config = {
    'xbound': [point_cloud_range[0], point_cloud_range[3], voxel_x*lss_downsample[0]],
    'ybound': [point_cloud_range[1], point_cloud_range[4], voxel_y*lss_downsample[1]],
    'zbound': [point_cloud_range[2], point_cloud_range[5], voxel_z*lss_downsample[2]],
    'dbound': [2.0, 58.0, 0.5],
}

voxel_channels = [32*(time_receptive_field), 32*2*(time_receptive_field), 32*4*(time_receptive_field), 32*8*(time_receptive_field)]
pred_channels = [32, 32*2, 32*4, 32*8]
decoder_channels = [32*(n_future_frames_plus), 32*2*(n_future_frames_plus), 32*4*(n_future_frames_plus), 32*8*(n_future_frames_plus)]

numC_Trans = 64
occ_encoder_input_channel = (numC_Trans+6)*time_receptive_field
voxel_out_channel = 32*(n_future_frames_plus)
flow_out_channel = 32*(n_future_frames_plus)
voxel_out_channel_per_frame = 32
voxel_out_indices = (0, 1, 2, 3)
my_voxel_out_indices = (0, 1, 2, 3)

model = dict(
    type='OCFNet',
    only_generate_dataset=only_generate_dataset,
    loss_norm=False,
    disable_loss_depth=True,
    point_cloud_range=point_cloud_range,
    time_receptive_field=time_receptive_field,
    n_future_frames=n_future_frames,
    n_future_frames_plus=n_future_frames_plus,
    max_label=num_cls,
    iou_thresh_for_vpq=iou_thresh_for_vpq,
    test_present=test_present,
    record_time=False,
    save_pred=save_pred,
    save_path=save_path,
    img_backbone=dict(
        pretrained='torchvision://resnet50',
        type='ResNet',
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=0,
        with_cp=False,
        norm_cfg=dict(type='SyncBN', requires_grad=True),
        norm_eval=False,
        style='pytorch'),
    img_neck=dict(
        type='SECONDFPN',
        in_channels=[256, 512, 1024, 2048],
        upsample_strides=[0.25, 0.5, 1, 2],
        out_channels=[128, 128, 128, 128]),
    img_view_transformer=dict(type='ViewTransformerLiftSplatShootVoxel',
                              norm_cfg=dict(type='SyncBN', requires_grad=True),
                              loss_depth_weight=3.,
                              loss_depth_type='kld',
                              grid_config=grid_config,
                              data_config=data_config,
                              numC_Trans=numC_Trans,
                              vp_megvii=False),
    occ_encoder_backbone=dict(
        type='CustomResNet3D',
        depth=18,
        n_input_channels=occ_encoder_input_channel,
        block_inplanes=voxel_channels,
        out_indices=my_voxel_out_indices,
        norm_cfg=dict(type='SyncBN', requires_grad=True),
    ),
    occ_predictor=dict(
        type='Predictor',
        n_input_channels=pred_channels,
        in_timesteps=time_receptive_field,
        out_timesteps=n_future_frames_plus,
        norm_cfg=dict(type='SyncBN', requires_grad=True),
    ),
    occ_encoder_neck=dict(
        type='FPN3D',
        with_cp=False,
        in_channels=decoder_channels,
        out_channels=voxel_out_channel,
        norm_cfg=dict(type='SyncBN', requires_grad=True),
    ),
    flow_encoder_backbone=dict(
        type='CustomResNet3D',
        depth=18,
        n_input_channels=occ_encoder_input_channel,
        block_inplanes=voxel_channels,
        out_indices=my_voxel_out_indices,
        norm_cfg=dict(type='SyncBN', requires_grad=True),
    ),
    flow_predictor=dict(
        type='Predictor',
        n_input_channels=pred_channels,
        in_timesteps=time_receptive_field,
        out_timesteps=n_future_frames_plus,
        norm_cfg=dict(type='SyncBN', requires_grad=True),
    ),
    flow_encoder_neck=dict(
        type='FPN3D',
        with_cp=False,
        in_channels=decoder_channels,
        out_channels=flow_out_channel,
        norm_cfg=dict(type='SyncBN', requires_grad=True),
    ),
    flow_head=dict(
        type='FlowHead',
        norm_cfg=dict(type='SyncBN', requires_grad=True),
        soft_weights=True, 
        final_occ_size=occ_size,
        fine_topk=15000,
        empty_idx=empty_idx,
        num_level=len(my_voxel_out_indices),
        in_channels=[voxel_out_channel_per_frame] * len(my_voxel_out_indices),
        out_channel=3,  # 3-dim flow
        point_cloud_range=point_cloud_range,
    ),
    pts_bbox_head=dict(
        type='OccHead',
        norm_cfg=dict(type='SyncBN', requires_grad=True),
        soft_weights=True,
        final_occ_size=occ_size,
        fine_topk=15000,
        empty_idx=empty_idx,
        num_level=len(my_voxel_out_indices),
        in_channels=[voxel_out_channel_per_frame] * len(my_voxel_out_indices),
        out_channel=num_cls,
        point_cloud_range=point_cloud_range,
        loss_weight_cfg=dict(
            loss_voxel_ce_weight=1.0,
            loss_voxel_sem_scal_weight=1.0,
            loss_voxel_geo_scal_weight=1.0,
            loss_voxel_lovasz_weight=1.0,
        ),
    ),
    empty_idx=empty_idx,
)

# Learning policy params ******************************************
optimizer = dict(
    type='AdamW',
    lr=3e-4,   
    paramwise_cfg=dict(
        custom_keys={
            'img_backbone': dict(lr_mult=0.1),
        }),
    weight_decay=0.01)

optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
lr_config = dict(
    policy='CosineAnnealing',
    warmup='linear',
    warmup_iters=500,
    warmup_ratio=1.0 / 3,
    min_lr_ratio=1e-3)

runner = dict(type='EpochBasedRunner', max_epochs=24)
evaluation = dict(
    interval=1,
    pipeline=test_pipeline,
    save_best='SSC_mean',
    rule='greater',
)

custom_hooks = [
    dict(type='OccEfficiencyHook'),
]

================================================
FILE: projects/configs/baselines/OCFNet_in_Cam4DOcc_V1.2.py
================================================
# Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications
# https://github.com/haomo-ai/Cam4DOcc
# multiple classes: inflated multiple MO classes

# Basic params ******************************************
_base_ = [
    '../datasets/custom_nus-3d.py',
    '../_base_/default_runtime.py'
]
find_unused_parameters = True
# whether training and test together with dataset generation
only_generate_dataset = False
# we only consider use_camera in Cam4DOcc in the current version
input_modality = dict(
    use_lidar=False,
    use_camera=True,
    use_radar=False,
    use_map=False,
    use_external=False)
plugin = True
plugin_dir = "projects/occ_plugin/"
occ_path = "./data/nuScenes-Occupancy"
depth_gt_path = './data/depth_gt'
train_ann_file = "./data/nuscenes/nuscenes_occ_infos_train.pkl"
val_ann_file = "./data/nuscenes/nuscenes_occ_infos_val.pkl"
cam4docc_dataset_path = "./data/cam4docc/"
nusc_root = './data/nuscenes/'
# GMO class names
class_names = [
    'vehicle.bicycle', 'bus', 'car', 'construction', 'motorcycle', 'trailer', 'truck', 'pedestrian'
]
use_separate_classes = True
use_fine_occ = False

# Forecasting-related params ******************************************
# we use *time_receptive_field* past frames to forecast future *n_future_frames* frames
# for 3D instance prediction, n_future_frames_plus > n_future_frames has to be set
time_receptive_field = 3
n_future_frames = 4
n_future_frames_plus = 6
iou_thresh_for_vpq = 0.2
test_present = False

# Occupancy-related params ******************************************
point_cloud_range = [-51.2, -51.2, -5.0, 51.2, 51.2, 3.0]
occ_size = [512, 512, 40]
lss_downsample = [4, 4, 4]
voxel_x = (point_cloud_range[3] - point_cloud_range[0]) / occ_size[0]
voxel_y = (point_cloud_range[4] - point_cloud_range[1]) / occ_size[1]
voxel_z = (point_cloud_range[5] - point_cloud_range[2]) / occ_size[2]
empty_idx = 0
if use_separate_classes:
    num_cls = len(class_names) + 1
else:
    num_cls = 2
img_norm_cfg = None

# Save params ******************************************
save_pred = False
save_path = "./data/cam4docc/results"

# Data-generation and pipeline params ******************************************
dataset_type = 'Cam4DOccDataset'
file_client_args = dict(backend='disk')
data_config={
    'cams': ['CAM_FRONT_LEFT', 'CAM_FRONT', 'CAM_FRONT_RIGHT',
             'CAM_BACK_LEFT', 'CAM_BACK', 'CAM_BACK_RIGHT'],
    'Ncams': 6,
    'input_size': (896, 1600),
    'src_size': (900, 1600),
    # image-view augmentation
    'resize': (-0.06, 0.11),
    'rot': (-5.4, 5.4),
    'flip': False,
    'crop_h': (0.0, 0.0),
    'resize_test': 0.00,
}

bda_aug_conf = dict(
            rot_lim=(-0, 0),
            scale_lim=(0.95, 1.05),
            flip_dx_ratio=0.5,
            flip_dy_ratio=0.5)


train_capacity = 23930 # default: use all sequences
test_capacity = 5119 # default: use all sequences
train_pipeline = [
    dict(type='LoadInstanceWithFlow', cam4docc_dataset_path=cam4docc_dataset_path, grid_size=occ_size, use_flow=True, background=empty_idx, pc_range=point_cloud_range,
                use_separate_classes=use_separate_classes),
    dict(type='LoadMultiViewImageFromFiles_BEVDet', is_train=True, data_config=data_config,
                sequential=False, aligned=True, trans_only=False, depth_gt_path=depth_gt_path, data_root=nusc_root,
                mmlabnorm=True, load_depth=True, img_norm_cfg=img_norm_cfg),
    dict(type='LoadOccupancy', to_float32=True, occ_path=occ_path, grid_size=occ_size, unoccupied=empty_idx, pc_range=point_cloud_range, use_fine_occ=use_fine_occ, test_mode=False),
    dict(type='OccDefaultFormatBundle3D', class_names=class_names),
    dict(type='Collect3D', keys=['img_inputs_seq', 'gt_occ', 'segmentation', 'instance', 'flow', 'future_egomotion']),
]

test_pipeline = [
    dict(type='LoadInstanceWithFlow', cam4docc_dataset_path=cam4docc_dataset_path, grid_size=occ_size, use_flow=True, background=empty_idx, pc_range=point_cloud_range,
         use_separate_classes=use_separate_classes),
    dict(type='LoadMultiViewImageFromFiles_BEVDet', data_config=data_config, depth_gt_path=depth_gt_path, data_root=nusc_root,
         sequential=False, aligned=True, trans_only=False, mmlabnorm=True, img_norm_cfg=img_norm_cfg, test_mode=True),
    dict(type='LoadOccupancy', to_float32=True, occ_path=occ_path, grid_size=occ_size, unoccupied=empty_idx, pc_range=point_cloud_range, use_fine_occ=use_fine_occ, test_mode=True),
    dict(type='OccDefaultFormatBundle3D', class_names=class_names, with_label=False), 
    dict(type='Collect3D', keys=['img_inputs_seq', 'gt_occ', 'segmentation', 'instance', 'flow', 'future_egomotion'], meta_keys=['pc_range', 'occ_size', 'scene_token', 'lidar_token']),
]

train_config=dict(
        type=dataset_type,
        data_root=nusc_root,
        occ_root=occ_path,
        idx_root=cam4docc_dataset_path,
        ori_data_root=cam4docc_dataset_path,
        ann_file=train_ann_file,
        pipeline=train_pipeline,
        classes=class_names,
        use_separate_classes=use_separate_classes,
        modality=input_modality,
        test_mode=False,
        use_valid_flag=True,
        occ_size=occ_size,
        pc_range=point_cloud_range,
        box_type_3d='LiDAR',
        time_receptive_field=time_receptive_field,
        n_future_frames=n_future_frames,
        train_capacity=train_capacity,
        test_capacity=test_capacity,
        ) 

test_config=dict(
    type=dataset_type,
    occ_root=occ_path,
    data_root=nusc_root,
    idx_root=cam4docc_dataset_path,
    ori_data_root=cam4docc_dataset_path,
    ann_file=val_ann_file,
    pipeline=test_pipeline,
    classes=class_names,
    use_separate_classes=use_separate_classes,
    modality=input_modality,
    occ_size=occ_size,
    pc_range=point_cloud_range,
    time_receptive_field=time_receptive_field, 
    n_future_frames=n_future_frames,
    train_capacity=train_capacity,
    test_capacity=test_capacity,
    ) 

# in our work we use 8 NVIDIA A100 GPUs
data = dict(
    samples_per_gpu=1,
    workers_per_gpu=1,
    train=train_config,
    val=test_config,
    test=test_config,
    shuffler_sampler=dict(type='DistributedGroupSampler'),
    nonshuffler_sampler=dict(type='DistributedSampler'),
)

# Model params ******************************************
grid_config = {
    'xbound': [point_cloud_range[0], point_cloud_range[3], voxel_x*lss_downsample[0]],
    'ybound': [point_cloud_range[1], point_cloud_range[4], voxel_y*lss_downsample[1]],
    'zbound': [point_cloud_range[2], point_cloud_range[5], voxel_z*lss_downsample[2]],
    'dbound': [2.0, 58.0, 0.5],
}

voxel_channels = [32*(time_receptive_field), 32*2*(time_receptive_field), 32*4*(time_receptive_field), 32*8*(time_receptive_field)]
pred_channels = [32, 32*2, 32*4, 32*8]
decoder_channels = [32*(n_future_frames_plus), 32*2*(n_future_frames_plus), 32*4*(n_future_frames_plus), 32*8*(n_future_frames_plus)]

numC_Trans = 64
occ_encoder_input_channel = (numC_Trans+6)*time_receptive_field
voxel_out_channel = 32*(n_future_frames_plus)
flow_out_channel = 32*(n_future_frames_plus)
voxel_out_channel_per_frame = 32
voxel_out_indices = (0, 1, 2, 3)
my_voxel_out_indices = (0, 1, 2, 3)

model = dict(
    type='OCFNet',
    only_generate_dataset=only_generate_dataset,
    loss_norm=False,
    disable_loss_depth=True,
    point_cloud_range=point_cloud_range,
    time_receptive_field=time_receptive_field,
    n_future_frames=n_future_frames,
    n_future_frames_plus=n_future_frames_plus,
    max_label=num_cls,
    iou_thresh_for_vpq=iou_thresh_for_vpq,
    test_present=test_present,
    record_time=False,
    save_pred=save_pred,
    save_path=save_path,
    img_backbone=dict(
        pretrained='torchvision://resnet50',
        type='ResNet',
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=0,
        with_cp=False,
        norm_cfg=dict(type='SyncBN', requires_grad=True),
        norm_eval=False,
        style='pytorch'),
    img_neck=dict(
        type='SECONDFPN',
        in_channels=[256, 512, 1024, 2048],
        upsample_strides=[0.25, 0.5, 1, 2],
        out_channels=[128, 128, 128, 128]),
    img_view_transformer=dict(type='ViewTransformerLiftSplatShootVoxel',
                              norm_cfg=dict(type='SyncBN', requires_grad=True),
                              loss_depth_weight=3.,
                              loss_depth_type='kld',
                              grid_config=grid_config,
                              data_config=data_config,
                              numC_Trans=numC_Trans,
                              vp_megvii=False),
    occ_encoder_backbone=dict(
        type='CustomResNet3D',
        depth=18,
        n_input_channels=occ_encoder_input_channel,
        block_inplanes=voxel_channels,
        out_indices=my_voxel_out_indices,
        norm_cfg=dict(type='SyncBN', requires_grad=True),
    ),
    occ_predictor=dict(
        type='Predictor',
        n_input_channels=pred_channels,
        in_timesteps=time_receptive_field,
        out_timesteps=n_future_frames_plus,
        norm_cfg=dict(type='SyncBN', requires_grad=True),
    ),
    occ_encoder_neck=dict(
        type='FPN3D',
        with_cp=False,
        in_channels=decoder_channels,
        out_channels=voxel_out_channel,
        norm_cfg=dict(type='SyncBN', requires_grad=True),
    ),
    flow_encoder_backbone=dict(
        type='CustomResNet3D',
        depth=18,
        n_input_channels=occ_encoder_input_channel,
        block_inplanes=voxel_channels,
        out_indices=my_voxel_out_indices,
        norm_cfg=dict(type='SyncBN', requires_grad=True),
    ),
    flow_predictor=dict(
        type='Predictor',
        n_input_channels=pred_channels,
        in_timesteps=time_receptive_field,
        out_timesteps=n_future_frames_plus,
        norm_cfg=dict(type='SyncBN', requires_grad=True),
    ),
    flow_encoder_neck=dict(
        type='FPN3D',
        with_cp=False,
        in_channels=decoder_channels,
        out_channels=flow_out_channel,
        norm_cfg=dict(type='SyncBN', requires_grad=True),
    ),
    flow_head=dict(
        type='FlowHead',
        norm_cfg=dict(type='SyncBN', requires_grad=True),
        soft_weights=True, 
        final_occ_size=occ_size,
        fine_topk=15000,
        empty_idx=empty_idx,
        num_level=len(my_voxel_out_indices),
        in_channels=[voxel_out_channel_per_frame] * len(my_voxel_out_indices),
        out_channel=3,  # 3-dim flow
        point_cloud_range=point_cloud_range,
    ),
    pts_bbox_head=dict(
        type='OccHead',
        norm_cfg=dict(type='SyncBN', requires_grad=True),
        soft_weights=True,
        final_occ_size=occ_size,
        fine_topk=15000,
        empty_idx=empty_idx,
        num_level=len(my_voxel_out_indices),
        in_channels=[voxel_out_channel_per_frame] * len(my_voxel_out_indices),
        out_channel=num_cls,
        point_cloud_range=point_cloud_range,
        loss_weight_cfg=dict(
            loss_voxel_ce_weight=1.0,
            loss_voxel_sem_scal_weight=1.0,
            loss_voxel_geo_scal_weight=1.0,
            loss_voxel_lovasz_weight=1.0,
        ),
    ),
    empty_idx=empty_idx,
)

# Learning policy params ******************************************
optimizer = dict(
    type='AdamW',
    lr=3e-4,   
    paramwise_cfg=dict(
        custom_keys={
            'img_backbone': dict(lr_mult=0.1),
        }),
    weight_decay=0.01)

optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
lr_config = dict(
    policy='CosineAnnealing',
    warmup='linear',
    warmup_iters=500,
    warmup_ratio=1.0 / 3,
    min_lr_ratio=1e-3)

runner = dict(type='EpochBasedRunner', max_epochs=24)
evaluation = dict(
    interval=1,
    pipeline=test_pipeline,
    save_best='SSC_mean',
    rule='greater',
)

custom_hooks = [
    dict(type='OccEfficiencyHook'),
]

================================================
FILE: projects/configs/baselines/OCFNet_in_Cam4DOcc_V1.2_lyft.py
================================================
# Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications
# https://github.com/haomo-ai/Cam4DOcc
# multiple classes: inflated multiple MO classes

# Basic params ******************************************
_base_ = [
    '../datasets/custom_nus-3d.py',
    '../_base_/default_runtime.py'
]
find_unused_parameters = True
# whether training and test together with dataset generation
only_generate_dataset = False
# we only consider use_camera in Cam4DOcc in the current version
input_modality = dict(
    use_lidar=False,
    use_camera=True,
    use_radar=False,
    use_map=False,
    use_external=False)
plugin = True
plugin_dir = "projects/occ_plugin/"

# path unused for lyft
occ_path = " "
depth_gt_path = " "
train_ann_file = " "
val_ann_file = " "

cam4docc_dataset_path = "./data/cam4docc/"
nusc_root = './data/lyft/'
# GMO class names
# refine the classes for lyft datasets according to your needs
class_names = [
    'bicycle', 'bus', 'car', 'construction', 'motorcycle', 'trailer', 'truck', 'pedestrian'
]
use_separate_classes = True
use_fine_occ = False

# Forecasting-related params ******************************************
# we use *time_receptive_field* past frames to forecast future *n_future_frames* frames
# for 3D instance prediction, n_future_frames_plus > n_future_frames has to be set
time_receptive_field = 3
n_future_frames = 4
n_future_frames_plus = 6
iou_thresh_for_vpq = 0.2
test_present = False

# Occupancy-related params ******************************************
point_cloud_range = [-51.2, -51.2, -5.0, 51.2, 51.2, 3.0]
occ_size = [512, 512, 40]
lss_downsample = [4, 4, 4]
voxel_x = (point_cloud_range[3] - point_cloud_range[0]) / occ_size[0]
voxel_y = (point_cloud_range[4] - point_cloud_range[1]) / occ_size[1]
voxel_z = (point_cloud_range[5] - point_cloud_range[2]) / occ_size[2]
empty_idx = 0
if use_separate_classes:
    num_cls = len(class_names) + 1
else:
    num_cls = 2
img_norm_cfg = None

# Save params ******************************************
save_pred = False
save_path = "./data/cam4docc/results"

# Data-generation and pipeline params ******************************************
dataset_type = 'Cam4DOccLyftDataset'
file_client_args = dict(backend='disk')
data_config={
    'cams': ['CAM_FRONT_LEFT', 'CAM_FRONT', 'CAM_FRONT_RIGHT',
             'CAM_BACK_LEFT', 'CAM_BACK', 'CAM_BACK_RIGHT'],
    'Ncams': 6,
    'input_size': (896, 1600),
    'src_size': (900, 1600),
    # image-view augmentation
    'resize': (-0.06, 0.11),
    'rot': (-5.4, 5.4),
    'flip': False,
    'crop_h': (0.0, 0.0),
    'resize_test': 0.00,
}

bda_aug_conf = dict(
            rot_lim=(-0, 0),
            scale_lim=(0.95, 1.05),
            flip_dx_ratio=0.5,
            flip_dy_ratio=0.5)


train_capacity = 15720 # default: use all sequences
test_capacity = 5880 # default: use all sequences
train_pipeline = [
    dict(type='LoadInstanceWithFlow', cam4docc_dataset_path=cam4docc_dataset_path, grid_size=occ_size, use_flow=True, background=empty_idx, pc_range=point_cloud_range,
                use_separate_classes=use_separate_classes, use_lyft=True),
    dict(type='LoadMultiViewImageFromFiles_BEVDet', is_train=True, data_config=data_config,
                sequential=False, aligned=True, trans_only=False, depth_gt_path=depth_gt_path, data_root=nusc_root,
                mmlabnorm=True, load_depth=True, img_norm_cfg=img_norm_cfg, use_lyft=True),
    dict(type='LoadOccupancy', to_float32=True, occ_path=occ_path, grid_size=occ_size, unoccupied=empty_idx, pc_range=point_cloud_range, use_fine_occ=use_fine_occ, test_mode=False),
    dict(type='OccDefaultFormatBundle3D', class_names=class_names),
    dict(type='Collect3D', keys=['img_inputs_seq', 'gt_occ', 'segmentation', 'instance', 'flow', 'future_egomotion']),
]

test_pipeline = [
    dict(type='LoadInstanceWithFlow', cam4docc_dataset_path=cam4docc_dataset_path, grid_size=occ_size, use_flow=True, background=empty_idx, pc_range=point_cloud_range,
         use_separate_classes=use_separate_classes, use_lyft=True),
    dict(type='LoadMultiViewImageFromFiles_BEVDet', data_config=data_config, depth_gt_path=depth_gt_path, data_root=nusc_root,
         sequential=False, aligned=True, trans_only=False, mmlabnorm=True, img_norm_cfg=img_norm_cfg, test_mode=True, use_lyft=True),
    dict(type='LoadOccupancy', to_float32=True, occ_path=occ_path, grid_size=occ_size, unoccupied=empty_idx, pc_range=point_cloud_range, use_fine_occ=use_fine_occ, test_mode=True),
    dict(type='OccDefaultFormatBundle3D', class_names=class_names, with_label=False), 
    dict(type='Collect3D', keys=['img_inputs_seq', 'gt_occ', 'segmentation', 'instance', 'flow', 'future_egomotion'], meta_keys=['pc_range', 'occ_size', 'scene_token', 'lidar_token']),
]

train_config=dict(
        type=dataset_type,
        data_root=nusc_root,
        occ_root=occ_path,
        idx_root=cam4docc_dataset_path,
        ori_data_root=cam4docc_dataset_path,
        ann_file=train_ann_file,
        pipeline=train_pipeline,
        classes=class_names,
        use_separate_classes=use_separate_classes,
        modality=input_modality,
        test_mode=False,
        use_valid_flag=True,
        occ_size=occ_size,
        pc_range=point_cloud_range,
        box_type_3d='LiDAR',
        time_receptive_field=time_receptive_field,
        n_future_frames=n_future_frames,
        train_capacity=train_capacity,
        test_capacity=test_capacity,
        ) 

test_config=dict(
    type=dataset_type,
    occ_root=occ_path,
    data_root=nusc_root,
    idx_root=cam4docc_dataset_path,
    ori_data_root=cam4docc_dataset_path,
    ann_file=val_ann_file,
    pipeline=test_pipeline,
    classes=class_names,
    use_separate_classes=use_separate_classes,
    modality=input_modality,
    occ_size=occ_size,
    pc_range=point_cloud_range,
    time_receptive_field=time_receptive_field, 
    n_future_frames=n_future_frames,
    train_capacity=train_capacity,
    test_capacity=test_capacity,
    ) 

# in our work we use 8 NVIDIA A100 GPUs
data = dict(
    samples_per_gpu=1,
    workers_per_gpu=1,
    train=train_config,
    val=test_config,
    test=test_config,
    shuffler_sampler=dict(type='DistributedGroupSampler'),
    nonshuffler_sampler=dict(type='DistributedSampler'),
)

# Model params ******************************************
grid_config = {
    'xbound': [point_cloud_range[0], point_cloud_range[3], voxel_x*lss_downsample[0]],
    'ybound': [point_cloud_range[1], point_cloud_range[4], voxel_y*lss_downsample[1]],
    'zbound': [point_cloud_range[2], point_cloud_range[5], voxel_z*lss_downsample[2]],
    'dbound': [2.0, 58.0, 0.5],
}

voxel_channels = [32*(time_receptive_field), 32*2*(time_receptive_field), 32*4*(time_receptive_field), 32*8*(time_receptive_field)]
pred_channels = [32, 32*2, 32*4, 32*8]
decoder_channels = [32*(n_future_frames_plus), 32*2*(n_future_frames_plus), 32*4*(n_future_frames_plus), 32*8*(n_future_frames_plus)]

numC_Trans = 64
occ_encoder_input_channel = (numC_Trans+6)*time_receptive_field
voxel_out_channel = 32*(n_future_frames_plus)
flow_out_channel = 32*(n_future_frames_plus)
voxel_out_channel_per_frame = 32
voxel_out_indices = (0, 1, 2, 3)
my_voxel_out_indices = (0, 1, 2, 3)

model = dict(
    type='OCFNet',
    only_generate_dataset=only_generate_dataset,
    loss_norm=False,
    disable_loss_depth=True,
    point_cloud_range=point_cloud_range,
    time_receptive_field=time_receptive_field,
    n_future_frames=n_future_frames,
    n_future_frames_plus=n_future_frames_plus,
    max_label=num_cls,
    iou_thresh_for_vpq=iou_thresh_for_vpq,
    test_present=test_present,
    record_time=False,
    save_pred=save_pred,
    save_path=save_path,
    img_backbone=dict(
        pretrained='torchvision://resnet50',
        type='ResNet',
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=0,
        with_cp=False,
        norm_cfg=dict(type='SyncBN', requires_grad=True),
        norm_eval=False,
        style='pytorch'),
    img_neck=dict(
        type='SECONDFPN',
        in_channels=[256, 512, 1024, 2048],
        upsample_strides=[0.25, 0.5, 1, 2],
        out_channels=[128, 128, 128, 128]),
    img_view_transformer=dict(type='ViewTransformerLiftSplatShootVoxel',
                              norm_cfg=dict(type='SyncBN', requires_grad=True),
                              loss_depth_weight=3.,
                              loss_depth_type='kld',
                              grid_config=grid_config,
                              data_config=data_config,
                              numC_Trans=numC_Trans,
                              vp_megvii=False),
    occ_encoder_backbone=dict(
        type='CustomResNet3D',
        depth=18,
        n_input_channels=occ_encoder_input_channel,
        block_inplanes=voxel_channels,
        out_indices=my_voxel_out_indices,
        norm_cfg=dict(type='SyncBN', requires_grad=True),
    ),
    occ_predictor=dict(
        type='Predictor',
        n_input_channels=pred_channels,
        in_timesteps=time_receptive_field,
        out_timesteps=n_future_frames_plus,
        norm_cfg=dict(type='SyncBN', requires_grad=True),
    ),
    occ_encoder_neck=dict(
        type='FPN3D',
        with_cp=False,
        in_channels=decoder_channels,
        out_channels=voxel_out_channel,
        norm_cfg=dict(type='SyncBN', requires_grad=True),
    ),
    flow_encoder_backbone=dict(
        type='CustomResNet3D',
        depth=18,
        n_input_channels=occ_encoder_input_channel,
        block_inplanes=voxel_channels,
        out_indices=my_voxel_out_indices,
        norm_cfg=dict(type='SyncBN', requires_grad=True),
    ),
    flow_predictor=dict(
        type='Predictor',
        n_input_channels=pred_channels,
        in_timesteps=time_receptive_field,
        out_timesteps=n_future_frames_plus,
        norm_cfg=dict(type='SyncBN', requires_grad=True),
    ),
    flow_encoder_neck=dict(
        type='FPN3D',
        with_cp=False,
        in_channels=decoder_channels,
        out_channels=flow_out_channel,
        norm_cfg=dict(type='SyncBN', requires_grad=True),
    ),
    flow_head=dict(
        type='FlowHead',
        norm_cfg=dict(type='SyncBN', requires_grad=True),
        soft_weights=True, 
        final_occ_size=occ_size,
        fine_topk=15000,
        empty_idx=empty_idx,
        num_level=len(my_voxel_out_indices),
        in_channels=[voxel_out_channel_per_frame] * len(my_voxel_out_indices),
        out_channel=3,  # 3-dim flow
        point_cloud_range=point_cloud_range,
    ),
    pts_bbox_head=dict(
        type='OccHead',
        norm_cfg=dict(type='SyncBN', requires_grad=True),
        soft_weights=True,
        final_occ_size=occ_size,
        fine_topk=15000,
        empty_idx=empty_idx,
        num_level=len(my_voxel_out_indices),
        in_channels=[voxel_out_channel_per_frame] * len(my_voxel_out_indices),
        out_channel=num_cls,
        point_cloud_range=point_cloud_range,
        loss_weight_cfg=dict(
            loss_voxel_ce_weight=1.0,
            loss_voxel_sem_scal_weight=1.0,
            loss_voxel_geo_scal_weight=1.0,
            loss_voxel_lovasz_weight=1.0,
        ),
    ),
    empty_idx=empty_idx,
)

# Learning policy params ******************************************
optimizer = dict(
    type='AdamW',
    lr=3e-4,   
    paramwise_cfg=dict(
        custom_keys={
            'img_backbone': dict(lr_mult=0.1),
        }),
    weight_decay=0.01)

optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
lr_config = dict(
    policy='CosineAnnealing',
    warmup='linear',
    warmup_iters=500,
    warmup_ratio=1.0 / 3,
    min_lr_ratio=1e-3)

runner = dict(type='EpochBasedRunner', max_epochs=24)
evaluation = dict(
    interval=1,
    pipeline=test_pipeline,
    save_best='SSC_mean',
    rule='greater',
)

custom_hooks = [
    dict(type='OccEfficiencyHook'),
]

================================================
FILE: projects/configs/datasets/custom_nus-3d.py
================================================
# If point cloud range is changed, the models should also change their point
# cloud range accordingly
point_cloud_range = [-50, -50, -5, 50, 50, 3]
# For nuScenes we usually do 10-class detection
class_names = [
    'car', 'truck', 'trailer', 'bus', 'construction_vehicle', 'bicycle',
    'motorcycle', 'pedestrian', 'traffic_cone', 'barrier'
]
dataset_type = 'NuScenesDataset_eval_modified'
data_root = 'data/nuscenes/'
# Input modality for nuScenes dataset, this is consistent with the submission
# format which requires the information in input_modality.
input_modality = dict(
    use_lidar=True,
    use_camera=False,
    use_radar=False,
    use_map=False,
    use_external=False)
file_client_args = dict(backend='disk')
# Uncomment the following if use ceph or other file clients.
# See https://mmcv.readthedocs.io/en/latest/api.html#mmcv.fileio.FileClient
# for more details.
# file_client_args = dict(
#     backend='petrel',
#     path_mapping=dict({
#         './data/nuscenes/': 's3://nuscenes/nuscenes/',
#         'data/nuscenes/': 's3://nuscenes/nuscenes/'
#     }))
train_pipeline = [
    dict(
        type='LoadPointsFromFile',
        coord_type='LIDAR',
        load_dim=5,
        use_dim=5,
        file_client_args=file_client_args),
    dict(
        type='LoadPointsFromMultiSweeps',
        sweeps_num=10,
        file_client_args=file_client_args),
    dict(type='LoadAnnotations3D', with_bbox_3d=True, with_label_3d=True),
    dict(
        type='GlobalRotScaleTrans',
        rot_range=[-0.3925, 0.3925],
        scale_ratio_range=[0.95, 1.05],
        translation_std=[0, 0, 0]),
    dict(type='RandomFlip3D', flip_ratio_bev_horizontal=0.5),
    dict(type='PointsRangeFilter', point_cloud_range=point_cloud_range),
    dict(type='ObjectRangeFilter', point_cloud_range=point_cloud_range),
    dict(type='ObjectNameFilter', classes=class_names),
    dict(type='PointShuffle'),
    dict(type='DefaultFormatBundle3D', class_names=class_names),
    dict(type='Collect3D', keys=['points', 'gt_bboxes_3d', 'gt_labels_3d'])
]
test_pipeline = [
    dict(
        type='LoadPointsFromFile',
        coord_type='LIDAR',
        load_dim=5,
        use_dim=5,
        file_client_args=file_client_args),
    dict(
        type='LoadPointsFromMultiSweeps',
        sweeps_num=10,
        file_client_args=file_client_args),
    dict(
        type='MultiScaleFlipAug3D',
        img_scale=(1333, 800),
        pts_scale_ratio=1,
        flip=False,
        transforms=[
            dict(
                type='GlobalRotScaleTrans',
                rot_range=[0, 0],
                scale_ratio_range=[1., 1.],
                translation_std=[0, 0, 0]),
            dict(type='RandomFlip3D'),
            dict(
                type='PointsRangeFilter', point_cloud_range=point_cloud_range),
            dict(
                type='DefaultFormatBundle3D',
                class_names=class_names,
                with_label=False),
            dict(type='Collect3D', keys=['points'])
        ])
]
# construct a pipeline for data and gt loading in show function
# please keep its loading function consistent with test_pipeline (e.g. client)
eval_pipeline = [
    dict(
        type='LoadPointsFromFile',
        coord_type='LIDAR',
        load_dim=5,
        use_dim=5,
        file_client_args=file_client_args),
    dict(
        type='LoadPointsFromMultiSweeps',
        sweeps_num=10,
        file_client_args=file_client_args),
    dict(
        type='DefaultFormatBundle3D',
        class_names=class_names,
        with_label=False),
    dict(type='Collect3D', keys=['points'])
]

data = dict(
    samples_per_gpu=4,
    workers_per_gpu=4,
    train=dict(
        type=dataset_type,
        data_root=data_root,
        ann_file=data_root + 'nuscenes_infos_train.pkl',
        pipeline=train_pipeline,
        classes=class_names,
        modality=input_modality,
        test_mode=False,
        # we use box_type_3d='LiDAR' in kitti and nuscenes dataset
        # and box_type_3d='Depth' in sunrgbd and scannet dataset.
        box_type_3d='LiDAR'),
    val=dict(
        type=dataset_type,
        ann_file=data_root + 'nuscenes_infos_val.pkl',
        pipeline=test_pipeline,
        classes=class_names,
        modality=input_modality,
        test_mode=True,
        box_type_3d='LiDAR'),
    test=dict(
        type=dataset_type,
        data_root=data_root,
        ann_file=data_root + 'nuscenes_infos_val.pkl',
        pipeline=test_pipeline,
        classes=class_names,
        modality=input_modality,
        test_mode=True,
        box_type_3d='LiDAR'))
# For nuScenes dataset, we usually evaluate the model at the end of training.
# Since the models are trained by 24 epochs by default, we set evaluation
# interval to be 24. Please change the interval accordingly if you do not
# use a default schedule.
evaluation = dict(interval=24, pipeline=eval_pipeline)


================================================
FILE: projects/occ_plugin/__init__.py
================================================
from .core.evaluation.eval_hooks import OccDistEvalHook, OccEvalHook
from .core.evaluation.efficiency_hooks import OccEfficiencyHook
from .core.visualizer import save_occ
from .datasets.pipelines import (
  PhotoMetricDistortionMultiViewImage, PadMultiViewImage, 
  NormalizeMultiviewImage,  CustomCollect3D)
from .occupancy import *

================================================
FILE: projects/occ_plugin/core/__init__.py
================================================
from .evaluation import *
from .visualizer import *

================================================
FILE: projects/occ_plugin/core/evaluation/__init__.py
================================================
from .eval_hooks import OccDistEvalHook, OccEvalHook
from .efficiency_hooks import OccEfficiencyHook

================================================
FILE: projects/occ_plugin/core/evaluation/efficiency_hooks.py
================================================
import copy
from mmcv.runner import HOOKS, Hook
import time
try:
    from mmcv.cnn import get_model_complexity_info
except ImportError:
    raise ImportError('Please upgrade mmcv to >0.6.2')
import torch
import torch.distributed as dist

@HOOKS.register_module()
class OccEfficiencyHook(Hook):
    def __init__(self, dataloader,  **kwargs):
        self.dataloader = dataloader 
        self.warm_up = 5
        
    def construct_input(self, DUMMY_SHAPE=None, m_info=None):
        if m_info is None:
            m_info = next(iter(self.dataloader))
        img_metas = m_info['img_metas'].data
        input = dict(
            img_metas=img_metas,
        )
        if 'img_inputs' in m_info.keys():
            img_inputs = m_info['img_inputs']
            for i in range(len(img_inputs)):
                if isinstance(img_inputs[i], list):
                    for j in range(len(img_inputs[i])):
                        img_inputs[i][j] = img_inputs[i][j].cuda()
                else:
                    img_inputs[i] = img_inputs[i].cuda()
            input['img_inputs'] = img_inputs
            
        if 'points' in m_info.keys():
            points = m_info['points'].data[0]
            points[0] = points[0].cuda()
            input['points'] = points
        return input
    
    def before_run(self, runner):
        torch.cuda.reset_peak_memory_stats()
        
        # model = copy.deepcopy(runner.model)
        # if hasattr(model, 'module'):
        #     model = model.module
        # if hasattr(model, 'forward_dummy'):
        #     model.forward_train = model.forward_dummy
        #     model.forward_test = model.forward_dummy
        #     model.eval()
        # else:
        #     raise NotImplementedError(
        #         'FLOPs counter is currently not supported for {}'.format(
        #             model.__class__.__name__))
        # # inf time
        # pure_inf_time = 0
        # itv_sample = 10
        # for i, data in enumerate(self.dataloader):
        #     torch.cuda.synchronize()
        #     start_time = time.perf_counter()

        #     with torch.no_grad():
        #         model(return_loss=False, rescale=True, **self.construct_input(m_info=data))

        #     torch.cuda.synchronize()
        #     elapsed = time.perf_counter() - start_time

        #     if i >= self.warm_up:
        #         pure_inf_time += elapsed
        #         if (i + 1) % itv_sample == 0:
        #             fps = (i + 1 - self.warm_up) / pure_inf_time
        #             if runner.rank == 0:
        #                 runner.logger.info(f'Done sample [{i + 1:<3}/ {itv_sample*5}], '
        #                     f'fps: {fps:.1f} sample / s')

        #     if (i + 1) == itv_sample*5:
        #         pure_inf_time += elapsed
        #         fps = (i + 1 - self.warm_up) / pure_inf_time
        #         if runner.rank == 0:
        #             runner.logger.info(f'Overall fps: {fps:.1f} sample / s')
        #         break
            
        # # flops and params   
        # if runner.rank == 0:
        #     flops, params = get_model_complexity_info(
        #         model, (None, None), input_constructor=self.construct_input)
            
        #     split_line = '=' * 30
        #     gpu_measure = torch.cuda.max_memory_allocated() / 1024. / 1024. /1024. 
        #     runner.logger.info(f'{split_line}\n' f'Flops: {flops}\nParams: {params}\nGPU memory: {gpu_measure:.2f}GB\n{split_line}')
            
        if dist.is_available() and dist.is_initialized():
            dist.barrier() 
        
        
    def after_run(self, runner):
        pass

    def before_epoch(self, runner):
        pass

    def after_epoch(self, runner):
        pass

    def before_iter(self, runner):
        pass

    def after_iter(self, runner):
        pass
  


================================================
FILE: projects/occ_plugin/core/evaluation/eval_hooks.py
================================================

# Note: Considering that MMCV's EvalHook updated its interface in V1.3.16,
# in order to avoid strong version dependency, we did not directly
# inherit EvalHook but BaseDistEvalHook.

import os.path as osp
import torch.distributed as dist
from mmcv.runner import DistEvalHook as BaseDistEvalHook
from torch.nn.modules.batchnorm import _BatchNorm
from mmcv.runner import EvalHook as BaseEvalHook


class OccEvalHook(BaseEvalHook):
    def __init__(self, *args,  **kwargs):
        super(OccEvalHook, self).__init__(*args, **kwargs)  
            
    def _do_evaluate(self, runner):
        """perform evaluation and save ckpt."""
        if not self._should_evaluate(runner):
            return

        from projects.occ_plugin.occupancy.apis.test import custom_single_gpu_test
        results = custom_single_gpu_test(runner.model, self.dataloader, show=False)
        
        runner.log_buffer.output['eval_iter_num'] = len(self.dataloader)
        key_score = self.evaluate(runner, results)
        if self.save_best:
            self._save_ckpt(runner, key_score)
            
            
class OccDistEvalHook(BaseDistEvalHook):
    def __init__(self, *args,  **kwargs):
        super(OccDistEvalHook, self).__init__(*args, **kwargs)       

    def _do_evaluate(self, runner):
        """perform evaluation and save ckpt."""
        # Synchronization of BatchNorm's buffer (running_mean
        # and running_var) is not supported in the DDP of pytorch,
        # which may cause the inconsistent performance of models in
        # different ranks, so we broadcast BatchNorm's buffers
        # of rank 0 to other ranks to avoid this.
        if self.broadcast_bn_buffer:
            model = runner.model
            for name, module in model.named_modules():
                if isinstance(module,
                              _BatchNorm) and module.track_running_stats:
                    dist.broadcast(module.running_var, 0)
                    dist.broadcast(module.running_mean, 0)

        if not self._should_evaluate(runner):
            return

        tmpdir = self.tmpdir
        if tmpdir is None:
            tmpdir = osp.join(runner.work_dir, '.eval_hook')

        from projects.occ_plugin.occupancy.apis.test import custom_multi_gpu_test # to solve circlur  import

        results = custom_multi_gpu_test(
            runner.model,
            self.dataloader,
            tmpdir=tmpdir,
            gpu_collect=self.gpu_collect)
        
        if runner.rank == 0:
            print('\n')
            runner.log_buffer.output['eval_iter_num'] = len(self.dataloader)
            
            key_score = self.evaluate(runner, results)

            if self.save_best:
                self._save_ckpt(runner, key_score)
  


================================================
FILE: projects/occ_plugin/core/visualizer/__init__.py
================================================
from .show_occ import save_occ

================================================
FILE: projects/occ_plugin/core/visualizer/show_occ.py
================================================

import torch.nn.functional as F
import torch
import numpy as np
from os import path as osp
import os

def save_occ(pred_c, pred_f, img_metas, path, visible_mask=None, gt_occ=None, free_id=0, thres_low=0.4, thres_high=0.99):

    """
    visualization saving for paper:
    1. gt
    2. pred_f pred_c
    3. gt visible
    4. pred_f visible
    """
    pred_f = F.softmax(pred_f, dim=1)
    pred_f = pred_f[0].cpu().numpy()  # C W H D
    pred_c = F.softmax(pred_c, dim=1)
    pred_c = pred_c[0].cpu().numpy()  # C W H D
    visible_mask = visible_mask[0].cpu().numpy().reshape(-1) > 0  # WHD
    gt_occ = gt_occ.data[0][0].cpu().numpy()  # W H D
    gt_occ[gt_occ==255] = 0
    _, W, H, D = pred_f.shape
    coordinates_3D_f = np.stack(np.meshgrid(np.arange(W), np.arange(H), np.arange(D), indexing='ij'), axis=-1).reshape(-1, 3) # (W*H*D, 3)
    _, W, H, D = pred_c.shape
    coordinates_3D_c = np.stack(np.meshgrid(np.arange(W), np.arange(H), np.arange(D), indexing='ij'), axis=-1).reshape(-1, 3) # (W*H*D, 3)
    pred_f = np.argmax(pred_f, axis=0) # (W, H, D)
    pred_c = np.argmax(pred_c, axis=0) # (W, H, D)
    occ_pred_f_mask = (pred_f.reshape(-1))!=free_id
    occ_pred_c_mask = (pred_c.reshape(-1))!=free_id
    occ_gt_mask = (gt_occ.reshape(-1))!=free_id
    pred_f_save = np.concatenate([coordinates_3D_f[occ_pred_f_mask], pred_f.reshape(-1)[occ_pred_f_mask].reshape(-1, 1)], axis=1)[:, [2,1,0,3]]  # zyx cls
    pred_c_save = np.concatenate([coordinates_3D_c[occ_pred_c_mask], pred_c.reshape(-1)[occ_pred_c_mask].reshape(-1, 1)], axis=1)[:, [2,1,0,3]]  # zyx cls
    pred_f_visible_save = np.concatenate([coordinates_3D_f[occ_pred_f_mask&visible_mask], pred_f.reshape(-1)[occ_pred_f_mask&visible_mask].reshape(-1, 1)], axis=1)[:, [2,1,0,3]]  # zyx cls
    gt_save = np.concatenate([coordinates_3D_f[occ_gt_mask], gt_occ.reshape(-1)[occ_gt_mask].reshape(-1, 1)], axis=1)[:, [2,1,0,3]]  # zyx cls
    gt_visible_save = np.concatenate([coordinates_3D_f[occ_gt_mask&visible_mask], gt_occ.reshape(-1)[occ_gt_mask&visible_mask].reshape(-1, 1)], axis=1)[:, [2,1,0,3]]  # zyx cls
    
    scene_token = img_metas.data[0][0]['scene_token']
    lidar_token = img_metas.data[0][0]['lidar_token']
    save_path = osp.join(path, scene_token, lidar_token)
    if not osp.exists(save_path):
        os.makedirs(save_path)
    save_pred_f_path = osp.join(save_path, 'pred_f.npy')
    save_pred_c_path = osp.join(save_path, 'pred_c.npy')
    save_pred_f_v_path = osp.join(save_path, 'pred_f_visible.npy')
    save_gt_path = osp.join(save_path, 'gt.npy')
    save_gt_v_path = osp.join(save_path, 'gt_visible.npy')
    np.save(save_pred_f_path, pred_f_save)
    np.save(save_pred_c_path, pred_c_save)
    np.save(save_pred_f_v_path, pred_f_visible_save)
    np.save(save_gt_path, gt_save)
    np.save(save_gt_v_path, gt_visible_save)


================================================
FILE: projects/occ_plugin/datasets/__init__.py
================================================
from .nuscenes_dataset import CustomNuScenesDataset
from .cam4docc_dataset import Cam4DOccDataset
from .cam4docc_lyft_dataset import Cam4DOccLyftDataset
from .builder import custom_build_dataset

__all__ = [
    'CustomNuScenesDataset', 'NuscOCCDataset'
]


================================================
FILE: projects/occ_plugin/datasets/builder.py
================================================
# Copyright (c) OpenMMLab. All rights reserved.
import copy
import platform
import random
from functools import partial

import numpy as np
from mmcv.parallel import collate
from mmcv.runner import get_dist_info
from mmcv.utils import Registry, build_from_cfg
from torch.utils.data import DataLoader

from mmdet.datasets.samplers import GroupSampler
from projects.occ_plugin.datasets.samplers.group_sampler import DistributedGroupSampler
from projects.occ_plugin.datasets.samplers.distributed_sampler import DistributedSampler
from projects.occ_plugin.datasets.samplers.sampler import build_sampler

def build_dataloader(dataset,
                     samples_per_gpu,
                     workers_per_gpu,
                     num_gpus=1,
                     dist=True,
                     shuffle=True,
                     seed=None,
                     shuffler_sampler=None,
                     nonshuffler_sampler=None,
                     **kwargs):
    """Build PyTorch DataLoader.
    In distributed training, each GPU/process has a dataloader.
    In non-distributed training, there is only one dataloader for all GPUs.
    Args:
        dataset (Dataset): A PyTorch dataset.
        samples_per_gpu (int): Number of training samples on each GPU, i.e.,
            batch size of each GPU.
        workers_per_gpu (int): How many subprocesses to use for data loading
            for each GPU.
        num_gpus (int): Number of GPUs. Only used in non-distributed training.
        dist (bool): Distributed training/test or not. Default: True.
        shuffle (bool): Whether to shuffle the data at every epoch.
            Default: True.
        kwargs: any keyword argument to be used to initialize DataLoader
    Returns:
        DataLoader: A PyTorch dataloader.
    """
    rank, world_size = get_dist_info()
    if dist:
        # DistributedGroupSampler will definitely shuffle the data to satisfy
        # that images on each GPU are in the same group
        if shuffle:
            sampler = build_sampler(shuffler_sampler if shuffler_sampler is not None else dict(type='DistributedGroupSampler'),
                                     dict(
                                         dataset=dataset,
                                         samples_per_gpu=samples_per_gpu,
                                         num_replicas=world_size,
                                         rank=rank,
                                         seed=seed)
                                     )

        else:
            sampler = build_sampler(nonshuffler_sampler if nonshuffler_sampler is not None else dict(type='DistributedSampler'),
                                     dict(
                                         dataset=dataset,
                                         num_replicas=world_size,
                                         rank=rank,
                                         shuffle=shuffle,
                                         seed=seed)
                                     )

        batch_size = samples_per_gpu
        num_workers = workers_per_gpu
    else:
        print('WARNING!!!!, Only can be used for obtain inference speed!!!!')
        sampler = GroupSampler(dataset, samples_per_gpu) if shuffle else None
        batch_size = num_gpus * samples_per_gpu
        num_workers = num_gpus * workers_per_gpu

    init_fn = partial(
        worker_init_fn, num_workers=num_workers, rank=rank,
        seed=seed) if seed is not None else None

    data_loader = DataLoader(
        dataset,
        batch_size=batch_size,
        sampler=sampler,
        num_workers=num_workers,
        collate_fn=partial(collate, samples_per_gpu=samples_per_gpu),
        pin_memory=False,
        worker_init_fn=init_fn,
        **kwargs)

    return data_loader


def worker_init_fn(worker_id, num_workers, rank, seed):
    # The seed of each worker equals to
    # num_worker * rank + worker_id + user_seed
    worker_seed = num_workers * rank + worker_id + seed
    np.random.seed(worker_seed)
    random.seed(worker_seed)


# Copyright (c) OpenMMLab. All rights reserved.
import platform
from mmcv.utils import Registry, build_from_cfg

from mmdet.datasets import DATASETS
from mmdet.datasets.builder import _concat_dataset

if platform.system() != 'Windows':
    # https://github.com/pytorch/pytorch/issues/973
    import resource
    rlimit = resource.getrlimit(resource.RLIMIT_NOFILE)
    base_soft_limit = rlimit[0]
    hard_limit = rlimit[1]
    soft_limit = min(max(4096, base_soft_limit), hard_limit)
    resource.setrlimit(resource.RLIMIT_NOFILE, (soft_limit, hard_limit))

OBJECTSAMPLERS = Registry('Object sampler')


def custom_build_dataset(cfg, default_args=None):
    from mmdet3d.datasets.dataset_wrappers import CBGSDataset
    from mmdet.datasets.dataset_wrappers import (ClassBalancedDataset,
                                                 ConcatDataset, RepeatDataset)
    if isinstance(cfg, (list, tuple)):
        dataset = ConcatDataset([custom_build_dataset(c, default_args) for c in cfg])
    elif cfg['type'] == 'ConcatDataset':
        dataset = ConcatDataset(
            [custom_build_dataset(c, default_args) for c in cfg['datasets']],
            cfg.get('separate_eval', True))
    elif cfg['type'] == 'RepeatDataset':
        dataset = RepeatDataset(
            custom_build_dataset(cfg['dataset'], default_args), cfg['times'])
    elif cfg['type'] == 'ClassBalancedDataset':
        dataset = ClassBalancedDataset(
            custom_build_dataset(cfg['dataset'], default_args), cfg['oversample_thr'])
    elif cfg['type'] == 'CBGSDataset':
        dataset = CBGSDataset(custom_build_dataset(cfg['dataset'], default_args))
    elif isinstance(cfg.get('ann_file'), (list, tuple)):
        dataset = _concat_dataset(cfg, default_args)
    else:
        dataset = build_from_cfg(cfg, DATASETS, default_args)

    return dataset


================================================
FILE: projects/occ_plugin/datasets/cam4docc_dataset.py
================================================
# Developed by Junyi Ma based on the codebase of OpenOccupancy and PowerBEV
# Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications
# https://github.com/haomo-ai/Cam4DOcc

import numpy as np
from mmcv.runner import get_dist_info
from mmdet.datasets import DATASETS
from mmdet3d.datasets import NuScenesDataset
import os
from nuscenes.eval.common.utils import quaternion_yaw, Quaternion
from projects.occ_plugin.utils.formating import cm_to_ious, format_iou_results
from projects.occ_plugin.utils.geometry import convert_egopose_to_matrix_numpy, invert_matrix_egopose_numpy
from nuscenes import NuScenes
from pyquaternion import Quaternion
import torch
import random
import time

@DATASETS.register_module()
class Cam4DOccDataset(NuScenesDataset):
    def __init__(self, occ_size, pc_range, occ_root, idx_root, ori_data_root, data_root, time_receptive_field, n_future_frames, classes, use_separate_classes,
                  train_capacity, test_capacity, **kwargs):
        
        '''
        Cam4DOccDataset contains sequential occupancy states as well as instance flow for training occupancy forecasting models. We unify the related operations in the LiDAR coordinate system following OpenOccupancy.

        occ_size: number of grids along H W L, default: [512, 512, 40]
        pc_range: predefined ranges along H W L, default: [-51.2, -51.2, -5.0, 51.2, 51.2, 3.0]
        occ_root: data path of nuScenes-Occupancy
        idx_root: save path of test indexes
        time_receptive_field: number of historical frames used for forecasting (including the present one), default: 3
        n_future_frames: number of forecasted future frames, default: 4
        classes: predefiend categories in GMO
        use_separate_classes: separate movable objects instead of the general one
        train_capacity: number of sequences used for training, default: 23930
        test_capacity: number of sequences used for testing, default: 5119
        '''
        
        self.train_capacity = train_capacity
        self.test_capacity = test_capacity

        super().__init__(**kwargs)

        rank, world_size = get_dist_info()

        self.time_receptive_field = time_receptive_field
        self.n_future_frames = n_future_frames
        self.sequence_length = time_receptive_field + n_future_frames

        if rank == 0:
            print("-------------")
            print("use past " + str(self.time_receptive_field) + " frames to forecast future " + str(self.n_future_frames) + " frames")
            print("-------------")

        self.data_infos = list(sorted(self.data_infos, key=lambda e: e['timestamp']))
        self.data_infos = self.data_infos[::self.load_interval]
        self.occ_size = occ_size
        self.pc_range = pc_range
        self.occ_root = occ_root
        self.idx_root = idx_root
        self.ori_data_root = ori_data_root
        self.data_root = data_root
        self.classes = classes
        self.use_separate_classes = use_separate_classes

        self.indices = self.get_indices()
        self.present_scene_lidar_token = " "
        self._set_group_flag()

        # load origin nusc dataset for instance annotation
        self.nusc = NuScenes(version='v1.0-trainval', dataroot=self.data_root, verbose=False)
        if self.test_mode:
            self.chosen_list = random.sample(range(0, self.test_capacity) , self.test_capacity)
            self.chosen_list_num = len(self.chosen_list)
        else:
            self.chosen_list = random.sample(range(0, self.train_capacity) , self.train_capacity)
            self.chosen_list_num = len(self.chosen_list)
    
    def _set_group_flag(self):
        if self.test_mode:
            self.flag = np.zeros(self.test_capacity, dtype=np.uint8)
        else:
            self.flag = np.zeros(self.train_capacity, dtype=np.uint8)

    def __len__(self):
        if self.test_mode:
            return self.test_capacity
        else:
            return self.train_capacity

    def __getitem__(self, idx):
     
        idx = int(self.chosen_list[idx])

        self.egopose_list = []
        self.ego2lidar_list = []
        self.visible_instance_set = set()
        self.instance_dict = {}

        if self.test_mode:
            return self.prepare_test_data(idx)
            
        while True:
            data = self.prepare_train_data(idx)
            if data is None:
                idx = self._rand_another(idx)
                idx = int(self.chosen_list[idx])
                continue
            
            return data

    def get_indices(self):
        '''
        Generate sequential indexes for training and testing
        '''
        indices = []
        for index in range(len(self.data_infos)): 
            is_valid_data = True
            previous_rec = None
            current_indices = []
            for t in range(self.sequence_length):
                index_t = index + t
                # Going over the dataset size limit.
                if index_t >= len(self.data_infos):
                    is_valid_data = False
                    break
                rec = self.data_infos[index_t]
                # Check if scene is the same
                if (previous_rec is not None) and (rec['scene_token'] != previous_rec['scene_token']):
                    is_valid_data = False
                    break

                current_indices.append(index_t)
                previous_rec = rec

            if is_valid_data:
                indices.append(current_indices)

        return np.asarray(indices)

    def get_lidar_pose(self, rec):
        '''
        Get global poses for following bbox transforming
        '''
        ego2global_translation = rec['ego2global_translation']
        ego2global_rotation = rec['ego2global_rotation']
        trans = -np.array(ego2global_translation)
        rot = Quaternion(ego2global_rotation).inverse
        
        return trans, rot
    
    def get_ego2lidar_pose(self, rec):
        '''
        Get LiDAR poses in ego system
        '''
        lidar2ego_translation = rec['lidar2ego_translation']
        lidar2ego_rotation = rec['lidar2ego_rotation']
        trans = -np.array(lidar2ego_translation)
        rot = Quaternion(lidar2ego_rotation).inverse
        return trans, rot

    def record_instance(self, idx, instance_map):
        """
        Record information about each visible instance in the sequence and assign a unique ID to it
        """
        rec = self.data_infos[idx]
        translation, rotation = self.get_lidar_pose(rec)
        self.egopose_list.append([translation, rotation])
        ego2lidar_translation, ego2lidar_rotation = self.get_ego2lidar_pose(rec)
        self.ego2lidar_list.append([ego2lidar_translation, ego2lidar_rotation])

        current_sample = self.nusc.get('sample', rec['token'])
        for annotation_token in current_sample['anns']:
            annotation = self.nusc.get('sample_annotation', annotation_token)
            # Instance extraction for Cam4DOcc-V1 
            # Filter out all non vehicle instances
            # if 'vehicle' not in annotation['category_name']:
            #     continue
            gmo_flag = False
            for class_name in self.classes:
                if class_name in annotation['category_name']:
                    gmo_flag = True
                    break
            if not gmo_flag:
                continue
            # Specify semantic id if use_separate_classes
            semantic_id = 1
            if self.use_separate_classes:
                if 'vehicle.bicycle' in annotation['category_name']: # rm static_object.bicycle_rack
                    semantic_id = 1
                elif 'bus'  in annotation['category_name']:
                    semantic_id = 2
                elif 'car'  in annotation['category_name']:
                    semantic_id = 3
                elif 'construction'  in annotation['category_name']:
                    semantic_id = 4
                elif 'motorcycle'  in annotation['category_name']:
                    semantic_id = 5
                elif 'trailer'  in annotation['category_name']:
                    semantic_id = 6
                elif 'truck'  in annotation['category_name']:
                    semantic_id = 7
                elif 'pedestrian'  in annotation['category_name']:
                    semantic_id = 8

            # Filter out invisible vehicles
            FILTER_INVISIBLE_VEHICLES = True
            if FILTER_INVISIBLE_VEHICLES and int(annotation['visibility_token']) == 1 and annotation['instance_token'] not in self.visible_instance_set:
                continue
            # Filter out vehicles that have not been seen in the past
            if self.counter >= self.time_receptive_field and annotation['instance_token'] not in self.visible_instance_set:
                continue
            self.visible_instance_set.add(annotation['instance_token'])

            if annotation['instance_token'] not in instance_map:
                instance_map[annotation['instance_token']] = len(instance_map) + 1
            instance_id = instance_map[annotation['instance_token']]
            instance_attribute = int(annotation['visibility_token'])

            if annotation['instance_token'] not in self.instance_dict:
                # For the first occurrence of an instance
                self.instance_dict[annotation['instance_token']] = {
                    'timestep': [self.counter],
                    'translation': [annotation['translation']],
                    'rotation': [annotation['rotation']],
                    'size': annotation['size'],
                    'instance_id': instance_id,
                    'semantic_id': semantic_id,
                    'attribute_label': [instance_attribute],
                }
            else:
                # For the instance that have appeared before
                self.instance_dict[annotation['instance_token']]['timestep'].append(self.counter)
                self.instance_dict[annotation['instance_token']]['translation'].append(annotation['translation'])
                self.instance_dict[annotation['instance_token']]['rotation'].append(annotation['rotation'])
                self.instance_dict[annotation['instance_token']]['attribute_label'].append(instance_attribute)

        return instance_map

    def get_future_egomotion(self, idx):
        '''
        Calculate LiDAR pose updates between idx and idx+1
        '''
        rec_t0 = self.data_infos[idx]
        future_egomotion = np.eye(4, dtype=np.float32)

        if idx < len(self.data_infos) - 1:
            rec_t1 = self.data_infos[idx + 1]

            if rec_t0['scene_token'] == rec_t1['scene_token']:
                egopose_t0_trans = rec_t0['ego2global_translation']
                egopose_t0_rot = rec_t0['ego2global_rotation']
                egopose_t1_trans = rec_t1['ego2global_translation']
                egopose_t1_rot = rec_t1['ego2global_rotation']
                egopose_t0 = convert_egopose_to_matrix_numpy(egopose_t0_trans, egopose_t0_rot)
                egopose_t1 = convert_egopose_to_matrix_numpy(egopose_t1_trans, egopose_t1_rot)

                lidar2ego_t0_trans = rec_t0['lidar2ego_translation']
                lidar2ego_t0_rot = rec_t0['lidar2ego_rotation']
                lidar2ego_t1_trans = rec_t1['lidar2ego_translation']
                lidar2ego_t1_rot = rec_t1['lidar2ego_rotation']
                lidar2ego_t0 = convert_egopose_to_matrix_numpy(lidar2ego_t0_trans, lidar2ego_t0_rot)
                lidar2ego_t1 = convert_egopose_to_matrix_numpy(lidar2ego_t1_trans, lidar2ego_t1_rot)

                future_egomotion = invert_matrix_egopose_numpy(lidar2ego_t1).dot(invert_matrix_egopose_numpy(egopose_t1)).dot(egopose_t0).dot(lidar2ego_t0)   

        future_egomotion = torch.Tensor(future_egomotion).float()

        # Convert to 6DoF vector
        return future_egomotion.unsqueeze(0)

    @staticmethod
    def _check_consistency(translation, prev_translation, threshold=1.0):
        """
        Check for significant displacement of the instance adjacent moments
        """
        x, y = translation[:2]
        prev_x, prev_y = prev_translation[:2]

        if abs(x - prev_x) > threshold or abs(y - prev_y) > threshold:
            return False
        return True

    def refine_instance_poly(self, instance):
        """
        Fix the missing frames and disturbances of ground truth caused by noise
        """
        pointer = 1
        for i in range(instance['timestep'][0] + 1, self.sequence_length):
            # Fill in the missing frames
            if i not in instance['timestep']:
                instance['timestep'].insert(pointer, i)
                instance['translation'].insert(pointer, instance['translation'][pointer-1])
                instance['rotation'].insert(pointer, instance['rotation'][pointer-1])
                instance['attribute_label'].insert(pointer, instance['attribute_label'][pointer-1])
                pointer += 1
                continue
            
            # Eliminate observation disturbances
            if self._check_consistency(instance['translation'][pointer], instance['translation'][pointer-1]):
                instance['translation'][pointer] = instance['translation'][pointer-1]
                instance['rotation'][pointer] = instance['rotation'][pointer-1]
                instance['attribute_label'][pointer] = instance['attribute_label'][pointer-1]
            pointer += 1
        
        return instance

    def prepare_train_data(self, index):
        '''
        Generate a training sequence
        '''
        input_dict = self.get_data_info(index)
        if input_dict is None:
            return None
        
        example = self.prepare_sequential_data(index)
        return example

    def prepare_test_data(self, index):
        '''
        Generate a test sequence
        TODO: Give additional functions here such as visualization
        '''
        input_dict = self.get_data_info(index)
        if input_dict is None:
            return None
        
        example = self.prepare_sequential_data(index)
        # TODO: visualize example data
        return example
    
    def prepare_sequential_data(self, index):
        '''
        Use the predefined pipeline to generate inputs of the baseline network and ground truth for the standard evaluation protocol in Cam4DOcc
        '''
        instance_map = {}
        input_seq_data = {}
        keys = ['input_dict','future_egomotion', 'sample_token']
        for key in keys:
            input_seq_data[key] = []
        scene_lidar_token = []

        for self.counter, index_t in enumerate(self.indices[index]):
            input_dict_per_frame = self.get_data_info(index_t)
            if input_dict_per_frame is None:
                return None
            
            input_seq_data['input_dict'].append(input_dict_per_frame)
            input_seq_data['sample_token'].append(input_dict_per_frame['sample_idx'])

            instance_map = self.record_instance(index_t, instance_map)
            future_egomotion = self.get_future_egomotion(index_t)
            input_seq_data['future_egomotion'].append(future_egomotion)

            scene_lidar_token.append(input_dict_per_frame['scene_token']+"_"+input_dict_per_frame['lidar_token'])
            if self.counter == self.time_receptive_field - 1:
                self.present_scene_lidar_token = input_dict_per_frame['scene_token']+"_"+input_dict_per_frame['lidar_token']

        # save sequential test indexes for possible evaluation
        if self.test_mode:
            test_idx_path = os.path.join(self.idx_root, "test_ids")
            if not os.path.exists(test_idx_path):
                os.mkdir(test_idx_path)
            np.savez(os.path.join(test_idx_path, self.present_scene_lidar_token), scene_lidar_token)

        for token in self.instance_dict.keys():
            self.instance_dict[token] = self.refine_instance_poly(self.instance_dict[token])

        input_seq_data.update(
            dict(
                time_receptive_field=self.time_receptive_field,
                sequence_length=self.sequence_length,
                egopose_list=self.egopose_list,
                ego2lidar_list=self.ego2lidar_list,
                instance_dict=self.instance_dict,
                instance_map=instance_map,
                indices=self.indices[index],
                scene_token=self.present_scene_lidar_token,
            ))

        example = self.pipeline(input_seq_data)

        return example


    def get_data_info(self, index):
        '''
        get_data_info from .pkl also used by OpenOccupancy
        '''
        
        info = self.data_infos[index]
        
        # standard protocal modified from SECOND.Pytorch
        input_dict = dict(
            sample_idx=info['token'],
            pts_filename=info['lidar_path'],
            sweeps=info['sweeps'],
            lidar2ego_translation=info['lidar2ego_translation'],
            lidar2ego_rotation=info['lidar2ego_rotation'],
            ego2global_translation=info['ego2global_translation'],
            ego2global_rotation=info['ego2global_rotation'],
            prev_idx=info['prev'],
            next_idx=info['next'],
            scene_token=info['scene_token'],
            can_bus=info['can_bus'],
            # frame_idx=info['frame_idx'],
            timestamp=info['timestamp'] / 1e6,
            occ_size = np.array(self.occ_size),
            pc_range = np.array(self.pc_range),
            lidar_token=info['lidar_token'],
            lidarseg=info['lidarseg'],
            curr=info,
        )

        if self.modality['use_camera']:
            image_paths = []
            lidar2img_rts = []
            lidar2cam_rts = []
            cam_intrinsics = []
            
            lidar2cam_dic = {}
            
            for cam_type, cam_info in info['cams'].items():
                image_paths.append(cam_info['data_path'])
                # obtain lidar to image transformation matrix
                lidar2cam_r = np.linalg.inv(cam_info['sensor2lidar_rotation'])
                lidar2cam_t = cam_info[
                    'sensor2lidar_translation'] @ lidar2cam_r.T
                lidar2cam_rt = np.eye(4)
                lidar2cam_rt[:3, :3] = lidar2cam_r.T
                lidar2cam_rt[3, :3] = -lidar2cam_t
                intrinsic = cam_info['cam_intrinsic']
                viewpad = np.eye(4)
                viewpad[:intrinsic.shape[0], :intrinsic.shape[1]] = intrinsic
                lidar2img_rt = (viewpad @ lidar2cam_rt.T)
                lidar2img_rts.append(lidar2img_rt)

                cam_intrinsics.append(viewpad)
                lidar2cam_rts.append(lidar2cam_rt.T)
                
                lidar2cam_dic[cam_type] = lidar2cam_rt.T

            input_dict.update(
                dict(
                    img_filename=image_paths,
                    lidar2img=lidar2img_rts,
                    cam_intrinsic=cam_intrinsics,
                    lidar2cam=lidar2cam_rts,
                    lidar2cam_dic=lidar2cam_dic,
                ))

        return input_dict


    def evaluate(self, results, logger=None, **kawrgs):
        '''
        Evaluate by IOU and VPQ metrics for model evaluation
        '''
        eval_results = {}
        
        ''' calculate IOU '''
        hist_for_iou = sum(results['hist_for_iou'])
        ious = cm_to_ious(hist_for_iou)
        res_table, res_dic = format_iou_results(ious, return_dic=True)
        for key, val in res_dic.items():
            eval_results['IOU_{}'.format(key)] = val
        if logger is not None:
            logger.info('IOU Evaluation')
            logger.info(res_table)        

        ''' calculate VPQ '''
        if 'vpq_metric' in results.keys() and 'vpq_len' in results.keys():
            vpq_sum = sum(results['vpq_metric'])
            eval_results['VPQ'] = vpq_sum/results['vpq_len']

        return eval_results


================================================
FILE: projects/occ_plugin/datasets/cam4docc_lyft_dataset.py
================================================
# Developed by Junyi Ma based on the codebase of OpenOccupancy and PowerBEV
# Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications
# https://github.com/haomo-ai/Cam4DOcc

import numpy as np
from mmcv.runner import get_dist_info
from mmdet.datasets import DATASETS
from mmdet3d.datasets import NuScenesDataset
from mmdet3d.datasets.pipelines import Compose
from torch.utils.data import Dataset
from lyft_dataset_sdk.lyftdataset import LyftDataset
import os
from nuscenes.eval.common.utils import quaternion_yaw, Quaternion
from projects.occ_plugin.utils.formating import cm_to_ious, format_iou_results
from projects.occ_plugin.utils.geometry import convert_egopose_to_matrix_numpy, invert_matrix_egopose_numpy
from nuscenes import NuScenes
from pyquaternion import Quaternion
import torch
import random
import time

@DATASETS.register_module()
class Cam4DOccLyftDataset(Dataset):
    def __init__(self, occ_size, pc_range, occ_root, idx_root, ori_data_root, data_root, time_receptive_field, n_future_frames, classes, use_separate_classes,
                  train_capacity, test_capacity, test_mode=False, pipeline=None, **kwargs):
        
        '''
        Cam4DOccLyftDataset contains sequential occupancy states as well as instance flow for training occupancy forecasting models. We unify the related operations in the LiDAR coordinate system following OpenOccupancy.

        occ_size: number of grids along H W L, default: [512, 512, 40]
        pc_range: predefined ranges along H W L, default: [-51.2, -51.2, -5.0, 51.2, 51.2, 3.0]
        occ_root: data path of nuScenes-Occupancy
        idx_root: save path of test indexes
        time_receptive_field: number of historical frames used for forecasting (including the present one), default: 3
        n_future_frames: number of forecasted future frames, default: 4
        classes: predefiend categories in GMO
        use_separate_classes: separate movable objects instead of the general one
        train_capacity: number of sequences used for training, default: 23930
        test_capacity: number of sequences used for testing, default: 5119
        '''

        self.test_mode = test_mode
        self.CLASSES = classes
        
        self.train_capacity = train_capacity
        self.test_capacity = test_capacity

        super().__init__()

        # training and test indexes following PowerBEV
        self.TRAIN_LYFT_INDICES = [1, 3, 5, 6, 7, 8, 9, 10, 11, 12, 14, 15, 16,
                      17, 18, 19, 20, 21, 23, 24, 27, 28, 29, 30, 31, 32,
                      33, 35, 36, 37, 39, 41, 43, 44, 45, 46, 47, 48, 49,
                      50, 51, 52, 53, 55, 56, 59, 60, 62, 63, 65, 68, 69,
                      70, 71, 72, 73, 74, 75, 76, 78, 79, 81, 82, 83, 84,
                      86, 87, 88, 89, 93, 95, 97, 98, 99, 103, 104, 107, 108,
                      109, 110, 111, 113, 114, 115, 116, 117, 118, 119, 121, 122, 124,
                      127, 128, 130, 131, 132, 134, 135, 136, 137, 138, 139, 143, 144,
                      146, 147, 148, 149, 150, 151, 152, 153, 154, 156, 157, 158, 159,
                      161, 162, 165, 166, 167, 171, 172, 173, 174, 175, 176, 177, 178,
                      179]

        self.VAL_LYFT_INDICES = [0, 2, 4, 13, 22, 25, 26, 34, 38, 40, 42, 54, 57,
                    58, 61, 64, 66, 67, 77, 80, 85, 90, 91, 92, 94, 96,
                    100, 101, 102, 105, 106, 112, 120, 123, 125, 126, 129, 133, 140,
                    141, 142, 145, 155, 160, 163, 164, 168, 169, 170]


        rank, world_size = get_dist_info()

        self.time_receptive_field = time_receptive_field
        self.n_future_frames = n_future_frames
        self.sequence_length = time_receptive_field + n_future_frames

        if rank == 0:
            print("-------------")
            print("use past " + str(self.time_receptive_field) + " frames to forecast future " + str(self.n_future_frames) + " frames")
            print("-------------")

        self.occ_size = occ_size
        self.pc_range = pc_range
        self.occ_root = occ_root
        self.idx_root = idx_root
        self.ori_data_root = ori_data_root
        self.data_root = data_root
        self.classes = classes
        self.use_separate_classes = use_separate_classes

        self.pipeline = Compose(pipeline)

        # load origin nusc dataset for instance annotation
        self.lyft = LyftDataset(data_path=self.data_root, json_path=os.path.join(self.data_root, 'train_data'), verbose=False)

        self.scenes = self.get_scenes()
        self.ixes = self.get_samples()
        self.indices = self.get_indices()

        self.present_scene_lidar_token = " "
        self._set_group_flag()

        if self.test_mode:
            self.chosen_list = random.sample(range(0, self.test_capacity) , self.test_capacity)
            self.chosen_list_num = len(self.chosen_list)
        else:
            self.chosen_list = random.sample(range(0, self.train_capacity) , self.train_capacity)
            self.chosen_list_num = len(self.chosen_list)
    
    def _set_group_flag(self):
        if self.test_mode:
            self.flag = np.zeros(self.test_capacity, dtype=np.uint8)
        else:
            self.flag = np.zeros(self.train_capacity, dtype=np.uint8)

    def __len__(self):
        if self.test_mode:
            return self.test_capacity
        else:
            return self.train_capacity

    def __getitem__(self, idx):
     
        idx = int(self.chosen_list[idx])

        self.egopose_list = []
        self.ego2lidar_list = []
        self.visible_instance_set = set()
        self.instance_dict = {}

        if self.test_mode:
            return self.prepare_test_data(idx)
            
        while True:
            data = self.prepare_train_data(idx)
            if data is None:
                idx = self._rand_another(idx)
                idx = int(self.chosen_list[idx])
                continue
            
            return data

    def get_scenes(self):
        """
        Obtain the list of scenes names in the given split.
        """
        scenes = [row['name'] for row in self.lyft.scene]
        # split in train/val
        indices = self.VAL_LYFT_INDICES  if self.test_mode else  self.TRAIN_LYFT_INDICES
        scenes = [scenes[i] for i in indices]
        return scenes

    def get_samples(self):
        """
        Find and sort the samples in the given split by scene.
        """
        samples = [sample for sample in self.lyft.sample]
        # remove samples that aren't in this split
        samples = [sample for sample in samples if self.lyft.get('scene', sample['scene_token'])['name'] in self.scenes]
        # sort by scene, timestamp (only to make chronological viz easier)
        samples.sort(key=lambda x: (x['scene_token'], x['timestamp']))

        return samples

    def get_indices(self):
        '''
        Generate sequential indexes for training and testing
        '''
        indices = []
        for index in range(len(self.ixes)):
            is_valid_data = True
            previous_rec = None
            current_indices = []
            for t in range(self.sequence_length):
                index_t = index + t
                # Going over the dataset size limit.
                if index_t >= len(self.ixes):
                    is_valid_data = False
                    break
                rec = self.ixes[index_t]
                # Check if scene is the same
                if (previous_rec is not None) and (rec['scene_token'] != previous_rec['scene_token']):
                    is_valid_data = False
                    break

                current_indices.append(index_t)
                previous_rec = rec

            if is_valid_data:
                indices.append(current_indices)

        return np.asarray(indices)

    def get_lidar_pose(self, rec):
        '''
        Get global poses for following bbox transforming
        '''
        current_sample = self.lyft.get('sample', rec['token'])
        egopose = self.lyft.get('ego_pose', self.lyft.get('sample_data', current_sample['data']['LIDAR_TOP'])['ego_pose_token'])
        ego2global_translation = egopose['translation']
        ego2global_rotation = egopose['rotation']
        trans = -np.array(ego2global_translation)
        rot = Quaternion(ego2global_rotation).inverse
        
        return trans, rot
    
    def get_ego2lidar_pose(self, rec):
        '''
        Get LiDAR poses in ego system
        '''
        current_sample = self.lyft.get('sample', rec['token'])
        lidar_top_data = self.lyft.get('sample_data', current_sample['data']['LIDAR_TOP'])
        lidar2ego_translation = self.lyft.get('calibrated_sensor', lidar_top_data['calibrated_sensor_token'])['translation']
        lidar2ego_rotation =  self.lyft.get('calibrated_sensor', lidar_top_data['calibrated_sensor_token'])['rotation']

        trans = -np.array(lidar2ego_translation)
        rot = Quaternion(lidar2ego_rotation).inverse
        return trans, rot

    def record_instance(self, idx, instance_map):
        """
        Record information about each visible instance in the sequence and assign a unique ID to it
        """
        rec = self.ixes[idx]
        translation, rotation = self.get_lidar_pose(rec)
        self.egopose_list.append([translation, rotation])
        ego2lidar_translation, ego2lidar_rotation = self.get_ego2lidar_pose(rec)
        self.ego2lidar_list.append([ego2lidar_translation, ego2lidar_rotation])

        current_sample = self.lyft.get('sample', rec['token'])
        for annotation_token in current_sample['anns']:
            annotation = self.lyft.get('sample_annotation', annotation_token)
            # Instance extraction for Cam4DOcc-V1 
            # Filter out all non vehicle instances
            # if 'vehicle' not in annotation['category_name']:
            #     continue
            gmo_flag = False
            for class_name in self.classes:
                if class_name in annotation['category_name']:
                    gmo_flag = True
                    break
            if not gmo_flag:
                continue
            # Specify semantic id if use_separate_classes
            semantic_id = 1
            if self.use_separate_classes:
                if 'bicycle' in annotation['category_name']:
                    semantic_id = 1
                elif 'bus'  in annotation['category_name']:
                    semantic_id = 2
                elif 'car'  in annotation['category_name']:
                    semantic_id = 3
                elif 'construction'  in annotation['category_name']:
                    semantic_id = 4
                elif 'motorcycle'  in annotation['category_name']:
                    semantic_id = 5
                elif 'trailer'  in annotation['category_name']:
                    semantic_id = 6
                elif 'truck'  in annotation['category_name']:
                    semantic_id = 7
                elif 'pedestrian'  in annotation['category_name']:
                    semantic_id = 8

            if annotation['instance_token'] not in instance_map:
                instance_map[annotation['instance_token']] = len(instance_map) + 1
            instance_id = instance_map[annotation['instance_token']]
            instance_attribute = 1 # deprecated

            if annotation['instance_token'] not in self.instance_dict:
                # For the first occurrence of an instance
                self.instance_dict[annotation['instance_token']] = {
                    'timestep': [self.counter],
                    'translation': [annotation['translation']],
                    'rotation': [annotation['rotation']],
                    'size': annotation['size'],
                    'instance_id': instance_id,
                    'semantic_id': semantic_id,
                    'attribute_label': [instance_attribute],
                }
            else:
                # For the instance that have appeared before
                self.instance_dict[annotation['instance_token']]['timestep'].append(self.counter)
                self.instance_dict[annotation['instance_token']]['translation'].append(annotation['translation'])
                self.instance_dict[annotation['instance_token']]['rotation'].append(annotation['rotation'])
                self.instance_dict[annotation['instance_token']]['attribute_label'].append(instance_attribute)

        return instance_map

    def get_future_egomotion(self, idx):
        '''
        Calculate LiDAR pose updates between idx and idx+1
        '''
        rec_t0 = self.ixes[idx]
        future_egomotion = np.eye(4, dtype=np.float32)

        if idx < len(self.ixes) - 1:
            rec_t1 = self.ixes[idx + 1]

            if rec_t0['scene_token'] == rec_t1['scene_token']:
                egopose_t0 = self.lyft.get('ego_pose', self.lyft.get('sample_data', rec_t0['data']['LIDAR_TOP'])['ego_pose_token'])
                egopose_t0_trans = egopose_t0['translation']
                egopose_t0_rot = egopose_t0['rotation']

                egopose_t1 = self.lyft.get('ego_pose', self.lyft.get('sample_data', rec_t1['data']['LIDAR_TOP'])['ego_pose_token'])
                egopose_t1_trans = egopose_t1['translation']
                egopose_t1_rot = egopose_t1['rotation']

                egopose_t0 = convert_egopose_to_matrix_numpy(egopose_t0_trans, egopose_t0_rot)
                egopose_t1 = convert_egopose_to_matrix_numpy(egopose_t1_trans, egopose_t1_rot)

                lidar_top_data_t0 = self.lyft.get('sample_data', rec_t0['data']['LIDAR_TOP'])
                lidar2ego_t0_trans = self.lyft.get('calibrated_sensor', lidar_top_data_t0['calibrated_sensor_token'])['translation']
                lidar2ego_t0_rot =  self.lyft.get('calibrated_sensor', lidar_top_data_t0['calibrated_sensor_token'])['rotation']
                lidar_top_data_t1 = self.lyft.get('sample_data', rec_t1['data']['LIDAR_TOP'])
                lidar2ego_t1_trans = self.lyft.get('calibrated_sensor', lidar_top_data_t1['calibrated_sensor_token'])['translation']
                lidar2ego_t1_rot =  self.lyft.get('calibrated_sensor', lidar_top_data_t1['calibrated_sensor_token'])['rotation']


                lidar2ego_t0 = convert_egopose_to_matrix_numpy(lidar2ego_t0_trans, lidar2ego_t0_rot)
                lidar2ego_t1 = convert_egopose_to_matrix_numpy(lidar2ego_t1_trans, lidar2ego_t1_rot)

                future_egomotion = invert_matrix_egopose_numpy(lidar2ego_t1).dot(invert_matrix_egopose_numpy(egopose_t1)).dot(egopose_t0).dot(lidar2ego_t0)   

        future_egomotion = torch.Tensor(future_egomotion).float()
        return future_egomotion.unsqueeze(0)

    @staticmethod
    def _check_consistency(translation, prev_translation, threshold=1.0):
        """
        Check for significant displacement of the instance adjacent moments
        """
        x, y = translation[:2]
        prev_x, prev_y = prev_translation[:2]

        if abs(x - prev_x) > threshold or abs(y - prev_y) > threshold:
            return False
        return True

    def refine_instance_poly(self, instance):
        """
        Fix the missing frames and disturbances of ground truth caused by noise
        """
        pointer = 1
        for i in range(instance['timestep'][0] + 1, self.sequence_length):
            # Fill in the missing frames
            if i not in instance['timestep']:
                instance['timestep'].insert(pointer, i)
                instance['translation'].insert(pointer, instance['translation'][pointer-1])
                instance['rotation'].insert(pointer, instance['rotation'][pointer-1])
                instance['attribute_label'].insert(pointer, instance['attribute_label'][pointer-1])
                pointer += 1
                continue
            
            # Eliminate observation disturbances
            if self._check_consistency(instance['translation'][pointer], instance['translation'][pointer-1]):
                instance['translation'][pointer] = instance['translation'][pointer-1]
                instance['rotation'][pointer] = instance['rotation'][pointer-1]
                instance['attribute_label'][pointer] = instance['attribute_label'][pointer-1]
            pointer += 1
        
        return instance

    def prepare_train_data(self, index):
        '''
        Generate a training sequence
        '''
        
        example = self.prepare_sequential_data(index)
        return example

    def prepare_test_data(self, index):
        '''
        Generate a test sequence
        TODO: Give additional functions here such as visualization
        '''
        
        example = self.prepare_sequential_data(index)
        # TODO: visualize example data
        return example
    
    def prepare_sequential_data(self, index):
        '''
        Use the predefined pipeline to generate inputs of the baseline network and ground truth for the standard evaluation protocol in Cam4DOcc
        '''
        instance_map = {}
        input_seq_data = {}
        keys = ['input_dict','future_egomotion', 'sample_token']
        for key in keys:
            input_seq_data[key] = []
        scene_lidar_token = []

        for self.counter, index_t in enumerate(self.indices[index]):

            input_dict_per_frame = {}
            rec = self.ixes[index_t]  # sample

            lidar_top_data = self.lyft.get('sample_data', rec['data']['LIDAR_TOP'])
            lidar2ego_translation = self.lyft.get('calibrated_sensor', lidar_top_data['calibrated_sensor_token'])['translation']
            lidar2ego_rotation =  self.lyft.get('calibrated_sensor', lidar_top_data['calibrated_sensor_token'])['rotation']

            egopose = self.lyft.get('ego_pose', self.lyft.get('sample_data', rec['data']['LIDAR_TOP'])['ego_pose_token'])
            ego2global_translation = egopose['translation']
            ego2global_rotation = egopose['rotation']

            input_dict_per_frame['lidar2ego_translation'] = lidar2ego_translation
            input_dict_per_frame['lidar2ego_rotation'] = lidar2ego_rotation
            input_dict_per_frame['ego2global_translation'] = ego2global_translation
            input_dict_per_frame['ego2global_rotation'] = ego2global_rotation
            input_dict_per_frame['scene_token'] = rec['scene_token']
            input_dict_per_frame['lidar_token'] = rec['data']['LIDAR_TOP']
            input_dict_per_frame['occ_size'] = np.array(self.occ_size)
            input_dict_per_frame['pc_range'] = np.array(self.pc_range)
            input_dict_per_frame['sample_idx'] = rec['token']


            image_paths = []
            lidar2img_rts = []
            lidar2cam_rts = []
            cam_intrinsics = []
            cam_intrinsics_ori = []
            lidar2cam_dic = {}

            lidar_sample = self.lyft.get('sample_data', rec['data']['LIDAR_TOP'])
            lidar_pose = self.lyft.get('ego_pose', lidar_sample['ego_pose_token'])
            lidar_rotation = Quaternion(lidar_pose['rotation'])
            lidar_translation = np.array(lidar_pose['translation'])[:, None]
            lidar_to_world = np.vstack([
                np.hstack((lidar_rotation.rotation_matrix, lidar_translation)),
                np.array([0, 0, 0, 1])
            ])

            lidar_sample_calib = self.lyft.get('calibrated_sensor', lidar_sample['calibrated_sensor_token'])
            lidar_sensor_rotation = Quaternion(lidar_sample_calib['rotation'])
            lidar_sensor_translation = np.array(lidar_sample_calib['translation'])[:, None]
            lidar_to_lidarego = np.vstack([
                np.hstack((lidar_sensor_rotation.rotation_matrix, lidar_sensor_translation)),
                np.array([0, 0, 0, 1])
            ])

            cameras = ['CAM_FRONT_LEFT', 'CAM_FRONT', 'CAM_FRONT_RIGHT', 'CAM_BACK_LEFT', 'CAM_BACK', 'CAM_BACK_RIGHT']

            for cam in cameras:
                camera_sample = self.lyft.get('sample_data', rec['data'][cam])
                image_paths.append(os.path.join("/tos://haomo-algorithms/c6089dc67ff976615510d22b5eaaaa4e/mjy/cam4docc/data/lyft/", camera_sample['filename']))

                car_egopose = self.lyft.get('ego_pose', camera_sample['ego_pose_token'])
                egopose_rotation = Quaternion(car_egopose['rotation']).inverse
                egopose_translation = -np.array(car_egopose['translation'])[:, None]
                world_to_car_egopose = np.vstack([
                    np.hstack((egopose_rotation.rotation_matrix, egopose_rotation.rotation_matrix @ egopose_translation)),
                    np.array([0, 0, 0, 1])
                ])

                sensor_sample = self.lyft.get('calibrated_sensor', camera_sample['calibrated_sensor_token'])
                intrinsic = torch.Tensor(sensor_sample['camera_intrinsic'])
                cam_intrinsics_ori.append(intrinsic)
                sensor_rotation = Quaternion(sensor_sample['rotation'])
                sensor_translation = np.array(sensor_sample['translation'])[:, None]
                car_egopose_to_sensor = np.vstack([
                    np.hstack((sensor_rotation.rotation_matrix, sensor_translation)),
                    np.array([0, 0, 0, 1])
                ])
                car_egopose_to_sensor = np.linalg.inv(car_egopose_to_sensor)

                lidar_to_sensor = car_egopose_to_sensor @ world_to_car_egopose @ lidar_to_world @ lidar_to_lidarego
                sensor_to_lidar =np.linalg.inv(lidar_to_sensor)

                lidar2cam_r = lidar_to_sensor[:3, :3] 
                lidar2cam_t = sensor_to_lidar[:3, -1].reshape(1,3) @ lidar2cam_r.T
                lidar2cam_rt = np.eye(4)
                lidar2cam_rt[:3, :3] = lidar2cam_r.T
                lidar2cam_rt[3, :3] = -lidar2cam_t
                viewpad = np.eye(4)
                viewpad[:intrinsic.shape[0], :intrinsic.shape[1]] = intrinsic
                lidar2img_rt = (viewpad @ lidar2cam_rt.T)
                lidar2img_rts.append(lidar2img_rt)

                cam_intrinsics.append(viewpad)
                lidar2cam_rts.append(lidar2cam_rt.T)
                
                lidar2cam_dic[cam] = lidar2cam_rt.T

            input_dict_per_frame.update(
                dict(
                    img_filename=image_paths,
                    lidar2img=lidar2img_rts,
                    cam_intrinsic=cam_intrinsics,
                    cam_intrinsics=cam_intrinsics_ori,
                    lidar2cam=lidar2cam_rts,
                    lidar2cam_dic=lidar2cam_dic,
                ))
                        
            input_seq_data['input_dict'].append(input_dict_per_frame)

            instance_map = self.record_instance(index_t, instance_map)
            future_egomotion = self.get_future_egomotion(index_t)
            input_seq_data['future_egomotion'].append(future_egomotion)
            input_seq_data['sample_token'].append(input_dict_per_frame['sample_idx'])

            scene_lidar_token.append(input_dict_per_frame['scene_token']+"_"+input_dict_per_frame['lidar_token'])
            if self.counter == self.time_receptive_field - 1:
                self.present_scene_lidar_token = input_dict_per_frame['scene_token']+"_"+input_dict_per_frame['lidar_token']

        for token in self.instance_dict.keys():
            self.instance_dict[token] = self.refine_instance_poly(self.instance_dict[token])

        input_seq_data.update(
            dict(
                time_receptive_field=self.time_receptive_field,
                sequence_length=self.sequence_length,
                egopose_list=self.egopose_list,
                ego2lidar_list=self.ego2lidar_list,
                instance_dict=self.instance_dict,
                instance_map=instance_map,
                indices=self.indices[index],
                scene_token=self.present_scene_lidar_token,
            ))

        example = self.pipeline(input_seq_data)
        return example


    def evaluate(self, results, logger=None, **kawrgs):
        '''
        Evaluate by IOU and VPQ metrics for model evaluation
        '''
        eval_results = {}
        
        ''' calculate IOU '''
        hist_for_iou = sum(results['hist_for_iou'])
        ious = cm_to_ious(hist_for_iou)
        res_table, res_dic = format_iou_results(ious, return_dic=True)
        for key, val in res_dic.items():
            eval_results['IOU_{}'.format(key)] = val
        if logger is not None:
            logger.info('IOU Evaluation')
            logger.info(res_table)        

        ''' calculate VPQ '''
        if 'vpq_metric' in results.keys() and 'vpq_len' in results.keys():
            vpq_sum = sum(results['vpq_metric'])
            eval_results['VPQ'] = vpq_sum/results['vpq_len']

        return eval_results


================================================
FILE: projects/occ_plugin/datasets/nuscenes_dataset.py
================================================
import copy
import numpy as np
from mmdet.datasets import DATASETS
from mmdet3d.datasets import NuScenesDataset
import mmcv
from os import path as osp
from mmdet.datasets import DATASETS
import torch
import numpy as np
from nuscenes.eval.common.utils import quaternion_yaw, Quaternion
from mmcv.parallel import DataContainer as DC
import random


@DATASETS.register_module()
class CustomNuScenesDataset(NuScenesDataset):
    r"""NuScenes Dataset.

    This datset only add camera intrinsics and extrinsics to the results.
    """

    def __init__(self, queue_length=4, bev_size=(200, 200), overlap_test=False, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.queue_length = queue_length
        self.overlap_test = overlap_test
        self.bev_size = bev_size
        
    def prepare_train_data(self, index):
        """
        Training data preparation.
        Args:
            index (int): Index for accessing the target data.
        Returns:
            dict: Training data dict of the corresponding index.
        """
        queue = []
        index_list = list(range(index-self.queue_length, index))
        random.shuffle(index_list)
        index_list = sorted(index_list[1:])
        index_list.append(index)
        for i in index_list:
            i = max(0, i)
            input_dict = self.get_data_info(i)
            if input_dict is None:
                return None
            self.pre_pipeline(input_dict)
            example = self.pipeline(input_dict)
            if self.filter_empty_gt and \
                    (example is None or ~(example['gt_labels_3d']._data != -1).any()):
                return None
            queue.append(example)
        return self.union2one(queue)


    def union2one(self, queue):
        imgs_list = [each['img'].data for each in queue]
        metas_map = {}
        prev_scene_token = None
        prev_pos = None
        prev_angle = None
        for i, each in enumerate(queue):
            metas_map[i] = each['img_metas'].data
            if metas_map[i]['scene_token'] != prev_scene_token:
                metas_map[i]['prev_bev_exists'] = False
                prev_scene_token = metas_map[i]['scene_token']
                prev_pos = copy.deepcopy(metas_map[i]['can_bus'][:3])
                prev_angle = copy.deepcopy(metas_map[i]['can_bus'][-1])
                metas_map[i]['can_bus'][:3] = 0
                metas_map[i]['can_bus'][-1] = 0
            else:
                metas_map[i]['prev_bev_exists'] = True
                tmp_pos = copy.deepcopy(metas_map[i]['can_bus'][:3])
                tmp_angle = copy.deepcopy(metas_map[i]['can_bus'][-1])
                metas_map[i]['can_bus'][:3] -= prev_pos
                metas_map[i]['can_bus'][-1] -= prev_angle
                prev_pos = copy.deepcopy(tmp_pos)
                prev_angle = copy.deepcopy(tmp_angle)
        queue[-1]['img'] = DC(torch.stack(imgs_list), cpu_only=False, stack=True)
        queue[-1]['img_metas'] = DC(metas_map, cpu_only=True)
        queue = queue[-1]
        return queue

    def get_data_info(self, index):
        """Get data info according to the given index.

        Args:
            index (int): Index of the sample data to get.

        Returns:
            dict: Data information that will be passed to the data \
                preprocessing pipelines. It includes the following keys:

                - sample_idx (str): Sample index.
                - pts_filename (str): Filename of point clouds.
                - sweeps (list[dict]): Infos of sweeps.
                - timestamp (float): Sample timestamp.
                - img_filename (str, optional): Image filename.
                - lidar2img (list[np.ndarray], optional): Transformations \
                    from lidar to different cameras.
                - ann_info (dict): Annotation info.
        """
        info = self.data_infos[index]
        # standard protocal modified from SECOND.Pytorch
        input_dict = dict(
            sample_idx=info['token'],
            pts_filename=info['lidar_path'],
            sweeps=info['sweeps'],
            ego2global_translation=info['ego2global_translation'],
            ego2global_rotation=info['ego2global_rotation'],
            prev_idx=info['prev'],
            next_idx=info['next'],
            scene_token=info['scene_token'],
            can_bus=info['can_bus'],
            frame_idx=info['frame_idx'],
            timestamp=info['timestamp'] / 1e6,
        )

        if self.modality['use_camera']:
            image_paths = []
            lidar2img_rts = []
            lidar2cam_rts = []
            cam_intrinsics = []
            for cam_type, cam_info in info['cams'].items():
                image_paths.append(cam_info['data_path'])
                # obtain lidar to image transformation matrix
                lidar2cam_r = np.linalg.inv(cam_info['sensor2lidar_rotation'])
                lidar2cam_t = cam_info[
                    'sensor2lidar_translation'] @ lidar2cam_r.T
                lidar2cam_rt = np.eye(4)
                lidar2cam_rt[:3, :3] = lidar2cam_r.T
                lidar2cam_rt[3, :3] = -lidar2cam_t
                intrinsic = cam_info['cam_intrinsic']
                viewpad = np.eye(4)
                viewpad[:intrinsic.shape[0], :intrinsic.shape[1]] = intrinsic
                lidar2img_rt = (viewpad @ lidar2cam_rt.T)
                lidar2img_rts.append(lidar2img_rt)

                cam_intrinsics.append(viewpad)
                lidar2cam_rts.append(lidar2cam_rt.T)

            input_dict.update(
                dict(
                    img_filename=image_paths,
                    lidar2img=lidar2img_rts,
                    cam_intrinsic=cam_intrinsics,
                    lidar2cam=lidar2cam_rts,
                ))

        if not self.test_mode:
            annos = self.get_ann_info(index)
            input_dict['ann_info'] = annos

        rotation = Quaternion(input_dict['ego2global_rotation'])
        translation = input_dict['ego2global_translation']
        can_bus = input_dict['can_bus']
        can_bus[:3] = translation
        can_bus[3:7] = rotation
        patch_angle = quaternion_yaw(rotation) / np.pi * 180
        if patch_angle < 0:
            patch_angle += 360
        can_bus[-2] = patch_angle / 180 * np.pi
        can_bus[-1] = patch_angle

        return input_dict

    def __getitem__(self, idx):
        """Get item from infos according to the given index.
        Returns:
            dict: Data dictionary of the corresponding index.
        """
        if self.test_mode:
            return self.prepare_test_data(idx)
        while True:

            data = self.prepare_train_data(idx)
            if data is None:
                idx = self._rand_another(idx)
                continue
            return data

   

================================================
FILE: projects/occ_plugin/datasets/pipelines/__init__.py
================================================
from .transform_3d import (
    PadMultiViewImage, NormalizeMultiviewImage, 
    PhotoMetricDistortionMultiViewImage, CustomCollect3D, CustomOccCollect3D, RandomScaleImageMultiViewImage)
from .formating import OccDefaultFormatBundle3D
from .loading_occupancy import LoadOccupancy
from .loading_bevdet import LoadAnnotationsBEVDepth, LoadMultiViewImageFromFiles_BEVDet
from .loading_instance import LoadInstanceWithFlow

__all__ = [
    'PadMultiViewImage', 'NormalizeMultiviewImage', 'CustomOccCollect3D', 'LoadAnnotationsBEVDepth', 'LoadMultiViewImageFromFiles_BEVDet', 'LoadOccupancy',
    'PhotoMetricDistortionMultiViewImage', 'OccDefaultFormatBundle3D', 'CustomCollect3D', 'RandomScaleImageMultiViewImage', "LoadInstanceWithFlow",
]

================================================
FILE: projects/occ_plugin/datasets/pipelines/formating.py
================================================
# Copyright (c) OpenMMLab. All rights reserved.
import numpy as np
from mmcv.parallel import DataContainer as DC

from mmdet.datasets.builder import PIPELINES
from mmdet.datasets.pipelines import to_tensor
from mmdet3d.datasets.pipelines import DefaultFormatBundle3D


@PIPELINES.register_module()
class OccDefaultFormatBundle3D(DefaultFormatBundle3D):
    """Default formatting bundle.
    It simplifies the pipeline of formatting common fields for voxels,
    including "proposals", "gt_bboxes", "gt_labels", "gt_masks" and
    "gt_semantic_seg".
    These fields are formatted as follows.
    - img: (1)transpose, (2)to tensor, (3)to DataContainer (stack=True)
    - proposals: (1)to tensor, (2)to DataContainer
    - gt_bboxes: (1)to tensor, (2)to DataContainer
    - gt_bboxes_ignore: (1)to tensor, (2)to DataContainer
    - gt_labels: (1)to tensor, (2)to DataContainer
    """
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

        
    def __call__(self, results):
        """Call function to transform and format common fields in results.
        Args:
            results (dict): Result dict contains the data to convert.
        Returns:
            dict: The result dict contains the data that is formatted with
                default bundle.
        """
        # Format 3D data
        results = super(OccDefaultFormatBundle3D, self).__call__(results)

        if 'gt_occ' in results.keys():
            results['gt_occ'] = DC(to_tensor(results['gt_occ']), stack=True)
        if 'gt_occ' in results.keys():
            results['segmentation'] = DC(to_tensor(results['segmentation']), stack=True)
        if 'gt_occ' in results.keys():
            results['instance'] = DC(to_tensor(results['instance']), stack=True)
        if 'gt_occ' in results.keys():
            results['attribute_label'] = DC(to_tensor(results['attribute_label']), stack=True)
        if 'gt_occ' in results.keys():
            results['flow'] = DC(to_tensor(results['flow']), stack=True)
        if 'gt_vel' in results.keys():
            results['gt_vel'] = DC(to_tensor(results['gt_vel']), stack=False)

        return results

================================================
FILE: projects/occ_plugin/datasets/pipelines/loading_bevdet.py
================================================
# Copyright (c) OpenMMLab. All rights reserved.

import mmcv
import numpy as np
from mmdet.datasets.builder import PIPELINES
import os
import torch
from PIL import Image
from pyquaternion import Quaternion
from mmdet3d.core.bbox import LiDARInstance3DBoxes

from numpy import random
import pdb

def mmlabNormalize(img, img_norm_cfg=None):
    from mmcv.image.photometric import imnormalize
    
    if img_norm_cfg is None:
        mean = np.array([123.675, 116.28, 103.53], dtype=np.float32)
        std = np.array([58.395, 57.12, 57.375], dtype=np.float32)
        to_rgb = True
    else:
        mean = np.array(img_norm_cfg['mean'], dtype=np.float32)
        std = np.array(img_norm_cfg['std'], dtype=np.float32)
        to_rgb = img_norm_cfg['to_rgb']
    
    img = imnormalize(np.array(img), mean, std, to_rgb)
    img = torch.tensor(img).float().permute(2, 0, 1).contiguous()
    
    return img

def depth_transform(cam_depth, resize, resize_dims, crop, flip, rotate):
    """Transform depth based on ida augmentation configuration.

    Args:
        cam_depth (np array): Nx3, 3: x,y,d.
        resize (float): Resize factor.
        resize_dims (list): Final dimension.
        crop (list): x1, y1, x2, y2
        flip (bool): Whether to flip.
        rotate (float): Rotation value.

    Returns:
        np array: [h/down_ratio, w/down_ratio, d]
    """

    H, W = resize_dims
    cam_depth[:, :2] = cam_depth[:, :2] * resize
    cam_depth[:, 0] -= crop[0]
    cam_depth[:, 1] -= crop[1]
    if flip:
        cam_depth[:, 0] = resize_dims[1] - cam_depth[:, 0]

    cam_depth[:, 0] -= W / 2.0
    cam_depth[:, 1] -= H / 2.0

    h = rotate / 180 * np.pi
    rot_matrix = [
        [np.cos(h), np.sin(h)],
        [-np.sin(h), np.cos(h)],
    ]
    cam_depth[:, :2] = np.matmul(rot_matrix, cam_depth[:, :2].T).T

    cam_depth[:, 0] += W / 2.0
    cam_depth[:, 1] += H / 2.0

    depth_coords = cam_depth[:, :2].astype(np.int16)

    depth_map = np.zeros(resize_dims)
    valid_mask = ((depth_coords[:, 1] < resize_dims[0])
                  & (depth_coords[:, 0] < resize_dims[1])
                  & (depth_coords[:, 1] >= 0)
                  & (depth_coords[:, 0] >= 0))
    depth_map[depth_coords[valid_mask, 1],
              depth_coords[valid_mask, 0]] = cam_depth[valid_mask, 2]

    return torch.Tensor(depth_map)

@PIPELINES.register_module()
class LoadMultiViewImageFromFiles_BEVDet(object):
    """Load multi channel images from a list of separate channel files.

    Expects results['img_filename'] to be a list of filenames.

    Args:
        to_float32 (bool): Whether to convert the img to float32.
            Defaults to False.
        color_type (str): Color type of the file. Defaults to 'unchanged'.
    """

    def __init__(self, data_config, is_train=False, using_ego=True, colorjitter=False,
                 sequential=False, aligned=False, trans_only=True, img_norm_cfg=None,
                 mmlabnorm=False, load_depth=False, depth_gt_path=None, data_root=None, test_mode=False, use_lyft=False):
        self.is_train = is_train
        self.data_config = data_config
        
        # using mean camera ego frame, rather than the lidar coordinates
        self.using_ego = using_ego
        self.normalize_img = mmlabNormalize
        self.img_norm_cfg = img_norm_cfg
        
        self.sequential = sequential
        self.aligned = aligned
        self.trans_only = trans_only
        self.load_depth = load_depth
        
        self.depth_gt_path = depth_gt_path
        self.data_root = data_root
        
        self.colorjitter = colorjitter
        self.pipeline_colorjitter = PhotoMetricDistortionMultiViewImage()

        self.test_mode = test_mode

        self.use_lyft = use_lyft

    def get_rot(self,h):
        return torch.Tensor([
            [np.cos(h), np.sin(h)],
            [-np.sin(h), np.cos(h)],
        ])

    def img_transform(self, img, post_rot, post_tran,
                      resize, resize_dims, crop,
                      flip, rotate):
        # adjust image
        img = self.img_transform_core(img, resize_dims, crop, flip, rotate)

        # post-homography transformation
        post_rot *= resize
        post_tran -= torch.Tensor(crop[:2])
        if flip:
            A = torch.Tensor([[-1, 0], [0, 1]])
            b = torch.Tensor([crop[2] - crop[0], 0])
            post_rot = A.matmul(post_rot)
            post_tran = A.matmul(post_tran) + b
        A = self.get_rot(rotate / 180 * np.pi)
        b = torch.Tensor([crop[2] - crop[0], crop[3] - crop[1]]) / 2
        b = A.matmul(-b) + b
        post_rot = A.matmul(post_rot)
        post_tran = A.matmul(post_tran) + b

        return img, post_rot, post_tran

    def img_transform_core(self, img, resize_dims, crop, flip, rotate):
        # adjust image
        img = img.resize(resize_dims)
        img = img.crop(crop)
        if flip:
            img = img.transpose(method=Image.FLIP_LEFT_RIGHT)
        img = img.rotate(rotate)
        return img

    def choose_cams(self):
        if self.is_train and self.data_config['Ncams'] < len(self.data_config['cams']):
            cam_names = np.random.choice(self.data_config['cams'], self.data_config['Ncams'],
                                    replace=False)
        else:
            cam_names = self.data_config['cams']
        return cam_names

    def sample_augmentation(self, H , W, flip=None, scale=None):
        fH, fW = self.data_config['input_size']
        if self.is_train:
            resize = float(fW)/float(W)
            resize += np.random.uniform(*self.data_config['resize'])
            resize_dims = (int(W * resize), int(H * resize))
            newW, newH = resize_dims
            crop_h = int((1 - np.random.uniform(*self.data_config['crop_h'])) * newH) - fH
            crop_w = int(np.random.uniform(0, max(0, newW - fW)))
            crop = (crop_w, crop_h, crop_w + fW, crop_h + fH)
            # We do not use flip here to keep right forecasting
            flip = None
            rotate = 0
        else:
            resize = float(fW)/float(W)
            resize += self.data_config.get('resize_test', 0.0)
            if scale is not None:
                resize = scale
            resize_dims = (int(W * resize), int(H * resize))
            newW, newH = resize_dims
            crop_h = int((1 - np.mean(self.data_config['crop_h'])) * newH) - fH
            crop_w = int(max(0, newW - fW) / 2)
            crop = (crop_w, crop_h, crop_w + fW, crop_h + fH)
            flip = None
            rotate = 0
        return resize, resize_dims, crop, flip, rotate

    def get_sensor2ego_transformation(self, cam_info, key_info, cam_name):
        w, x, y, z = cam_info['cams'][cam_name]['sensor2ego_rotation']
        # sweep sensor to sweep ego
        sweepsensor2sweepego_rot = torch.Tensor(
            Quaternion(w, x, y, z).rotation_matrix)
        sweepsensor2sweepego_tran = torch.Tensor(
            cam_info['cams'][cam_name]['sensor2ego_translation'])
        sweepsensor2sweepego = sweepsensor2sweepego_rot.new_zeros(
            (4, 4))
        sweepsensor2sweepego[3, 3] = 1
        sweepsensor2sweepego[:3, :3] = sweepsensor2sweepego_rot
        sweepsensor2sweepego[:3, -1] = sweepsensor2sweepego_tran
        # sweep ego to global
        w, x, y, z = cam_info['cams'][cam_name]['ego2global_rotation']
        sweepego2global_rot = torch.Tensor(
            Quaternion(w, x, y, z).rotation_matrix)
        sweepego2global_tran = torch.Tensor(
            cam_info['cams'][cam_name]['ego2global_translation'])
        sweepego2global = sweepego2global_rot.new_zeros((4, 4))
        sweepego2global[3, 3] = 1
        sweepego2global[:3, :3] = sweepego2global_rot
        sweepego2global[:3, -1] = sweepego2global_tran

        # global sensor to cur ego
        w, x, y, z = key_info['cams'][cam_name]['ego2global_rotation']
        keyego2global_rot = torch.Tensor(
            Quaternion(w, x, y, z).rotation_matrix)
        keyego2global_tran = torch.Tensor(
            key_info['cams'][cam_name]['ego2global_translation'])
        keyego2global = keyego2global_rot.new_zeros((4, 4))
        keyego2global[3, 3] = 1
        keyego2global[:3, :3] = keyego2global_rot
        keyego2global[:3, -1] = keyego2global_tran
        global2keyego = keyego2global.inverse()

        # cur ego to sensor
        w, x, y, z = key_info['cams'][cam_name]['sensor2ego_rotation']
        keysensor2keyego_rot = torch.Tensor(
            Quaternion(w, x, y, z).rotation_matrix)
        keysensor2keyego_tran = torch.Tensor(
            key_info['cams'][cam_name]['sensor2ego_translation'])
        keysensor2keyego = keysensor2keyego_rot.new_zeros((4, 4))
        keysensor2keyego[3, 3] = 1
        keysensor2keyego[:3, :3] = keysensor2keyego_rot
        keysensor2keyego[:3, -1] = keysensor2keyego_tran
        keyego2keysensor = keysensor2keyego.inverse()
        keysensor2sweepsensor = (
                keyego2keysensor @ global2keyego @ sweepego2global
                @ sweepsensor2sweepego).inverse()
        sweepsensor2keyego = global2keyego @ sweepego2global @ \
                             sweepsensor2sweepego
        return sweepsensor2keyego, keysensor2sweepsensor
    
    def get_sensor2lidar_transformation(self, cam_info, cam_name, sample_info):
        w, x, y, z = cam_info['cams'][cam_name]['sensor2ego_rotation']
        # sweep sensor to sweep ego
        sweepsensor2sweepego_rot = torch.Tensor(
            Quaternion(w, x, y, z).rotation_matrix)
        sweepsensor2sweepego_tran = torch.Tensor(
            cam_info['cams'][cam_name]['sensor2ego_translation'])
        sweepsensor2sweepego = sweepsensor2sweepego_rot.new_zeros(
            (4, 4))
        sweepsensor2sweepego[3, 3] = 1
        sweepsensor2sweepego[:3, :3] = sweepsensor2sweepego_rot
        sweepsensor2sweepego[:3, -1] = sweepsensor2sweepego_tran
        # sweep ego to global
        w, x, y, z = cam_info['cams'][cam_name]['ego2global_rotation']
        sweepego2global_rot = torch.Tensor(
            Quaternion(w, x, y, z).rotation_matrix)
        sweepego2global_tran = torch.Tensor(
            cam_info['cams'][cam_name]['ego2global_translation'])
        sweepego2global = sweepego2global_rot.new_zeros((4, 4))
        sweepego2global[3, 3] = 1
        sweepego2global[:3, :3] = sweepego2global_rot
        sweepego2global[:3, -1] = sweepego2global_tran
        
        # global to lidar ego
        w, x, y, z = sample_info['ego2global_rotation']
        lidarego2global_rot = torch.Tensor(
            Quaternion(w, x, y, z).rotation_matrix)
        lidarego2global_tran = torch.Tensor(sample_info['ego2global_translation'])
        lidarego2global = lidarego2global_rot.new_zeros((4, 4))
        lidarego2global[3, 3] = 1
        lidarego2global[:3, :3] = lidarego2global_rot
        lidarego2global[:3, -1] = lidarego2global_tran
        global2lidarego = lidarego2global.inverse()
        
        # lidar ego to lidar
        w, x, y, z = sample_info['lidar2ego_rotation']
        lidar2ego_rot = torch.Tensor(
            Quaternion(w, x, y, z).rotation_matrix)
        lidar2ego_tran = torch.Tensor(sample_info['lidar2ego_translation'])
        lidar2ego = lidar2ego_rot.new_zeros((4, 4))
        lidar2ego[3, 3] = 1
        lidar2ego[:3, :3] = lidar2ego_rot
        lidar2ego[:3, -1] = lidar2ego_tran
        ego2lidar = lidar2ego.inverse()
        
        # camera to lidar
        sweepsensor2lidar = ego2lidar @ global2lidarego @ sweepego2global @ sweepsensor2sweepego
        
        return sweepsensor2lidar


    def get_seq_inputs(self, results, flip=None, scale=None):
        
        cam_names = self.choose_cams()
        results['cam_names'] = cam_names

        if self.use_lyft:
            filename = results['input_dict'][0]['img_filename'][0]
        else:
            cam_data = results['input_dict'][0]['curr']['cams'][cam_names[0]]
            filename = cam_data['data_path']
            filename = os.path.join(self.data_root, filename.split('/')[-3], filename.split('/')[-2], filename.split('/')[-1])
            
        img = Image.open(filename)

        img_augs = self.sample_augmentation(H=img.height,
                                            W=img.width,
                                            flip=flip,
                                            scale=scale)
        resize, resize_dims, crop, flip, rotate = img_augs

        sequence_length = results['sequence_length']

        imgs_seq = []
        rots_seq = []
        trans_seq = []
        intrins_seq = []
        post_rots_seq = []
        post_trans_seq = []
        gt_depths_seq = list()
        canvas_seq = []
        sensor2sensors_seq = []

        for counter in range(sequence_length):

            input_dict_curr = results['input_dict'][counter]

            imgs = []
            rots = []
            trans = []
            intrins = []
            post_rots = []
            post_trans = []
            gt_depths = list()
            canvas = []
            sensor2sensors =

Download .txt

gitextract_hgf40bk9/

├── LICENSE
├── README.md
├── data/
│   ├── README.md
│   ├── cam4docc/
│   │   ├── .gitkeep
│   │   ├── GMO/
│   │   │   └── .gitkeep
│   │   ├── GMO_lyft/
│   │   │   └── .gitkeep
│   │   ├── MMO/
│   │   │   └── .gitkeep
│   │   └── MMO_lyft/
│   │       └── .gitkeep
│   └── nuscenes/
│       └── .gitkeep
├── other_baselines/
│   ├── README.md
│   ├── lifted_2d/
│   │   └── eval_lifted_2d.py
│   ├── static_world/
│   │   └── eval_static_world.py
│   └── voxel_pcp/
│       └── eval_voxel_pcp.py
├── projects/
│   ├── __init__.py
│   ├── configs/
│   │   ├── _base_/
│   │   │   ├── datasets/
│   │   │   │   ├── custom_lyft-3d.py
│   │   │   │   ├── custom_nus-3d.py
│   │   │   │   └── custom_waymo-3d.py
│   │   │   ├── default_runtime.py
│   │   │   └── schedules/
│   │   │       ├── cosine.py
│   │   │       ├── cyclic_20e.py
│   │   │       ├── cyclic_40e.py
│   │   │       ├── mmdet_schedule_1x.py
│   │   │       ├── schedule_2x.py
│   │   │       ├── schedule_3x.py
│   │   │       ├── seg_cosine_150e.py
│   │   │       ├── seg_cosine_200e.py
│   │   │       └── seg_cosine_50e.py
│   │   ├── baselines/
│   │   │   ├── OCFNet_in_Cam4DOcc_V1.1.py
│   │   │   ├── OCFNet_in_Cam4DOcc_V1.1_lyft.py
│   │   │   ├── OCFNet_in_Cam4DOcc_V1.2.py
│   │   │   └── OCFNet_in_Cam4DOcc_V1.2_lyft.py
│   │   └── datasets/
│   │       └── custom_nus-3d.py
│   └── occ_plugin/
│       ├── __init__.py
│       ├── core/
│       │   ├── __init__.py
│       │   ├── evaluation/
│       │   │   ├── __init__.py
│       │   │   ├── efficiency_hooks.py
│       │   │   └── eval_hooks.py
│       │   └── visualizer/
│       │       ├── __init__.py
│       │       └── show_occ.py
│       ├── datasets/
│       │   ├── __init__.py
│       │   ├── builder.py
│       │   ├── cam4docc_dataset.py
│       │   ├── cam4docc_lyft_dataset.py
│       │   ├── nuscenes_dataset.py
│       │   ├── pipelines/
│       │   │   ├── __init__.py
│       │   │   ├── formating.py
│       │   │   ├── loading_bevdet.py
│       │   │   ├── loading_instance.py
│       │   │   ├── loading_occupancy.py
│       │   │   └── transform_3d.py
│       │   └── samplers/
│       │       ├── __init__.py
│       │       ├── distributed_sampler.py
│       │       ├── group_sampler.py
│       │       └── sampler.py
│       ├── occupancy/
│       │   ├── __init__.py
│       │   ├── apis/
│       │   │   ├── __init__.py
│       │   │   ├── mmdet_train.py
│       │   │   ├── test.py
│       │   │   └── train.py
│       │   ├── backbones/
│       │   │   ├── __init__.py
│       │   │   ├── pred_block.py
│       │   │   └── resnet3d.py
│       │   ├── dense_heads/
│       │   │   ├── __init__.py
│       │   │   ├── flow_head.py
│       │   │   ├── lovasz_softmax.py
│       │   │   ├── occ_head.py
│       │   │   └── utils.py
│       │   ├── detectors/
│       │   │   ├── __init__.py
│       │   │   ├── bevdepth.py
│       │   │   └── ocfnet.py
│       │   ├── fuser/
│       │   │   ├── __init__.py
│       │   │   ├── addfuse.py
│       │   │   ├── convfuse.py
│       │   │   └── visfuse.py
│       │   ├── image2bev/
│       │   │   ├── ViewTransformerLSSBEVDepth.py
│       │   │   ├── ViewTransformerLSSVoxel.py
│       │   │   └── __init__.py
│       │   ├── necks/
│       │   │   ├── __init__.py
│       │   │   ├── fpn3d.py
│       │   │   └── second_fpn_3d.py
│       │   └── voxel_encoder/
│       │       ├── __init__.py
│       │       └── sparse_lidar_enc.py
│       ├── ops/
│       │   ├── __init__.py
│       │   └── occ_pooling/
│       │       ├── OCC_Pool.py
│       │       ├── __init__.py
│       │       └── src/
│       │           ├── occ_pool.cpp
│       │           └── occ_pool_cuda.cu
│       └── utils/
│           ├── __init__.py
│           ├── coordinate_transform.py
│           ├── formating.py
│           ├── gaussian.py
│           ├── geometry.py
│           ├── metric_util.py
│           ├── nusc_param.py
│           ├── semkitti.py
│           └── voxel_to_points.py
├── run.sh
├── run_eval.sh
├── setup.py
├── tools/
│   ├── dist_test.sh
│   ├── dist_train.sh
│   ├── gen_data/
│   │   └── gen_depth_gt.py
│   ├── misc/
│   │   ├── browse_dataset.py
│   │   ├── fuse_conv_bn.py
│   │   ├── print_config.py
│   │   └── visualize_results.py
│   ├── test.py
│   └── train.py
└── viz/
    ├── viz_gt.py
    └── viz_pred.py

Download .txt

SYMBOL INDEX (438 symbols across 57 files)

FILE: other_baselines/lifted_2d/eval_lifted_2d.py
  function cm_to_ious (line 37) | def cm_to_ious(cm):
  function fast_hist (line 48) | def fast_hist(pred, label, max_label=18):

FILE: other_baselines/static_world/eval_static_world.py
  function cm_to_ious (line 23) | def cm_to_ious(cm):
  function fast_hist (line 34) | def fast_hist(pred, label, max_label=18):

FILE: other_baselines/voxel_pcp/eval_voxel_pcp.py
  function voxel2world (line 36) | def voxel2world(voxel):
  function world2voxel (line 42) | def world2voxel(world):
  function cm_to_ious (line 48) | def cm_to_ious(cm):
  function fast_hist (line 59) | def fast_hist(pred, label, max_label=18):
  function nb_process_label (line 66) | def nb_process_label(processed_label, sorted_label_voxel_pair):
  function get_ego2lidar_pose (line 82) | def get_ego2lidar_pose(rec):
  function get_lidar_pose (line 90) | def get_lidar_pose(rec):

FILE: projects/occ_plugin/core/evaluation/efficiency_hooks.py
  class OccEfficiencyHook (line 12) | class OccEfficiencyHook(Hook):
    method __init__ (line 13) | def __init__(self, dataloader,  **kwargs):
    method construct_input (line 17) | def construct_input(self, DUMMY_SHAPE=None, m_info=None):
    method before_run (line 40) | def before_run(self, runner):
    method after_run (line 95) | def after_run(self, runner):
    method before_epoch (line 98) | def before_epoch(self, runner):
    method after_epoch (line 101) | def after_epoch(self, runner):
    method before_iter (line 104) | def before_iter(self, runner):
    method after_iter (line 107) | def after_iter(self, runner):

FILE: projects/occ_plugin/core/evaluation/eval_hooks.py
  class OccEvalHook (line 13) | class OccEvalHook(BaseEvalHook):
    method __init__ (line 14) | def __init__(self, *args,  **kwargs):
    method _do_evaluate (line 17) | def _do_evaluate(self, runner):
  class OccDistEvalHook (line 31) | class OccDistEvalHook(BaseDistEvalHook):
    method __init__ (line 32) | def __init__(self, *args,  **kwargs):
    method _do_evaluate (line 35) | def _do_evaluate(self, runner):

FILE: projects/occ_plugin/core/visualizer/show_occ.py
  function save_occ (line 8) | def save_occ(pred_c, pred_f, img_metas, path, visible_mask=None, gt_occ=...

FILE: projects/occ_plugin/datasets/builder.py
  function build_dataloader (line 18) | def build_dataloader(dataset,
  function worker_init_fn (line 94) | def worker_init_fn(worker_id, num_workers, rank, seed):
  function custom_build_dataset (line 121) | def custom_build_dataset(cfg, default_args=None):

FILE: projects/occ_plugin/datasets/cam4docc_dataset.py
  class Cam4DOccDataset (line 20) | class Cam4DOccDataset(NuScenesDataset):
    method __init__ (line 21) | def __init__(self, occ_size, pc_range, occ_root, idx_root, ori_data_ro...
    method _set_group_flag (line 79) | def _set_group_flag(self):
    method __len__ (line 85) | def __len__(self):
    method __getitem__ (line 91) | def __getitem__(self, idx):
    method get_indices (line 112) | def get_indices(self):
    method get_lidar_pose (line 141) | def get_lidar_pose(self, rec):
    method get_ego2lidar_pose (line 152) | def get_ego2lidar_pose(self, rec):
    method record_instance (line 162) | def record_instance(self, idx, instance_map):
    method get_future_egomotion (line 240) | def get_future_egomotion(self, idx):
    method _check_consistency (line 273) | def _check_consistency(translation, prev_translation, threshold=1.0):
    method refine_instance_poly (line 284) | def refine_instance_poly(self, instance):
    method prepare_train_data (line 308) | def prepare_train_data(self, index):
    method prepare_test_data (line 319) | def prepare_test_data(self, index):
    method prepare_sequential_data (line 332) | def prepare_sequential_data(self, index):
    method get_data_info (line 386) | def get_data_info(self, index):
    method evaluate (line 455) | def evaluate(self, results, logger=None, **kawrgs):

FILE: projects/occ_plugin/datasets/cam4docc_lyft_dataset.py
  class Cam4DOccLyftDataset (line 23) | class Cam4DOccLyftDataset(Dataset):
    method __init__ (line 24) | def __init__(self, occ_size, pc_range, occ_root, idx_root, ori_data_ro...
    method _set_group_flag (line 108) | def _set_group_flag(self):
    method __len__ (line 114) | def __len__(self):
    method __getitem__ (line 120) | def __getitem__(self, idx):
    method get_scenes (line 141) | def get_scenes(self):
    method get_samples (line 151) | def get_samples(self):
    method get_indices (line 163) | def get_indices(self):
    method get_lidar_pose (line 192) | def get_lidar_pose(self, rec):
    method get_ego2lidar_pose (line 205) | def get_ego2lidar_pose(self, rec):
    method record_instance (line 218) | def record_instance(self, idx, instance_map):
    method get_future_egomotion (line 287) | def get_future_egomotion(self, idx):
    method _check_consistency (line 326) | def _check_consistency(translation, prev_translation, threshold=1.0):
    method refine_instance_poly (line 337) | def refine_instance_poly(self, instance):
    method prepare_train_data (line 361) | def prepare_train_data(self, index):
    method prepare_test_data (line 369) | def prepare_test_data(self, index):
    method prepare_sequential_data (line 379) | def prepare_sequential_data(self, index):
    method evaluate (line 521) | def evaluate(self, results, logger=None, **kawrgs):

FILE: projects/occ_plugin/datasets/nuscenes_dataset.py
  class CustomNuScenesDataset (line 16) | class CustomNuScenesDataset(NuScenesDataset):
    method __init__ (line 22) | def __init__(self, queue_length=4, bev_size=(200, 200), overlap_test=F...
    method prepare_train_data (line 28) | def prepare_train_data(self, index):
    method union2one (line 55) | def union2one(self, queue):
    method get_data_info (line 83) | def get_data_info(self, index):
    method __getitem__ (line 166) | def __getitem__(self, idx):

FILE: projects/occ_plugin/datasets/pipelines/formating.py
  class OccDefaultFormatBundle3D (line 11) | class OccDefaultFormatBundle3D(DefaultFormatBundle3D):
    method __init__ (line 23) | def __init__(self, *args, **kwargs):
    method __call__ (line 27) | def __call__(self, results):

FILE: projects/occ_plugin/datasets/pipelines/loading_bevdet.py
  function mmlabNormalize (line 15) | def mmlabNormalize(img, img_norm_cfg=None):
  function depth_transform (line 32) | def depth_transform(cam_depth, resize, resize_dims, crop, flip, rotate):
  class LoadMultiViewImageFromFiles_BEVDet (line 80) | class LoadMultiViewImageFromFiles_BEVDet(object):
    method __init__ (line 91) | def __init__(self, data_config, is_train=False, using_ego=True, colorj...
    method get_rot (line 117) | def get_rot(self,h):
    method img_transform (line 123) | def img_transform(self, img, post_rot, post_tran,
    method img_transform_core (line 145) | def img_transform_core(self, img, resize_dims, crop, flip, rotate):
    method choose_cams (line 154) | def choose_cams(self):
    method sample_augmentation (line 162) | def sample_augmentation(self, H , W, flip=None, scale=None):
    method get_sensor2ego_transformation (line 189) | def get_sensor2ego_transformation(self, cam_info, key_info, cam_name):
    method get_sensor2lidar_transformation (line 242) | def get_sensor2lidar_transformation(self, cam_info, cam_name, sample_i...
    method get_seq_inputs (line 293) | def get_seq_inputs(self, results, flip=None, scale=None):
    method __call__ (line 427) | def __call__(self, results):
  function bev_transform (line 433) | def bev_transform(rotate_angle, scale_ratio, flip_dx, flip_dy):
  class LoadAnnotationsBEVDepth (line 450) | class LoadAnnotationsBEVDepth():
    method __init__ (line 451) | def __init__(self, bda_aug_conf, classes, is_train=True,
    method sample_bda_augmentation (line 465) | def sample_bda_augmentation(self):
    method __call__ (line 479) | def __call__(self, results):
  class PhotoMetricDistortionMultiViewImage (line 498) | class PhotoMetricDistortionMultiViewImage(object):
    method __init__ (line 517) | def __init__(self,
    method __call__ (line 527) | def __call__(self, img):

FILE: projects/occ_plugin/datasets/pipelines/loading_instance.py
  class LoadInstanceWithFlow (line 14) | class LoadInstanceWithFlow(object):
    method __init__ (line 15) | def __init__(self, cam4docc_dataset_path, grid_size=[512, 512, 40], pc...
    method get_poly_region (line 45) | def get_poly_region(self, instance_annotation, present_egopose, presen...
    method fill_occupancy (line 78) | def fill_occupancy(self, occ_instance, occ_segmentation, occ_attribute...
    method get_label (line 120) | def get_label(self, input_seq_data):
    method generate_flow (line 173) | def generate_flow(flow, occ_instance_seq, instance, instance_id):
    method get_flow_label (line 217) | def get_flow_label(self, input_seq_data, ignore_index=255):
    method convert_instance_mask_to_center_and_offset_label (line 239) | def convert_instance_mask_to_center_and_offset_label(input_seq_data, i...
    method __call__ (line 270) | def __call__(self, results):

FILE: projects/occ_plugin/datasets/pipelines/loading_occupancy.py
  class LoadOccupancy (line 15) | class LoadOccupancy(object):
    method __init__ (line 17) | def __init__(self, to_float32=True, occ_path=None, grid_size=[512, 512...
    method get_seq_pseudo_occ (line 34) | def get_seq_pseudo_occ(self, results):
    method get_seq_occ (line 46) | def get_seq_occ(self, results):
    method __call__ (line 114) | def __call__(self, results):
    method voxel2world (line 122) | def voxel2world(self, voxel):
    method world2voxel (line 129) | def world2voxel(self, world):
    method __repr__ (line 136) | def __repr__(self):
    method project_points (line 142) | def project_points(self, points, rots, trans, intrins, post_rots, post...
  function nb_process_img_points (line 164) | def nb_process_img_points(basic_valid_occ, depth_canva, nb_valid_mask):
  function nb_process_label_withvel (line 182) | def nb_process_label_withvel(processed_label, sorted_label_voxel_pair):
  function nb_process_label (line 201) | def nb_process_label(processed_label, sorted_label_voxel_pair):

FILE: projects/occ_plugin/datasets/pipelines/transform_3d.py
  class PadMultiViewImage (line 8) | class PadMultiViewImage(object):
    method __init__ (line 19) | def __init__(self, size=None, size_divisor=None, pad_val=0):
    method _pad_img (line 27) | def _pad_img(self, results):
    method __call__ (line 43) | def __call__(self, results):
    method __repr__ (line 53) | def __repr__(self):
  class NormalizeMultiviewImage (line 62) | class NormalizeMultiviewImage(object):
    method __init__ (line 72) | def __init__(self, mean, std, to_rgb=True):
    method __call__ (line 78) | def __call__(self, results):
    method __repr__ (line 92) | def __repr__(self):
  class PhotoMetricDistortionMultiViewImage (line 99) | class PhotoMetricDistortionMultiViewImage:
    method __init__ (line 118) | def __init__(self,
    method __call__ (line 128) | def __call__(self, results):
    method __repr__ (line 187) | def __repr__(self):
  class CustomCollect3D (line 200) | class CustomCollect3D(object):
    method __init__ (line 244) | def __init__(self,
    method __call__ (line 258) | def __call__(self, results):
    method __repr__ (line 281) | def __repr__(self):
  class CustomOccCollect3D (line 287) | class CustomOccCollect3D(object):
    method __init__ (line 331) | def __init__(self,
    method __call__ (line 345) | def __call__(self, results):
    method __repr__ (line 373) | def __repr__(self):
  class RandomScaleImageMultiViewImage (line 379) | class RandomScaleImageMultiViewImage(object):
    method __init__ (line 385) | def __init__(self, scales=[]):
    method __call__ (line 389) | def __call__(self, results):
    method __repr__ (line 414) | def __repr__(self):

FILE: projects/occ_plugin/datasets/samplers/distributed_sampler.py
  class DistributedSampler (line 9) | class DistributedSampler(_DistributedSampler):
    method __init__ (line 11) | def __init__(self,
    method __iter__ (line 22) | def __iter__(self):

FILE: projects/occ_plugin/datasets/samplers/group_sampler.py
  class DistributedGroupSampler (line 14) | class DistributedGroupSampler(Sampler):
    method __init__ (line 32) | def __init__(self,
    method __iter__ (line 61) | def __iter__(self):
    method __len__ (line 104) | def __len__(self):
    method set_epoch (line 107) | def set_epoch(self, epoch):

FILE: projects/occ_plugin/datasets/samplers/sampler.py
  function build_sampler (line 6) | def build_sampler(cfg, default_args):

FILE: projects/occ_plugin/occupancy/apis/mmdet_train.py
  function custom_train_detector (line 27) | def custom_train_detector(model,

FILE: projects/occ_plugin/occupancy/apis/test.py
  function custom_encode_mask_results (line 25) | def custom_encode_mask_results(mask_results):
  function custom_single_gpu_test (line 45) | def custom_single_gpu_test(model, data_loader, show=False, out_dir=None,...
  function custom_multi_gpu_test (line 74) | def custom_multi_gpu_test(model, data_loader, tmpdir=None, gpu_collect=F...
  function collect_results_cpu (line 141) | def collect_results_cpu(result_part, size, tmpdir=None, type='list'):

FILE: projects/occ_plugin/occupancy/apis/train.py
  function custom_train_model (line 5) | def custom_train_model(model,
  function train_model (line 30) | def train_model(model,

FILE: projects/occ_plugin/occupancy/backbones/pred_block.py
  class Residual (line 15) | class Residual(nn.Module):
    method __init__ (line 16) | def __init__(
    method forward (line 51) | def forward(self, x):
  class Predictor (line 60) | class Predictor(nn.Module):
    method __init__ (line 61) | def __init__(
    method forward (line 80) | def forward(self, x):

FILE: projects/occ_plugin/occupancy/backbones/resnet3d.py
  function get_inplanes (line 13) | def get_inplanes():
  function conv3x3x3 (line 17) | def conv3x3x3(in_planes, out_planes, stride=1):
  function conv1x1x1 (line 26) | def conv1x1x1(in_planes, out_planes, stride=1):
  class BasicBlock (line 34) | class BasicBlock(nn.Module):
    method __init__ (line 37) | def __init__(self, in_planes, planes, stride=1, downsample=None, norm_...
    method forward (line 48) | def forward(self, x):
  class Bottleneck (line 67) | class Bottleneck(nn.Module):
    method __init__ (line 70) | def __init__(self, in_planes, planes, stride=1, downsample=None, norm_...
    method forward (line 83) | def forward(self, x):
  class CustomResNet3D (line 106) | class CustomResNet3D(BaseModule):
    method __init__ (line 107) | def __init__(self,
    method _downsample_basic_block (line 160) | def _downsample_basic_block(self, x, planes, stride):
    method _make_layer (line 171) | def _make_layer(self, block, planes, blocks, shortcut_type, stride=1, ...
    method forward (line 196) | def forward(self, x):
  function generate_model (line 207) | def generate_model(model_depth, **kwargs):

FILE: projects/occ_plugin/occupancy/dense_heads/flow_head.py
  class FlowHead (line 18) | class FlowHead(nn.Module):
    method __init__ (line 19) | def __init__(
    method forward_coarse_voxel (line 89) | def forward_coarse_voxel(self, voxel_feats):
    method forward (line 113) | def forward(self, voxel_feats, img_feats=None, transform=None, **kwargs):
    method loss_voxel (line 125) | def loss_voxel(self, output_voxels, target_voxels, tag):
    method loss_point (line 141) | def loss_point(self, fine_coord, fine_output, target_voxels, tag):
    method loss (line 158) | def loss(self, output_voxels=None,

FILE: projects/occ_plugin/occupancy/dense_heads/lovasz_softmax.py
  function lovasz_grad (line 21) | def lovasz_grad(gt_sorted):
  function iou_binary (line 36) | def iou_binary(preds, labels, EMPTY=1., ignore=None, per_image=True):
  function iou (line 56) | def iou(preds, labels, C, EMPTY=1., ignore=None, per_image=False):
  function lovasz_hinge (line 81) | def lovasz_hinge(logits, labels, per_image=True, ignore=None):
  function lovasz_hinge_flat (line 97) | def lovasz_hinge_flat(logits, labels):
  function flatten_binary_scores (line 117) | def flatten_binary_scores(scores, labels, ignore=None):
  class StableBCELoss (line 132) | class StableBCELoss(torch.nn.modules.Module):
    method __init__ (line 133) | def __init__(self):
    method forward (line 135) | def forward(self, input, target):
  function binary_xloss (line 141) | def binary_xloss(logits, labels, ignore=None):
  function lovasz_softmax (line 156) | def lovasz_softmax(probas, labels, classes='present', per_image=False, i...
  function lovasz_softmax_flat (line 174) | def lovasz_softmax_flat(probas, labels, classes='present'):
  function flatten_probas (line 205) | def flatten_probas(probas, labels, ignore=None):
  function xloss (line 234) | def xloss(logits, labels, ignore=None):
  function jaccard_loss (line 240) | def jaccard_loss(probas, labels,ignore=None, smooth = 100, bk_class = No...
  function hinge_jaccard_loss (line 268) | def hinge_jaccard_loss(probas, labels,ignore=None, classes = 'present', ...
  function isnan (line 305) | def isnan(x):
  function mean (line 309) | def mean(l, ignore_nan=False, empty=0):

FILE: projects/occ_plugin/occupancy/dense_heads/occ_head.py
  class OccHead (line 17) | class OccHead(nn.Module):
    method __init__ (line 18) | def __init__(
    method forward_coarse_voxel (line 112) | def forward_coarse_voxel(self, voxel_feats):
    method forward (line 137) | def forward(self, voxel_feats, img_feats=None, transform=None, **kwargs):
    method loss_voxel (line 149) | def loss_voxel(self, output_voxels, target_voxels, tag):
    method loss_point (line 174) | def loss_point(self, fine_coord, fine_output, target_voxels, tag):
    method loss (line 191) | def loss(self, output_voxels=None,

FILE: projects/occ_plugin/occupancy/dense_heads/utils.py
  function initial_voxelize (line 17) | def initial_voxelize(z: PointTensor, after_res) -> SparseTensor:
  function point_to_voxel (line 46) | def point_to_voxel(x: SparseTensor, z: PointTensor) -> SparseTensor:
  function voxel_to_point (line 71) | def voxel_to_point(x: SparseTensor, z: PointTensor, nearest=False) -> to...
  function range_to_point (line 107) | def range_to_point(x,px,py):
  function point_to_range (line 118) | def point_to_range(range_shape,pF,px,py):

FILE: projects/occ_plugin/occupancy/detectors/bevdepth.py
  class BEVDet (line 15) | class BEVDet(CenterPoint):
    method __init__ (line 16) | def __init__(self, img_view_transformer=None,
    method image_encoder (line 36) | def image_encoder(self, img):
    method bev_encoder (line 50) | def bev_encoder(self, x):
    method extract_img_feat (line 57) | def extract_img_feat(self, img, img_metas):
    method extract_feat (line 64) | def extract_feat(self, points, img, img_metas):
    method forward_train (line 70) | def forward_train(self,
    method forward_test (line 115) | def forward_test(self, points=None, img_metas=None, img_inputs=None, *...
    method aug_test (line 147) | def aug_test(self, points, img_metas, img=None, rescale=False):
    method simple_test (line 157) | def simple_test(self, points, img_metas, img=None, rescale=False):
    method forward_dummy (line 167) | def forward_dummy(self, points=None, img_metas=None, img_inputs=None, ...
  class BEVDet4D (line 181) | class BEVDet4D(BEVDet):
    method __init__ (line 182) | def __init__(self, pre_process=None,
    method shift_feature (line 195) | def shift_feature(self, input, trans, rots):
    method prepare_bev_feat (line 252) | def prepare_bev_feat(self, img, rot, tran, intrin, post_rot, post_tran...
    method extract_img_feat (line 260) | def extract_img_feat(self, img, img_metas):
  class BEVDepth_Base (line 299) | class BEVDepth_Base(object):
    method extract_feat (line 300) | def extract_feat(self, points, img, img_metas):
    method simple_test (line 306) | def simple_test(self, points, img_metas, img=None, rescale=False):
    method forward_train (line 315) | def forward_train(self,
  class BEVDepth (line 370) | class BEVDepth(BEVDepth_Base, BEVDet):
    method extract_img_feat (line 371) | def extract_img_feat(self, img, img_metas):
  class BEVDepth4D (line 388) | class BEVDepth4D(BEVDepth_Base, BEVDet4D):
    method prepare_bev_feat (line 389) | def prepare_bev_feat(self, img, rot, tran, intrin,
    method extract_img_feat (line 398) | def extract_img_feat(self, img, img_metas):
  class BEVStereo (line 445) | class BEVStereo(BEVDepth4D):
    method __init__ (line 446) | def __init__(self, bevdet_model=False, **kwargs):
    method image_encoder (line 450) | def image_encoder(self, img):
    method extract_img_feat (line 474) | def extract_img_feat(self, img, img_metas):

FILE: projects/occ_plugin/occupancy/detectors/ocfnet.py
  class OCFNet (line 23) | class OCFNet(BEVDepth):
    method __init__ (line 24) | def __init__(self,
    method image_encoder (line 103) | def image_encoder(self, img):
    method occ_encoder (line 123) | def occ_encoder(self, x):
    method flow_encoder (line 132) | def flow_encoder(self, x):
    method mat2pose_vec (line 140) | def mat2pose_vec(self, matrix: torch.Tensor):
    method pack_dbatch_and_dtime (line 166) | def pack_dbatch_and_dtime(self, x):
    method unpack_dbatch_and_dtime (line 172) | def unpack_dbatch_and_dtime(self, x, b, s):
    method extract_img_feat (line 177) | def extract_img_feat(self, img_inputs_seq, img_metas):
    method warp_features (line 232) | def warp_features(self, x, flow, tseq):
    method cumulative_warp_occ (line 327) | def cumulative_warp_occ(self, lifted_feature_seq, future_egomotion, mo...
    method extract_feat (line 342) | def extract_feat(self, img_inputs_seq, img_metas, future_egomotion):
    method forward_pts_train (line 395) | def forward_pts_train(
    method forward_flow_train (line 437) | def forward_flow_train(
    method forward_train (line 478) | def forward_train(self,
    method forward_test (line 566) | def forward_test(self,
    method simple_test (line 587) | def simple_test(self, img_metas, img_inputs_seq=None, rescale=False, p...
    method evaluate_occupancy_forecasting (line 662) | def evaluate_occupancy_forecasting(self, pred, gt, img_metas=None, sav...
    method find_instance_centers (line 714) | def find_instance_centers(self, center_prediction: torch.Tensor, conf_...
    method group_pixels (line 728) | def group_pixels(self, centers: torch.Tensor, offset_predictions: torc...
    method update_instance_ids (line 755) | def update_instance_ids(self, instance_seg, old_ids, new_ids):
    method make_instance_seg_consecutive (line 762) | def make_instance_seg_consecutive(self, instance_seg):
    method get_instance_segmentation_and_centers (line 769) | def get_instance_segmentation_and_centers(self,
    method flow_warp (line 797) | def flow_warp(self, occupancy, flow, mode='nearest', padding_mode='zer...
    method make_instance_id_temporally_consecutive (line 838) | def make_instance_id_temporally_consecutive(self, pred_inst, preds, ba...
    method predict_instance_segmentation (line 860) | def predict_instance_segmentation(self, pred_seg, pred_flow):
    method combine_mask (line 882) | def combine_mask(self, segmentation: torch.Tensor, instance: torch.Ten...
    method panoptic_metrics (line 915) | def panoptic_metrics(self, pred_segmentation, pred_instance, gt_segmen...
    method evaluate_instance_prediction (line 993) | def evaluate_instance_prediction(self, pred_seg, pred_flow, gt_seg, gt...
    method forward_dummy (line 1039) | def forward_dummy(self,
  function fast_hist (line 1061) | def fast_hist(pred, label, max_label=18):

FILE: projects/occ_plugin/occupancy/fuser/addfuse.py
  class AddFuser (line 11) | class AddFuser(nn.Module):
    method __init__ (line 12) | def __init__(self, in_channels, out_channels, dropout, input_modality=...
    method forward (line 40) | def forward(self, img_voxel_feats, pts_voxel_feats):

FILE: projects/occ_plugin/occupancy/fuser/convfuse.py
  class ConvFuser (line 11) | class ConvFuser(nn.Module):
    method __init__ (line 12) | def __init__(self, in_channels, out_channels) -> None:
    method forward (line 22) | def forward(self, img_voxel_feats, pts_voxel_feats):

FILE: projects/occ_plugin/occupancy/fuser/visfuse.py
  class VisFuser (line 13) | class VisFuser(nn.Module):
    method __init__ (line 14) | def __init__(self, in_channels, out_channels, norm_cfg=None) -> None:
    method forward (line 42) | def forward(self, img_voxel_feats, pts_voxel_feats):

FILE: projects/occ_plugin/occupancy/image2bev/ViewTransformerLSSBEVDepth.py
  function gen_dx_bx (line 20) | def gen_dx_bx(xbound, ybound, zbound):
  function cumsum_trick (line 26) | def cumsum_trick(x, geom_feats, ranks):
  class QuickCumsum (line 35) | class QuickCumsum(torch.autograd.Function):
    method forward (line 37) | def forward(ctx, x, geom_feats, ranks):
    method backward (line 54) | def backward(ctx, gradx, gradgeom):
  class ViewTransformerLiftSplatShoot (line 63) | class ViewTransformerLiftSplatShoot(BaseModule):
    method __init__ (line 64) | def __init__(self, grid_config=None, data_config=None,
    method get_depth_dist (line 100) | def get_depth_dist(self, x):
    method create_frustum (line 103) | def create_frustum(self):
    method get_geometry (line 116) | def get_geometry(self, rots_seq, trans_seq, intrins_seq, post_rots_seq...
    method voxel_pooling (line 145) | def voxel_pooling(self, geom_feats, x):
    method voxel_pooling_accelerated (line 190) | def voxel_pooling_accelerated(self, rots, trans, intrins, post_rots, p...
    method voxel_pooling_bevdepth (line 259) | def voxel_pooling_bevdepth(self, geom_feats, x):
    method forward (line 269) | def forward(self, input):
  class _ASPPModule (line 297) | class _ASPPModule(nn.Module):
    method __init__ (line 298) | def __init__(self, inplanes, planes, kernel_size, padding, dilation,
    method forward (line 313) | def forward(self, x):
    method _init_weight (line 319) | def _init_weight(self):
  class ASPP (line 328) | class ASPP(nn.Module):
    method __init__ (line 329) | def __init__(self, inplanes, mid_channels=256, norm_cfg=dict(type='BN2...
    method forward (line 374) | def forward(self, x):
    method _init_weight (line 392) | def _init_weight(self):
  class Mlp (line 401) | class Mlp(nn.Module):
    method __init__ (line 402) | def __init__(self,
    method forward (line 417) | def forward(self, x):
  class SELayer (line 426) | class SELayer(nn.Module):
    method __init__ (line 427) | def __init__(self, channels, act_layer=nn.ReLU, gate_layer=nn.Sigmoid):
    method forward (line 434) | def forward(self, x, x_se):
  class DepthNet (line 441) | class DepthNet(nn.Module):
    method __init__ (line 442) | def __init__(self, in_channels, mid_channels, context_channels,
    method forward (line 487) | def forward(self, x, mlp_input):
  class DepthAggregation (line 498) | class DepthAggregation(nn.Module):
    method __init__ (line 502) | def __init__(self, in_channels, mid_channels, out_channels, norm_cfg):
    method forward (line 547) | def forward(self, x):
  class ViewTransformerLSSBEVDepth (line 557) | class ViewTransformerLSSBEVDepth(ViewTransformerLiftSplatShoot):
    method __init__ (line 558) | def __init__(self, loss_depth_weight, cam_channels=27, loss_depth_reg_...
    method _forward_voxel_net (line 573) | def _forward_voxel_net(self, img_feat_with_depth):
    method get_mlp_input (line 586) | def get_mlp_input(self, rot, tran, intrin, post_rot, post_tran, bda=No...
    method get_downsampled_gt_depth (line 638) | def get_downsampled_gt_depth(self, gt_depths):
    method _prepare_depth_gt (line 662) | def _prepare_depth_gt(self, gt_depths):
    method get_depth_reg_loss (line 680) | def get_depth_reg_loss(self, depth_labels, depth_preds):
    method get_depth_loss (line 700) | def get_depth_loss(self, depth_labels, depth_preds):
    method forward (line 717) | def forward(self, input):
  class ConvBnReLU3D (line 747) | class ConvBnReLU3D(nn.Module):
    method __init__ (line 749) | def __init__(
    method forward (line 777) | def forward(self, x: torch.Tensor) -> torch.Tensor:
  class DepthNetStereo (line 782) | class DepthNetStereo(nn.Module):
    method __init__ (line 783) | def __init__(self,
    method forward (line 861) | def forward(self, x, mlp_input):
  class ViewTransformerLSSBEVStereo (line 884) | class ViewTransformerLSSBEVStereo(ViewTransformerLSSBEVDepth):
    method __init__ (line 885) | def __init__(self, num_ranges=4, use_mask=True, em_iteration=3,
    method depth_sampling (line 959) | def depth_sampling(self):
    method create_depth_sample_frustum (line 973) | def create_depth_sample_frustum(self, depth_sample, downsample_factor=...
    method homo_warping (line 1001) | def homo_warping(
    method _forward_mask (line 1081) | def _forward_mask(
    method _generate_cost_volume (line 1140) | def _generate_cost_volume(
    method _forward_stereo (line 1206) | def _forward_stereo(
    method forward (line 1368) | def forward(self, input):

FILE: projects/occ_plugin/occupancy/image2bev/ViewTransformerLSSVoxel.py
  function get_gpu_memory_usage (line 22) | def get_gpu_memory_usage():
  class ViewTransformerLiftSplatShootVoxel (line 28) | class ViewTransformerLiftSplatShootVoxel(ViewTransformerLSSBEVDepth):
    method __init__ (line 29) | def __init__(self, loss_depth_weight, loss_depth_type='bce', **kwargs):
    method get_downsampled_gt_depth (line 36) | def get_downsampled_gt_depth(self, gt_depths):
    method get_bce_depth_loss (line 63) | def get_bce_depth_loss(self, depth_labels, depth_preds):
    method get_klv_depth_loss (line 76) | def get_klv_depth_loss(self, depth_labels, depth_preds):
    method get_depth_loss (line 91) | def get_depth_loss(self, depth_labels, depth_preds):
    method voxel_pooling (line 103) | def voxel_pooling(self, geom_feats, x):
    method forward (line 129) | def forward(self, input):

FILE: projects/occ_plugin/occupancy/necks/fpn3d.py
  class FPN3D (line 14) | class FPN3D(BaseModule):
    method __init__ (line 27) | def __init__(self,
    method forward (line 69) | def forward(self, inputs):

FILE: projects/occ_plugin/occupancy/necks/second_fpn_3d.py
  class SECONDFPN3D (line 13) | class SECONDFPN3D(BaseModule):
    method __init__ (line 26) | def __init__(self,
    method forward (line 78) | def forward(self, x):

FILE: projects/occ_plugin/occupancy/voxel_encoder/sparse_lidar_enc.py
  function post_act_block (line 17) | def post_act_block(in_channels, out_channels, kernel_size, indice_key=No...
  class SparseBasicBlock (line 40) | class SparseBasicBlock(spconv.SparseModule):
    method __init__ (line 42) | def __init__(self, inplanes, planes, stride=1, norm_cfg=None, indice_k...
    method forward (line 55) | def forward(self, x):
  class SparseLiDAREnc4x (line 66) | class SparseLiDAREnc4x(nn.Module):
    method __init__ (line 67) | def __init__(self, input_channel, norm_cfg, base_channel, out_channel,
    method forward (line 103) | def forward(self, voxel_features, coors, batch_size):
  class SparseLiDAREnc8x (line 124) | class SparseLiDAREnc8x(nn.Module):
    method __init__ (line 125) | def __init__(self, input_channel, norm_cfg, base_channel, out_channel,
    method forward (line 162) | def forward(self, voxel_features, coors, batch_size):

FILE: projects/occ_plugin/ops/occ_pooling/OCC_Pool.py
  class QuickCumsum (line 8) | class QuickCumsum(torch.autograd.Function):
    method forward (line 10) | def forward(ctx, x, geom_feats, ranks):
    method backward (line 27) | def backward(ctx, gradx, gradgeom):
  class QuickCumsumCuda (line 37) | class QuickCumsumCuda(torch.autograd.Function):
    method forward (line 39) | def forward(ctx, x, geom_feats, ranks, B, D, H, W):
    method backward (line 64) | def backward(ctx, out_grad):
  function occ_pool (line 83) | def occ_pool(feats, coords, B, D, H, W):

FILE: projects/occ_plugin/ops/occ_pooling/src/occ_pool.cpp
  function occ_pool_forward (line 22) | at::Tensor occ_pool_forward(
  function occ_pool_backward (line 60) | at::Tensor occ_pool_backward(
  function PYBIND11_MODULE (line 89) | PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {

FILE: projects/occ_plugin/utils/coordinate_transform.py
  function coarse_to_fine_coordinates (line 4) | def coarse_to_fine_coordinates(coarse_cor, ratio, topk=30000):
  function project_points_on_img (line 26) | def project_points_on_img(points, rots, trans, intrins, post_rots, post_...

FILE: projects/occ_plugin/utils/formating.py
  function cm_to_ious (line 4) | def cm_to_ious(cm):
  function format_results (line 17) | def format_results(mean_ious, return_dic=False):
  function format_iou_results (line 55) | def format_iou_results(mean_ious, return_dic=False):
  function format_vel_results (line 92) | def format_vel_results(mean_epe, return_dic=False):

FILE: projects/occ_plugin/utils/gaussian.py
  function gaussian_2d (line 7) | def gaussian_2d(shape, sigma=1):
  function draw_heatmap_gaussian (line 26) | def draw_heatmap_gaussian(heatmap, center, radius, k=1):
  function gaussian_radius (line 58) | def gaussian_radius(det_size, min_overlap=0.5):
  function generate_guassian_depth_target (line 90) | def generate_guassian_depth_target(depth, stride, cam_depth_range, const...

FILE: projects/occ_plugin/utils/geometry.py
  function convert_egopose_to_matrix_numpy (line 8) | def convert_egopose_to_matrix_numpy(trans, rot):
  function invert_matrix_egopose_numpy (line 18) | def invert_matrix_egopose_numpy(egopose):

FILE: projects/occ_plugin/utils/metric_util.py
  function fast_hist (line 7) | def fast_hist(pred, label, n):
  function per_class_iu (line 13) | def per_class_iu(hist):
  function fast_hist_crop (line 16) | def fast_hist_crop(output, target, unique_label):
  class SSCMetrics (line 24) | class SSCMetrics:
    method __init__ (line 25) | def __init__(self, class_names, ignore_idx=255, empty_idx=None):
    method hist_info (line 32) | def hist_info(self, n_cl, pred, gt):
    method compute_score (line 47) | def compute_score(hist, correct, labeled):
    method add_batch (line 57) | def add_batch(self, y_pred, y_true, nonsurface=None):
    method get_stats (line 80) | def get_stats(self):
    method reset (line 98) | def reset(self):
    method get_score_completion (line 118) | def get_score_completion(self, predict, target, nonempty=None):
    method get_score_semantic_and_completion (line 153) | def get_score_semantic_and_completion(self, predict, target, nonempty=...

FILE: projects/occ_plugin/utils/nusc_param.py
  function KL_sep (line 53) | def KL_sep(p, target):
  function geo_scal_loss (line 63) | def geo_scal_loss(pred, ssc_target):
  function sem_scal_loss (line 90) | def sem_scal_loss(pred, ssc_target):
  function CE_ssc_loss (line 137) | def CE_ssc_loss(pred, target, class_weights):

FILE: projects/occ_plugin/utils/semkitti.py
  function KL_sep (line 55) | def KL_sep(p, target):
  function geo_scal_loss (line 65) | def geo_scal_loss(pred, ssc_target, ignore_index=255, non_empty_idx=0):
  function sem_scal_loss (line 93) | def sem_scal_loss(pred, ssc_target, ignore_index=255):
  function CE_ssc_loss (line 140) | def CE_ssc_loss(pred, target, class_weights=None, ignore_index=255):
  function Smooth_L1_loss (line 151) | def Smooth_L1_loss(pred, target, ignore_index=255):
  function vel_loss (line 165) | def vel_loss(pred, gt):

FILE: projects/occ_plugin/utils/voxel_to_points.py
  function query_points_from_voxels (line 4) | def query_points_from_voxels(pred, gt, img_metas):

FILE: setup.py
  function make_cuda_ext (line 10) | def make_cuda_ext(name,

FILE: tools/gen_data/gen_depth_gt.py
  function map_pointcloud_to_image (line 12) | def map_pointcloud_to_image(
  function worker (line 90) | def worker(info):

FILE: tools/misc/browse_dataset.py
  function parse_args (line 15) | def parse_args():
  function build_data_cfg (line 53) | def build_data_cfg(config_path, skip_type, cfg_options):
  function to_depth_mode (line 79) | def to_depth_mode(points, bboxes):
  function show_det_data (line 90) | def show_det_data(idx, dataset, out_dir, filename, show=False):
  function show_seg_data (line 107) | def show_seg_data(idx, dataset, out_dir, filename, show=False):
  function show_proj_bbox_img (line 124) | def show_proj_bbox_img(idx,
  function main (line 184) | def main():

FILE: tools/misc/fuse_conv_bn.py
  function fuse_conv_bn (line 10) | def fuse_conv_bn(conv, bn):
  function fuse_module (line 26) | def fuse_module(m):
  function parse_args (line 47) | def parse_args():
  function main (line 57) | def main():

FILE: tools/misc/print_config.py
  function parse_args (line 6) | def parse_args():
  function main (line 16) | def main():

FILE: tools/misc/visualize_results.py
  function parse_args (line 9) | def parse_args():
  function main (line 21) | def main():

FILE: tools/test.py
  function parse_args (line 29) | def parse_args():
  function main (line 111) | def main():

FILE: tools/train.py
  function parse_args (line 34) | def parse_args():
  function main (line 95) | def main():

FILE: viz/viz_gt.py
  function viz_occ (line 16) | def viz_occ(occ, occ_mo, file_name, voxel_size, show_occ, show_time_chan...
  function main (line 91) | def main():

FILE: viz/viz_pred.py
  function viz_occ (line 11) | def viz_occ(occ, occ_mo, file_name, voxel_size, show_occ, show_time_chan...
  function main (line 86) | def main():

Download .json

Condensed preview — 110 files, each showing path, character count, and a content snippet. Download the .json file or copy for the full structured content (581K chars).

[
  {
    "path": "LICENSE",
    "chars": 1065,
    "preview": "MIT License\n\nCopyright (c) 2023 HAOMO.AI\n\nPermission is hereby granted, free of charge, to any person obtaining a copy\no"
  },
  {
    "path": "README.md",
    "chars": 10811,
    "preview": "# Cam4DOcc\n\nThe official code an data for the benchmark with baselines for our paper: [Cam4DOcc: Benchmark for Camera-On"
  },
  {
    "path": "data/README.md",
    "chars": 1541,
    "preview": "### Data Structure\n\nPlease link your [nuScenes V1.0 full dataset ](https://www.nuscenes.org/nuscenes#download) to the da"
  },
  {
    "path": "data/cam4docc/.gitkeep",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "data/cam4docc/GMO/.gitkeep",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "data/cam4docc/GMO_lyft/.gitkeep",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "data/cam4docc/MMO/.gitkeep",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "data/cam4docc/MMO_lyft/.gitkeep",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "data/nuscenes/.gitkeep",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "other_baselines/README.md",
    "chars": 5104,
    "preview": "## I. Static World\r\n\r\nThe static world model is built based on the identity hypothesis.\r\n\r\n```bash\r\ncd other_baselines/s"
  },
  {
    "path": "other_baselines/lifted_2d/eval_lifted_2d.py",
    "chars": 6017,
    "preview": "# Developed by Junyi Ma\n# Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Application"
  },
  {
    "path": "other_baselines/static_world/eval_static_world.py",
    "chars": 3195,
    "preview": "# Developed by Junyi Ma\n# Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Application"
  },
  {
    "path": "other_baselines/voxel_pcp/eval_voxel_pcp.py",
    "chars": 10254,
    "preview": "# Developed by Junyi Ma\n# Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Application"
  },
  {
    "path": "projects/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "projects/configs/_base_/datasets/custom_lyft-3d.py",
    "chars": 4571,
    "preview": "# If point cloud range is changed, the models should also change their point\n# cloud range accordingly\npoint_cloud_range"
  },
  {
    "path": "projects/configs/_base_/datasets/custom_nus-3d.py",
    "chars": 4900,
    "preview": "# If point cloud range is changed, the models should also change their point\n# cloud range accordingly\npoint_cloud_range"
  },
  {
    "path": "projects/configs/_base_/datasets/custom_waymo-3d.py",
    "chars": 3914,
    "preview": "# dataset settings\n# D5 in the config name means the whole dataset is divided into 5 folds\n# We only use one fold for ef"
  },
  {
    "path": "projects/configs/_base_/default_runtime.py",
    "chars": 484,
    "preview": "checkpoint_config = dict(interval=1)\n# yapf:disable push\n# By default we use textlogger hook and tensorboard\n# For more "
  },
  {
    "path": "projects/configs/_base_/schedules/cosine.py",
    "chars": 536,
    "preview": "# This schedule is mainly used by models with dynamic voxelization\n# optimizer\nlr = 0.003  # max learning rate\noptimizer"
  },
  {
    "path": "projects/configs/_base_/schedules/cyclic_20e.py",
    "chars": 797,
    "preview": "# For nuScenes dataset, we usually evaluate the model at the end of training.\n# Since the models are trained by 24 epoch"
  },
  {
    "path": "projects/configs/_base_/schedules/cyclic_40e.py",
    "chars": 1572,
    "preview": "# The schedule is usually used by models trained on KITTI dataset\n\n# The learning rate set in the cyclic schedule is the"
  },
  {
    "path": "projects/configs/_base_/schedules/mmdet_schedule_1x.py",
    "chars": 319,
    "preview": "# optimizer\noptimizer = dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001)\noptimizer_config = dict(grad_clip=N"
  },
  {
    "path": "projects/configs/_base_/schedules/schedule_2x.py",
    "chars": 459,
    "preview": "# optimizer\n# This schedule is mainly used by models on nuScenes dataset\noptimizer = dict(type='AdamW', lr=0.001, weight"
  },
  {
    "path": "projects/configs/_base_/schedules/schedule_3x.py",
    "chars": 399,
    "preview": "# optimizer\n# This schedule is mainly used by models on indoor dataset,\n# e.g., VoteNet on SUNRGBD and ScanNet\nlr = 0.00"
  },
  {
    "path": "projects/configs/_base_/schedules/seg_cosine_150e.py",
    "chars": 361,
    "preview": "# optimizer\n# This schedule is mainly used on S3DIS dataset in segmentation task\noptimizer = dict(type='SGD', lr=0.2, we"
  },
  {
    "path": "projects/configs/_base_/schedules/seg_cosine_200e.py",
    "chars": 349,
    "preview": "# optimizer\n# This schedule is mainly used on ScanNet dataset in segmentation task\noptimizer = dict(type='Adam', lr=0.00"
  },
  {
    "path": "projects/configs/_base_/schedules/seg_cosine_50e.py",
    "chars": 347,
    "preview": "# optimizer\n# This schedule is mainly used on S3DIS dataset in segmentation task\noptimizer = dict(type='Adam', lr=0.001,"
  },
  {
    "path": "projects/configs/baselines/OCFNet_in_Cam4DOcc_V1.1.py",
    "chars": 11807,
    "preview": "# Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications\n# https://github.com/h"
  },
  {
    "path": "projects/configs/baselines/OCFNet_in_Cam4DOcc_V1.1_lyft.py",
    "chars": 11770,
    "preview": "# Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications\n# https://github.com/h"
  },
  {
    "path": "projects/configs/baselines/OCFNet_in_Cam4DOcc_V1.2.py",
    "chars": 11902,
    "preview": "# Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications\n# https://github.com/h"
  },
  {
    "path": "projects/configs/baselines/OCFNet_in_Cam4DOcc_V1.2_lyft.py",
    "chars": 11920,
    "preview": "# Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications\n# https://github.com/h"
  },
  {
    "path": "projects/configs/datasets/custom_nus-3d.py",
    "chars": 4900,
    "preview": "# If point cloud range is changed, the models should also change their point\n# cloud range accordingly\npoint_cloud_range"
  },
  {
    "path": "projects/occ_plugin/__init__.py",
    "chars": 333,
    "preview": "from .core.evaluation.eval_hooks import OccDistEvalHook, OccEvalHook\nfrom .core.evaluation.efficiency_hooks import OccEf"
  },
  {
    "path": "projects/occ_plugin/core/__init__.py",
    "chars": 51,
    "preview": "from .evaluation import *\nfrom .visualizer import *"
  },
  {
    "path": "projects/occ_plugin/core/evaluation/__init__.py",
    "chars": 100,
    "preview": "from .eval_hooks import OccDistEvalHook, OccEvalHook\nfrom .efficiency_hooks import OccEfficiencyHook"
  },
  {
    "path": "projects/occ_plugin/core/evaluation/efficiency_hooks.py",
    "chars": 3834,
    "preview": "import copy\nfrom mmcv.runner import HOOKS, Hook\nimport time\ntry:\n    from mmcv.cnn import get_model_complexity_info\nexce"
  },
  {
    "path": "projects/occ_plugin/core/evaluation/eval_hooks.py",
    "chars": 2752,
    "preview": "\n# Note: Considering that MMCV's EvalHook updated its interface in V1.3.16,\n# in order to avoid strong version dependenc"
  },
  {
    "path": "projects/occ_plugin/core/visualizer/__init__.py",
    "chars": 30,
    "preview": "from .show_occ import save_occ"
  },
  {
    "path": "projects/occ_plugin/core/visualizer/show_occ.py",
    "chars": 2829,
    "preview": "\nimport torch.nn.functional as F\nimport torch\nimport numpy as np\nfrom os import path as osp\nimport os\n\ndef save_occ(pred"
  },
  {
    "path": "projects/occ_plugin/datasets/__init__.py",
    "chars": 256,
    "preview": "from .nuscenes_dataset import CustomNuScenesDataset\nfrom .cam4docc_dataset import Cam4DOccDataset\nfrom .cam4docc_lyft_da"
  },
  {
    "path": "projects/occ_plugin/datasets/builder.py",
    "chars": 5897,
    "preview": "# Copyright (c) OpenMMLab. All rights reserved.\nimport copy\nimport platform\nimport random\nfrom functools import partial\n"
  },
  {
    "path": "projects/occ_plugin/datasets/cam4docc_dataset.py",
    "chars": 20036,
    "preview": "# Developed by Junyi Ma based on the codebase of OpenOccupancy and PowerBEV\n# Cam4DOcc: Benchmark for Camera-Only 4D Occ"
  },
  {
    "path": "projects/occ_plugin/datasets/cam4docc_lyft_dataset.py",
    "chars": 24771,
    "preview": "# Developed by Junyi Ma based on the codebase of OpenOccupancy and PowerBEV\n# Cam4DOcc: Benchmark for Camera-Only 4D Occ"
  },
  {
    "path": "projects/occ_plugin/datasets/nuscenes_dataset.py",
    "chars": 6833,
    "preview": "import copy\nimport numpy as np\nfrom mmdet.datasets import DATASETS\nfrom mmdet3d.datasets import NuScenesDataset\nimport m"
  },
  {
    "path": "projects/occ_plugin/datasets/pipelines/__init__.py",
    "chars": 737,
    "preview": "from .transform_3d import (\n    PadMultiViewImage, NormalizeMultiviewImage, \n    PhotoMetricDistortionMultiViewImage, Cu"
  },
  {
    "path": "projects/occ_plugin/datasets/pipelines/formating.py",
    "chars": 2204,
    "preview": "# Copyright (c) OpenMMLab. All rights reserved.\r\nimport numpy as np\r\nfrom mmcv.parallel import DataContainer as DC\r\n\r\nfr"
  },
  {
    "path": "projects/occ_plugin/datasets/pipelines/loading_bevdet.py",
    "chars": 22756,
    "preview": "# Copyright (c) OpenMMLab. All rights reserved.\n\nimport mmcv\nimport numpy as np\nfrom mmdet.datasets.builder import PIPEL"
  },
  {
    "path": "projects/occ_plugin/datasets/pipelines/loading_instance.py",
    "chars": 24148,
    "preview": "# Developed by Junyi Ma based on the codebase of PowerBEV\n# Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting"
  },
  {
    "path": "projects/occ_plugin/datasets/pipelines/loading_occupancy.py",
    "chars": 8924,
    "preview": "# Developed by Junyi Ma based on the codebase of OpenOccupancy\n# Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Foreca"
  },
  {
    "path": "projects/occ_plugin/datasets/pipelines/transform_3d.py",
    "chars": 17244,
    "preview": "import numpy as np\nfrom numpy import random\nimport mmcv\nfrom mmdet.datasets.builder import PIPELINES\nfrom mmcv.parallel "
  },
  {
    "path": "projects/occ_plugin/datasets/samplers/__init__.py",
    "chars": 148,
    "preview": "from .group_sampler import DistributedGroupSampler\nfrom .distributed_sampler import DistributedSampler\nfrom .sampler imp"
  },
  {
    "path": "projects/occ_plugin/datasets/samplers/distributed_sampler.py",
    "chars": 1371,
    "preview": "import math\n\nimport torch\nfrom torch.utils.data import DistributedSampler as _DistributedSampler\nfrom .sampler import SA"
  },
  {
    "path": "projects/occ_plugin/datasets/samplers/group_sampler.py",
    "chars": 3958,
    "preview": "# Copyright (c) OpenMMLab. All rights reserved.\nimport math\n\nimport numpy as np\nimport torch\nfrom mmcv.runner import get"
  },
  {
    "path": "projects/occ_plugin/datasets/samplers/sampler.py",
    "chars": 189,
    "preview": "from mmcv.utils.registry import Registry, build_from_cfg\r\n\r\nSAMPLER = Registry('sampler')\r\n\r\n\r\ndef build_sampler(cfg, de"
  },
  {
    "path": "projects/occ_plugin/occupancy/__init__.py",
    "chars": 180,
    "preview": "from .dense_heads import *\r\nfrom .detectors import *\r\nfrom .backbones import *\r\nfrom .image2bev import *\r\nfrom .voxel_en"
  },
  {
    "path": "projects/occ_plugin/occupancy/apis/__init__.py",
    "chars": 126,
    "preview": "from .train import custom_train_model\nfrom .mmdet_train import custom_train_detector\n# from .test import custom_multi_gp"
  },
  {
    "path": "projects/occ_plugin/occupancy/apis/mmdet_train.py",
    "chars": 4030,
    "preview": "# ---------------------------------------------\n# Copyright (c) OpenMMLab. All rights reserved.\n# ----------------------"
  },
  {
    "path": "projects/occ_plugin/occupancy/apis/test.py",
    "chars": 6038,
    "preview": "# ---------------------------------------------\n# Copyright (c) OpenMMLab. All rights reserved.\n# ----------------------"
  },
  {
    "path": "projects/occ_plugin/occupancy/apis/train.py",
    "chars": 1630,
    "preview": "from .mmdet_train import custom_train_detector\nfrom mmseg.apis import train_segmentor\nfrom mmdet.apis import train_detec"
  },
  {
    "path": "projects/occ_plugin/occupancy/backbones/__init__.py",
    "chars": 70,
    "preview": "from .resnet3d import CustomResNet3D\nfrom .pred_block import Predictor"
  },
  {
    "path": "projects/occ_plugin/occupancy/backbones/pred_block.py",
    "chars": 3143,
    "preview": "# Developed by Junyi Ma\n# Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Application"
  },
  {
    "path": "projects/occ_plugin/occupancy/backbones/resnet3d.py",
    "chars": 7358,
    "preview": "import math\nfrom functools import partial\nfrom mmdet3d.models.builder import BACKBONES\nfrom mmcv.cnn import build_conv_l"
  },
  {
    "path": "projects/occ_plugin/occupancy/dense_heads/__init__.py",
    "chars": 61,
    "preview": "from .occ_head import OccHead\nfrom .flow_head import FlowHead"
  },
  {
    "path": "projects/occ_plugin/occupancy/dense_heads/flow_head.py",
    "chars": 7132,
    "preview": "# Developed by Junyi Ma\n# Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Application"
  },
  {
    "path": "projects/occ_plugin/occupancy/dense_heads/lovasz_softmax.py",
    "chars": 11724,
    "preview": "# -*- coding:utf-8 -*-\n# author: Xinge\n\n\n\"\"\"\nLovasz-Softmax and Jaccard hinge loss in PyTorch\nMaxim Berman 2018 ESAT-PSI"
  },
  {
    "path": "projects/occ_plugin/occupancy/dense_heads/occ_head.py",
    "chars": 8792,
    "preview": "# Developed by Junyi Ma based on the codebase of OpenOccupancy\n# Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Foreca"
  },
  {
    "path": "projects/occ_plugin/occupancy/dense_heads/utils.py",
    "chars": 4395,
    "preview": "# borrowed from https://github.com/GuoPingPan/RPVNet/blob/main/core/models/utils/utils.py\n\nimport time\nimport numpy as n"
  },
  {
    "path": "projects/occ_plugin/occupancy/detectors/__init__.py",
    "chars": 28,
    "preview": "from .ocfnet import OCFNet\r\n"
  },
  {
    "path": "projects/occ_plugin/occupancy/detectors/bevdepth.py",
    "chars": 29106,
    "preview": "# Copyright (c) Phigent Robotics. All rights reserved.\nimport math\nimport torch\nfrom mmcv.runner import force_fp32\nimpor"
  },
  {
    "path": "projects/occ_plugin/occupancy/detectors/ocfnet.py",
    "chars": 43849,
    "preview": "# Developed by Junyi Ma based on the codebase of OpenOccupancy and PowerBEV\n# Cam4DOcc: Benchmark for Camera-Only 4D Occ"
  },
  {
    "path": "projects/occ_plugin/occupancy/fuser/__init__.py",
    "chars": 92,
    "preview": "from .addfuse import AddFuser\nfrom .visfuse import VisFuser\nfrom .convfuse import ConvFuser\n"
  },
  {
    "path": "projects/occ_plugin/occupancy/fuser/addfuse.py",
    "chars": 1838,
    "preview": "import random\nfrom typing import List\n\nimport torch\nfrom torch import nn\n\nfrom mmdet3d.models.builder import FUSION_LAYE"
  },
  {
    "path": "projects/occ_plugin/occupancy/fuser/convfuse.py",
    "chars": 687,
    "preview": "import random\nfrom typing import List\n\nimport torch\nfrom torch import nn\n\nfrom mmdet3d.models.builder import FUSION_LAYE"
  },
  {
    "path": "projects/occ_plugin/occupancy/fuser/visfuse.py",
    "chars": 1718,
    "preview": "import random\nfrom typing import List\n\nimport torch\nfrom torch import nn\nimport torch.nn.functional as F\n\nfrom mmdet3d.m"
  },
  {
    "path": "projects/occ_plugin/occupancy/image2bev/ViewTransformerLSSBEVDepth.py",
    "chars": 58150,
    "preview": "# Copyright (c) Phigent Robotics. All rights reserved.\nimport math\nimport torch\nimport torch.nn as nn\nfrom mmcv.runner i"
  },
  {
    "path": "projects/occ_plugin/occupancy/image2bev/ViewTransformerLSSVoxel.py",
    "chars": 6074,
    "preview": "# Copyright (c) Phigent Robotics. All rights reserved.\nimport math\nimport torch\nimport torch.nn as nn\nfrom mmcv.runner i"
  },
  {
    "path": "projects/occ_plugin/occupancy/image2bev/__init__.py",
    "chars": 138,
    "preview": "from .ViewTransformerLSSBEVDepth import ViewTransformerLSSBEVDepth\nfrom .ViewTransformerLSSVoxel import ViewTransformerL"
  },
  {
    "path": "projects/occ_plugin/occupancy/necks/__init__.py",
    "chars": 63,
    "preview": "from .second_fpn_3d import SECONDFPN3D\nfrom .fpn3d import FPN3D"
  },
  {
    "path": "projects/occ_plugin/occupancy/necks/fpn3d.py",
    "chars": 3831,
    "preview": "# Copyright (c) OpenMMLab. All rights reserved.\nimport numpy as np\nimport torch\nfrom mmcv.cnn import build_conv_layer, b"
  },
  {
    "path": "projects/occ_plugin/occupancy/necks/second_fpn_3d.py",
    "chars": 3579,
    "preview": "# Copyright (c) OpenMMLab. All rights reserved.\nimport numpy as np\nimport torch\nfrom mmcv.cnn import build_conv_layer, b"
  },
  {
    "path": "projects/occ_plugin/occupancy/voxel_encoder/__init__.py",
    "chars": 64,
    "preview": "from .sparse_lidar_enc import SparseLiDAREnc4x, SparseLiDAREnc8x"
  },
  {
    "path": "projects/occ_plugin/occupancy/voxel_encoder/sparse_lidar_enc.py",
    "chars": 6806,
    "preview": "import math\nfrom functools import partial\nfrom mmcv.cnn import build_conv_layer, build_norm_layer, build_upsample_layer\n"
  },
  {
    "path": "projects/occ_plugin/ops/__init__.py",
    "chars": 26,
    "preview": "from .occ_pooling import *"
  },
  {
    "path": "projects/occ_plugin/ops/occ_pooling/OCC_Pool.py",
    "chars": 2672,
    "preview": "import torch\n\nfrom projects.occ_plugin.ops.occ_pooling import occ_pool_ext\n\n__all__ = [\"occ_pool\"]\n\n\nclass QuickCumsum(t"
  },
  {
    "path": "projects/occ_plugin/ops/occ_pooling/__init__.py",
    "chars": 30,
    "preview": "from .OCC_Pool import occ_pool"
  },
  {
    "path": "projects/occ_plugin/ops/occ_pooling/src/occ_pool.cpp",
    "chars": 3374,
    "preview": "#include <torch/torch.h>\n#include <c10/cuda/CUDAGuard.h>\n\n// CUDA function declarations\nvoid occ_pool(int b, int d, int "
  },
  {
    "path": "projects/occ_plugin/ops/occ_pooling/src/occ_pool_cuda.cu",
    "chars": 4366,
    "preview": "#include <stdio.h>\n#include <stdlib.h>\n\n/*\n  Function: pillar pooling\n  Args:\n    b                : batch size\n    d   "
  },
  {
    "path": "projects/occ_plugin/utils/__init__.py",
    "chars": 270,
    "preview": "from .formating import cm_to_ious, format_results\nfrom .metric_util import per_class_iu, fast_hist_crop\nfrom .coordinate"
  },
  {
    "path": "projects/occ_plugin/utils/coordinate_transform.py",
    "chars": 2323,
    "preview": "\nimport torch\n\ndef coarse_to_fine_coordinates(coarse_cor, ratio, topk=30000):\n    \"\"\"\n    Args:\n        coarse_cor (torc"
  },
  {
    "path": "projects/occ_plugin/utils/formating.py",
    "chars": 3012,
    "preview": "from prettytable import PrettyTable\nimport numpy as np\n\ndef cm_to_ious(cm):\n    # SC：[TN FP \\n FN TP]\n    mean_ious = []"
  },
  {
    "path": "projects/occ_plugin/utils/gaussian.py",
    "chars": 4386,
    "preview": "import numpy as np\nimport torch\nimport torch.nn.functional as F\nfrom torch.distributions import Normal\nimport pdb\n\ndef g"
  },
  {
    "path": "projects/occ_plugin/utils/geometry.py",
    "chars": 867,
    "preview": "import numpy as np\nimport PIL\nimport torch\nimport torch.nn.functional as F\nfrom pyquaternion import Quaternion\n\n\ndef con"
  },
  {
    "path": "projects/occ_plugin/utils/metric_util.py",
    "chars": 7138,
    "preview": "# -*- coding:utf-8 -*-\n# author: Xinge\n# @file: metric_util.py \n\nimport numpy as np\n\ndef fast_hist(pred, label, n):\n    "
  },
  {
    "path": "projects/occ_plugin/utils/nusc_param.py",
    "chars": 4571,
    "preview": "import torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nimport numpy as np\n\n\nnusc_class_frequencies = np.arra"
  },
  {
    "path": "projects/occ_plugin/utils/semkitti.py",
    "chars": 4805,
    "preview": "import torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nimport numpy as np\n\nsemantic_kitti_class_frequencies "
  },
  {
    "path": "projects/occ_plugin/utils/voxel_to_points.py",
    "chars": 1758,
    "preview": "import open3d as o3d\nimport numpy as np\n\ndef query_points_from_voxels(pred, gt, img_metas):\n    # pred, [tensor of shape"
  },
  {
    "path": "run.sh",
    "chars": 177,
    "preview": "echo \"-------------\"\necho \"load config from local path:\" $1\nif [ -f $1 ]; then\n  config=$1\nelse\n  echo \"need a config fi"
  },
  {
    "path": "run_eval.sh",
    "chars": 222,
    "preview": "echo \"-------------\"\necho \"load config from local path:\" $1\nif [ -f $1 ]; then\n  config=$1\nelse\n  echo \"need a config fi"
  },
  {
    "path": "setup.py",
    "chars": 2528,
    "preview": "from setuptools import find_packages, setup\n\nimport os\nimport torch\nfrom os import path as osp\nfrom torch.utils.cpp_exte"
  },
  {
    "path": "tools/dist_test.sh",
    "chars": 296,
    "preview": "#!/usr/bin/env bash\n\nCONFIG=$1\nCHECKPOINT=$2\nGPUS=$3\nPORT=${PORT:-29504}\n\nPYTHONPATH=\"$(dirname $0)/..\":$PYTHONPATH \\\npy"
  },
  {
    "path": "tools/dist_train.sh",
    "chars": 516,
    "preview": "#!/usr/bin/env bash\n\nCONFIG=$1\nGPUS=$2\nNNODES=${NNODES:-1}\nNODE_RANK=${NODE_RANK:-0}\nPORT=${PORT:-29501}\nMASTER_ADDR=${M"
  },
  {
    "path": "tools/gen_data/gen_depth_gt.py",
    "chars": 5288,
    "preview": "import os\nfrom multiprocessing import Pool\n\nimport mmcv\nimport numpy as np\nfrom nuscenes.utils.data_classes import Lidar"
  },
  {
    "path": "tools/misc/browse_dataset.py",
    "chars": 8551,
    "preview": "# Copyright (c) OpenMMLab. All rights reserved.\nimport argparse\nimport numpy as np\nimport warnings\nfrom mmcv import Conf"
  },
  {
    "path": "tools/misc/fuse_conv_bn.py",
    "chars": 2240,
    "preview": "# Copyright (c) OpenMMLab. All rights reserved.\nimport argparse\nimport torch\nfrom mmcv.runner import save_checkpoint\nfro"
  },
  {
    "path": "tools/misc/print_config.py",
    "chars": 638,
    "preview": "# Copyright (c) OpenMMLab. All rights reserved.\nimport argparse\nfrom mmcv import Config, DictAction\n\n\ndef parse_args():\n"
  },
  {
    "path": "tools/misc/visualize_results.py",
    "chars": 1447,
    "preview": "# Copyright (c) OpenMMLab. All rights reserved.\nimport argparse\nimport mmcv\nfrom mmcv import Config\n\nfrom mmdet3d.datase"
  },
  {
    "path": "tools/test.py",
    "chars": 10588,
    "preview": "# Copyright (c) OpenMMLab. All rights reserved.\n#  Modified by Junyi Ma, following OpenOccupancy of Zhiqi Li\n\nimport arg"
  },
  {
    "path": "tools/train.py",
    "chars": 7056,
    "preview": "# Copyright (c) OpenMMLab. All rights reserved.\n# Cam4DOcc refers to OpenOccupancy of Zhiqi Li\n \nfrom __future__ import "
  },
  {
    "path": "viz/viz_gt.py",
    "chars": 4176,
    "preview": "# Developed by Junyi Ma\n# Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Application"
  },
  {
    "path": "viz/viz_pred.py",
    "chars": 4071,
    "preview": "from tqdm import tqdm\nimport pickle\nimport numpy as np\nfrom mayavi import mlab\nfrom tqdm import trange\nimport os\nfrom xv"
  }
]

About this extraction

This page contains the full source code of the haomo-ai/Cam4DOcc GitHub repository, extracted and formatted as plain text for AI agents and large language models (LLMs). The extraction includes 110 files (543.7 KB), approximately 143.1k tokens, and a symbol index with 438 extracted functions, classes, methods, constants, and types. Use this with OpenClaw, Claude, ChatGPT, Cursor, Windsurf, or any other AI tool that accepts text input. You can copy the full output to your clipboard or download it as a .txt file.

Extracted by GitExtract — free GitHub repo to text converter for AI. Built by Nikandr Surkov.

Extract another repo