Repository: MCG-NJU/SparseOcc
Branch: main
Commit: af4d9df83bfd
Files: 58
Total size: 409.0 KB
Directory structure:
gitextract_8j2gi4im/
├── .gitignore
├── LICENSE
├── README.md
├── configs/
│   ├── r50_nuimg_704x256_8f.py
│   ├── r50_nuimg_704x256_8f_60e.py
│   ├── r50_nuimg_704x256_8f_openocc.py
│   └── r50_nuimg_704x256_8f_pano.py
├── gen_instance_info.py
├── gen_sweep_info.py
├── lib/
│   └── dvr/
│       ├── dvr.cpp
│       └── dvr.cu
├── loaders/
│   ├── __init__.py
│   ├── builder.py
│   ├── ego_pose_dataset.py
│   ├── nuscenes_dataset.py
│   ├── nuscenes_occ_dataset.py
│   ├── old_metrics.py
│   ├── pipelines/
│   │   ├── __init__.py
│   │   ├── loading.py
│   │   └── transforms.py
│   ├── ray_metrics.py
│   └── ray_pq.py
├── models/
│   ├── __init__.py
│   ├── backbones/
│   │   ├── __init__.py
│   │   └── vovnet.py
│   ├── bbox/
│   │   ├── __init__.py
│   │   ├── assigners/
│   │   │   ├── __init__.py
│   │   │   └── hungarian_assigner_3d.py
│   │   ├── coders/
│   │   │   ├── __init__.py
│   │   │   └── nms_free_coder.py
│   │   ├── match_costs/
│   │   │   ├── __init__.py
│   │   │   └── match_cost.py
│   │   └── utils.py
│   ├── checkpoint.py
│   ├── csrc/
│   │   ├── __init__.py
│   │   ├── msmv_sampling/
│   │   │   ├── msmv_sampling.cpp
│   │   │   ├── msmv_sampling.h
│   │   │   ├── msmv_sampling_backward.cu
│   │   │   └── msmv_sampling_forward.cu
│   │   ├── setup.py
│   │   └── wrapper.py
│   ├── loss_utils.py
│   ├── matcher.py
│   ├── sparse_voxel_decoder.py
│   ├── sparsebev_head.py
│   ├── sparsebev_sampling.py
│   ├── sparsebev_transformer.py
│   ├── sparseocc.py
│   ├── sparseocc_head.py
│   ├── sparseocc_transformer.py
│   └── utils.py
├── old_metrics.py
├── ray_metrics.py
├── timing.py
├── train.py
├── utils.py
├── val.py
└── viz_prediction.py
================================================
FILE CONTENTS
================================================
================================================
FILE: .gitignore
================================================
# OS generated files
.DS_Store
.DS_Store?
._*
.Spotlight-V100
.Trashes
ehthumbs.db
Thumbs.db
# Compiled source
build
debug
Debug
release
Release
x64
*.so
*.whl
# VS project files
*.sln
*.vcxproj
*.vcxproj.filters
*.vcxproj.user
*.rc
.vs
# Byte-compiled / optimized / DLL files
*__pycache__*
*.py[cod]
*$py.class
# Distribution / packaging
.Python
build
develop-eggs
dist
downloads
# IDE
.idea
.vscode
pyrightconfig.json
# Custom
data
outputs
prediction
submission
checkpoints
pretrain
ckpts
occ_result
wandb
================================================
FILE: LICENSE
================================================
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
================================================
FILE: README.md
================================================
# SparseOcc
This is the official PyTorch implementation for our paper:
> [**Fully Sparse 3D Panoptic Occupancy Prediction**](https://arxiv.org/abs/2312.17118)<br>
> :school: Presented by Nanjing University and Shanghai AI Lab<br>
> :email: Primary contact: Haisong Liu (afterthat97@gmail.com)<br>
> :trophy: [CVPR 2024 Autonomous Driving Challenge - Occupancy and Flow](https://opendrivelab.com/challenge2024/#occupancy_and_flow)<br>
> :book: Chinese write-up (official): https://zhuanlan.zhihu.com/p/709576252<br>
> :book: Chinese write-ups (third-party): [AIming](https://zhuanlan.zhihu.com/p/691549750), [自动驾驶之心](https://zhuanlan.zhihu.com/p/675811281)
## :warning: Important Notes
There is another concurrent project titled *"SparseOcc: Rethinking sparse latent representation for vision-based semantic occupancy prediction"* by Tang et al., which shares the same name SparseOcc with ours. However, this repository is **unrelated** to the aforementioned paper.
If you cite our research, please ensure that you reference the correct version (arXiv **2312.17118**, authored by **Liu et al.**):
```
@article{liu2023fully,
  title={Fully sparse 3d panoptic occupancy prediction},
  author={Liu, Haisong and Wang, Haiguang and Chen, Yang and Yang, Zetong and Zeng, Jia and Chen, Li and Wang, Limin},
  journal={arXiv preprint arXiv:2312.17118},
  year={2023}
}
```
> In arXiv 2312.17118v3, we removed the word "panoptic" from the title. However, Google Scholar's database has not been updated and still shows the old one. Therefore, we still recommend citing the old title - "Fully sparse 3d panoptic occupancy prediction" - so that Google Scholar can index it correctly. Thank you all.
## News
* **2024-07-19**: We released an updated version of SparseOcc on [arXiv](https://arxiv.org/abs/2312.17118). All charts and colors have been carefully adjusted. Delete the old version and download the new one!
* **2024-07-01**: SparseOcc is accepted to ECCV 2024.
* **2024-06-27**: SparseOcc v1.1 is released. In this change, we introduce BEV data augmentation (BDA) and Lovasz-Softmax loss to further enhance the performance. Compared with [v1.0](https://github.com/MCG-NJU/SparseOcc/tree/v1.0) (35.0 RayIoU with 48 epochs), SparseOcc v1.1 can achieve 36.8 RayIoU with 24 epochs!
* **2024-05-29**: We add support for [OpenOcc v2](configs/r50_nuimg_704x256_8f_openocc.py) dataset (without occupancy flow).
* **2024-04-11**: The panoptic version of SparseOcc ([configs/r50_nuimg_704x256_8f_pano.py](configs/r50_nuimg_704x256_8f_pano.py)) is released.
* **2024-04-09**: An updated arXiv version [https://arxiv.org/abs/2312.17118v3](https://arxiv.org/abs/2312.17118v3) has been released.
* **2024-03-31**: We release the code and pretrained weights.
* **2023-12-30**: We release the paper.
## Highlights
**New model** :1st_place_medal:: SparseOcc initially reconstructs a sparse 3D representation from visual inputs and subsequently predicts semantic/instance occupancy from this sparse representation using sparse queries.

**New evaluation metric** :chart_with_upwards_trend:: We design a thoughtful ray-based evaluation metric, namely RayIoU, to resolve the inconsistent depth penalty inherent in the traditional voxel-level mIoU criterion.

Some FAQs from the community about the evaluation metrics:
1. **Why does training with visible masks result in significant improvements in the old mIoU metric, but not in the new RayIoU metric?** As mentioned in the paper, when the visible mask is used during training, the area behind the surface is not supervised, so the model tends to fill this area with duplicated predictions, producing a thicker surface. The old metric penalizes inconsistently along the depth axis when the prediction has a thick surface, so this ''improvement'' mainly stems from a vulnerability of the old metric.
2. **Why can't SparseOcc exploit the vulnerability of the old metric?** Since SparseOcc employs a fully sparse architecture, it always predicts a thin surface. Thus, there are two ways to make a fair comparison: (a) use the old metric, but require all methods to predict a thin surface, which implies they cannot use the visible mask during training; or (b) use RayIoU, which is more principled and compares thick and thin surfaces fairly. Our method achieves SOTA performance in both cases.
3. **Does RayIoU overlook interior reconstruction?** First, we are unable to obtain interior occupancy ground truth, because the ground truth is derived by voxelizing LiDAR point clouds, and LiDAR can only scan the thin surface of an object. Second, the query ray in RayIoU can originate from any position within the scene (see the figure above), which allows it to evaluate overall reconstruction performance, unlike depth estimation. We would like to emphasize that the evaluation logic of RayIoU aligns with the ground-truth generation process.
If you have other questions, feel free to contact me (Haisong Liu, afterthat97@gmail.com).
## Model Zoo
These results are from our latest version, v1.1, which outperforms the results reported in the paper. Additionally, our implementation differs slightly from the original paper. If you wish to reproduce the paper exactly, please refer to the [v1.0](https://github.com/MCG-NJU/SparseOcc/tree/v1.0) tag.
| Setting | Epochs | Training Cost | RayIoU | RayPQ | FPS | Weights |
|----------|:--------:|:-------------:|:------:|:-----:|:---:|:-------:|
| [r50_nuimg_704x256_8f](configs/r50_nuimg_704x256_8f.py) | 24 | 15h, ~12GB | 36.8 | - | 17.3 | [github](https://github.com/MCG-NJU/SparseOcc/releases/download/v1.1/sparseocc_r50_nuimg_704x256_8f_24e_v1.1.pth) |
| [r50_nuimg_704x256_8f_60e](configs/r50_nuimg_704x256_8f_60e.py) | 60 | 37h, ~12GB | 37.7 | - | 17.3 | [github](https://github.com/MCG-NJU/SparseOcc/releases/download/v1.1/sparseocc_r50_nuimg_704x256_8f_60e_v1.1.pth) |
| [r50_nuimg_704x256_8f_pano](configs/r50_nuimg_704x256_8f_pano.py) | 24 | 15h, ~12GB | 35.9 | 14.0 | 17.3 | [github](https://github.com/MCG-NJU/SparseOcc/releases/download/v1.1/sparseocc_r50_nuimg_704x256_8f_pano_24e_v1.1.pth) |
* The backbone is pretrained on [nuImages](https://download.openmmlab.com/mmdetection3d/v0.1.0_models/nuimages_semseg/cascade_mask_rcnn_r50_fpn_coco-20e_20e_nuim/cascade_mask_rcnn_r50_fpn_coco-20e_20e_nuim_20201009_124951-40963960.pth). Download the weights to `pretrain/xxx.pth` before you start training.
* FPS is measured with Intel(R) Xeon(R) Platinum 8369B CPU and NVIDIA A100-SXM4-80GB GPU (PyTorch `fp32` backend, including data loading).
* We will release more settings in the future.
## Environment
> The requirements are the same as those of [SparseBEV](https://github.com/MCG-NJU/SparseBEV).
Install PyTorch 2.0 + CUDA 11.8:
```
conda create -n sparseocc python=3.8
conda activate sparseocc
conda install pytorch==2.0.0 torchvision==0.15.0 pytorch-cuda=11.8 -c pytorch -c nvidia
```
Install other dependencies:
```
pip install openmim
mim install mmcv-full==1.6.0
mim install mmdet==2.28.2
mim install mmsegmentation==0.30.0
mim install mmdet3d==1.0.0rc6
pip install setuptools==59.5.0
pip install numpy==1.23.5
```
Install turbojpeg and pillow-simd to speed up data loading (optional but important):
```
sudo apt-get update
sudo apt-get install -y libturbojpeg
pip install pyturbojpeg
pip uninstall pillow
pip install pillow-simd==9.0.0.post1
```
Compile CUDA extensions:
```
cd models/csrc
python setup.py build_ext --inplace
```
## Prepare Dataset
> The first two steps are the same as those of [SparseBEV](https://github.com/MCG-NJU/SparseBEV).
1. Download nuScenes from [https://www.nuscenes.org/nuscenes](https://www.nuscenes.org/nuscenes), put it to `data/nuscenes` and preprocess it with [mmdetection3d](https://github.com/open-mmlab/mmdetection3d/tree/v1.0.0rc6).
2. Download the generated info file from [gdrive](https://drive.google.com/file/d/1uyoUuSRIVScrm_CUpge6V_UzwDT61ODO/view?usp=sharing) and unzip it. These `*.pkl` files can also be generated with our script: `gen_sweep_info.py`.
3. Download Occ3D-nuScenes occupancy GT from [gdrive](https://drive.google.com/file/d/1kiXVNSEi3UrNERPMz_CfiJXKkgts_5dY/view?usp=drive_link), unzip it, and save it to `data/nuscenes/occ3d`.
4. Folder structure:
```
data/nuscenes
├── maps
├── nuscenes_infos_test_sweep.pkl
├── nuscenes_infos_train_sweep.pkl
├── nuscenes_infos_val_sweep.pkl
├── samples
├── sweeps
├── v1.0-test
├── v1.0-trainval
└── occ3d
    ├── scene-0001
    │   ├── 0037a705a2e04559b1bba6c01beca1cf
    │   │   └── labels.npz
    │   ├── 026155aa1c554e2f87914ec9ba80acae
    │   │   └── labels.npz
    ...
```
5. (Optional) Generate the panoptic occupancy ground truth with `gen_instance_info.py`. The panoptic version of Occ3D will be saved to `data/nuscenes/occ3d_panoptic`.
## Training
Train SparseOcc with 8 GPUs:
```
torchrun --nproc_per_node 8 train.py --config configs/r50_nuimg_704x256_8f.py
```
Train SparseOcc with 4 GPUs (i.e., the last four GPUs):
```
export CUDA_VISIBLE_DEVICES=4,5,6,7
torchrun --nproc_per_node 4 train.py --config configs/r50_nuimg_704x256_8f.py
```
The batch size for each GPU will be scaled automatically, so there is no need to modify `batch_size` in the config files.
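The scaling above can be sketched as follows; this is a minimal illustration of the idea, not the project's actual launcher code, and `per_gpu_batch_size` is a hypothetical helper name:

```python
def per_gpu_batch_size(global_batch_size: int, world_size: int) -> int:
    """Split the config's global batch size evenly across the launched GPUs."""
    assert global_batch_size % world_size == 0, "batch size must divide evenly"
    return global_batch_size // world_size

# The default config sets batch_size = 8:
print(per_gpu_batch_size(8, 8))  # 8 GPUs -> 1 sample per GPU
print(per_gpu_batch_size(8, 4))  # 4 GPUs -> 2 samples per GPU
```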
## Evaluation
Single-GPU evaluation:
```
export CUDA_VISIBLE_DEVICES=0
python val.py --config configs/r50_nuimg_704x256_8f.py --weights checkpoints/sparseocc_r50_nuimg_704x256_8f.pth
```
Multi-GPU evaluation:
```
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
torchrun --nproc_per_node 8 val.py --config configs/r50_nuimg_704x256_8f.py --weights checkpoints/sparseocc_r50_nuimg_704x256_8f.pth
```
## Standalone Evaluation
If you want to evaluate your own model using RayIoU, please follow the steps below:
1. Save the predictions (shape `[200, 200, 16]`, dtype `np.uint8`) in the compressed `npz` format. For example:
```
import os
import numpy as np
save_path = os.path.join(save_dir, sample_token + '.npz')
np.savez_compressed(save_path, pred=sem_pred)
```
2. The filename for each sample is `sample_token.npz`, for example:
```
prediction/your_model
├── 000681a060c04755a1537cf83b53ba57.npz
├── 000868a72138448191b4092f75ed7776.npz
├── 0017c2623c914571a1ff2a37f034ffd7.npz
├── ...
```
3. Run `ray_metrics.py` to compute RayIoU:
```
python ray_metrics.py --pred-dir prediction/your_model
```
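Before running the evaluation, it can be worth sanity-checking the file format; a minimal sketch assuming only NumPy, using a dummy token rather than a real nuScenes sample (class id 17 corresponds to `free` in the default config's `occ_class_names`):

```python
import os
import tempfile

import numpy as np

# Write a dummy prediction in the expected format and read it back.
sem_pred = np.full((200, 200, 16), 17, dtype=np.uint8)  # all 'free' voxels
with tempfile.TemporaryDirectory() as save_dir:
    save_path = os.path.join(save_dir, 'dummy_sample_token.npz')
    np.savez_compressed(save_path, pred=sem_pred)
    loaded = np.load(save_path)['pred']

assert loaded.shape == (200, 200, 16) and loaded.dtype == np.uint8
print('format ok')
```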
## Timing
FPS is measured with a single GPU:
```
export CUDA_VISIBLE_DEVICES=0
python timing.py --config configs/r50_nuimg_704x256_8f.py --weights checkpoints/sparseocc_r50_nuimg_704x256_8f.pth
```
## Acknowledgements
Many thanks to these excellent open-source projects:
* [MaskFormer](https://github.com/facebookresearch/MaskFormer)
* [NeuralRecon](https://github.com/zju3dv/NeuralRecon)
* [4D-Occ](https://github.com/tarashakhurana/4d-occ-forecasting)
* [MMDetection3D](https://github.com/open-mmlab/mmdetection3d)
================================================
FILE: configs/r50_nuimg_704x256_8f.py
================================================
dataset_type = 'NuSceneOcc'
dataset_root = 'data/nuscenes/'
occ_gt_root = 'data/nuscenes/occ3d'
# If point cloud range is changed, the models should also change their point
# cloud range accordingly
point_cloud_range = [-40, -40, -1.0, 40, 40, 5.4]
occ_size = [200, 200, 16]
img_norm_cfg = dict(
mean=[123.675, 116.280, 103.530],
std=[58.395, 57.120, 57.375],
to_rgb=True
)
# For nuScenes we usually do 10-class detection
det_class_names = [
'car', 'truck', 'construction_vehicle', 'bus', 'trailer', 'barrier',
'motorcycle', 'bicycle', 'pedestrian', 'traffic_cone'
]
occ_class_names = [
'others', 'barrier', 'bicycle', 'bus', 'car', 'construction_vehicle',
'motorcycle', 'pedestrian', 'traffic_cone', 'trailer', 'truck',
'driveable_surface', 'other_flat', 'sidewalk',
'terrain', 'manmade', 'vegetation', 'free'
]
input_modality = dict(
use_lidar=False,
use_camera=True,
use_radar=False,
use_map=False,
use_external=False
)
_dim_ = 256
_num_points_ = 4
_num_groups_ = 4
_num_layers_ = 2
_num_frames_ = 8
_num_queries_ = 100
_topk_training_ = [4000, 16000, 64000]
_topk_testing_ = [2000, 8000, 32000]
model = dict(
type='SparseOcc',
data_aug=dict(
img_color_aug=True, # Move some augmentations to GPU
img_norm_cfg=img_norm_cfg,
img_pad_cfg=dict(size_divisor=32)),
use_mask_camera=False,
img_backbone=dict(
type='ResNet',
depth=50,
num_stages=4,
out_indices=(0, 1, 2, 3),
frozen_stages=1,
norm_cfg=dict(type='BN2d', requires_grad=True),
norm_eval=True,
style='pytorch',
with_cp=True),
img_neck=dict(
type='FPN',
in_channels=[256, 512, 1024, 2048],
out_channels=_dim_,
num_outs=4),
pts_bbox_head=dict(
type='SparseOccHead',
class_names=occ_class_names,
embed_dims=_dim_,
occ_size=occ_size,
pc_range=point_cloud_range,
transformer=dict(
type='SparseOccTransformer',
embed_dims=_dim_,
num_layers=_num_layers_,
num_frames=_num_frames_,
num_points=_num_points_,
num_groups=_num_groups_,
num_queries=_num_queries_,
num_levels=4,
num_classes=len(occ_class_names),
pc_range=point_cloud_range,
occ_size=occ_size,
topk_training=_topk_training_,
topk_testing=_topk_testing_),
loss_cfgs=dict(
loss_mask2former=dict(
type='Mask2FormerLoss',
num_classes=len(occ_class_names),
no_class_weight=0.1,
loss_cls_weight=2.0,
loss_mask_weight=5.0,
loss_dice_weight=5.0,
),
loss_geo_scal=dict(
type='GeoScalLoss',
num_classes=len(occ_class_names),
loss_weight=1.0
),
loss_sem_scal=dict(
type='SemScalLoss',
num_classes=len(occ_class_names),
loss_weight=1.0
)
),
),
)
ida_aug_conf = {
'resize_lim': (0.38, 0.55),
'final_dim': (256, 704),
'bot_pct_lim': (0.0, 0.0),
'rot_lim': (0.0, 0.0),
'H': 900, 'W': 1600,
'rand_flip': True,
}
bda_aug_conf = dict(
rot_lim=(-22.5, 22.5),
scale_lim=(1., 1.),
flip_dx_ratio=0.5,
flip_dy_ratio=0.5
)
train_pipeline = [
dict(type='LoadMultiViewImageFromFiles', to_float32=False, color_type='color'),
dict(type='LoadMultiViewImageFromMultiSweeps', sweeps_num=_num_frames_ - 1),
dict(type='BEVAug', bda_aug_conf=bda_aug_conf, classes=det_class_names, is_train=True),
dict(type='LoadOccGTFromFile', num_classes=len(occ_class_names)),
dict(type='RandomTransformImage', ida_aug_conf=ida_aug_conf, training=True),
dict(type='DefaultFormatBundle3D', class_names=det_class_names),
dict(type='Collect3D', keys=['img', 'voxel_semantics', 'voxel_instances', 'instance_class_ids'], # other keys: 'mask_camera'
meta_keys=('filename', 'ori_shape', 'img_shape', 'pad_shape', 'lidar2img', 'img_timestamp', 'ego2lidar'))
]
test_pipeline = [
dict(type='LoadMultiViewImageFromFiles', to_float32=False, color_type='color'),
dict(type='LoadMultiViewImageFromMultiSweeps', sweeps_num=_num_frames_ - 1, test_mode=True),
dict(type='BEVAug', bda_aug_conf=bda_aug_conf, classes=det_class_names, is_train=False),
dict(type='LoadOccGTFromFile', num_classes=len(occ_class_names)),
dict(type='RandomTransformImage', ida_aug_conf=ida_aug_conf, training=False),
dict(type='DefaultFormatBundle3D', class_names=det_class_names),
dict(type='Collect3D', keys=['img', 'voxel_semantics', 'voxel_instances', 'instance_class_ids'],
meta_keys=('filename', 'ori_shape', 'img_shape', 'pad_shape', 'lidar2img', 'img_timestamp', 'ego2lidar'))
]
data = dict(
workers_per_gpu=8,
train=dict(
type=dataset_type,
data_root=dataset_root,
occ_gt_root=occ_gt_root,
ann_file=dataset_root + 'nuscenes_infos_train_sweep.pkl',
pipeline=train_pipeline,
classes=det_class_names,
modality=input_modality,
test_mode=False
),
val=dict(
type=dataset_type,
data_root=dataset_root,
occ_gt_root=occ_gt_root,
ann_file=dataset_root + 'nuscenes_infos_val_sweep.pkl',
pipeline=test_pipeline,
classes=det_class_names,
modality=input_modality,
test_mode=True
),
test=dict(
type=dataset_type,
data_root=dataset_root,
occ_gt_root=occ_gt_root,
ann_file=dataset_root + 'nuscenes_infos_test_sweep.pkl',
pipeline=test_pipeline,
classes=det_class_names,
modality=input_modality,
test_mode=True
),
)
optimizer = dict(
type='AdamW',
lr=5e-4,
paramwise_cfg=dict(
custom_keys={
'img_backbone': dict(lr_mult=0.1),
'sampling_offset': dict(lr_mult=0.1),
}),
weight_decay=0.01
)
optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
lr_config = dict(
policy='step',
warmup='linear',
warmup_iters=500,
warmup_ratio=1.0 / 3,
by_epoch=True,
step=[22, 24],
gamma=0.2
)
total_epochs = 24
batch_size = 8
# load pretrained weights
load_from = 'pretrain/cascade_mask_rcnn_r50_fpn_coco-20e_20e_nuim_20201009_124951-40963960.pth'
revise_keys = [('backbone', 'img_backbone')]
# resume the last training
resume_from = None
# checkpointing
checkpoint_config = dict(interval=1, max_keep_ckpts=1)
# logging
log_config = dict(
interval=1,
hooks=[
dict(type='MyTextLoggerHook', interval=1, reset_flag=True),
dict(type='MyTensorboardLoggerHook', interval=500, reset_flag=True)
]
)
# evaluation
eval_config = dict(interval=total_epochs)
# other flags
debug = False
================================================
FILE: configs/r50_nuimg_704x256_8f_60e.py
================================================
_base_ = ['./r50_nuimg_704x256_8f.py']
lr_config = dict(
policy='step',
warmup='linear',
warmup_iters=500,
warmup_ratio=1.0 / 3,
by_epoch=True,
step=[48, 60],
gamma=0.2
)
total_epochs = 60
# evaluation
eval_config = dict(interval=total_epochs)
================================================
FILE: configs/r50_nuimg_704x256_8f_openocc.py
================================================
_base_ = ['./r50_nuimg_704x256_8f.py']
occ_gt_root = 'data/nuscenes/openocc_v2'
det_class_names = [
'car', 'truck', 'construction_vehicle', 'bus', 'trailer', 'barrier',
'motorcycle', 'bicycle', 'pedestrian', 'traffic_cone'
]
occ_class_names = [
'car', 'truck', 'trailer', 'bus', 'construction_vehicle',
'bicycle', 'motorcycle', 'pedestrian', 'traffic_cone', 'barrier',
'driveable_surface', 'other_flat', 'sidewalk',
'terrain', 'manmade', 'vegetation', 'free'
]
_num_frames_ = 8
model = dict(
pts_bbox_head=dict(
class_names=occ_class_names,
transformer=dict(
num_classes=len(occ_class_names)),
loss_cfgs=dict(
loss_mask2former=dict(
num_classes=len(occ_class_names)
),
),
),
)
ida_aug_conf = {
'resize_lim': (0.38, 0.55),
'final_dim': (256, 704),
'bot_pct_lim': (0.0, 0.0),
'rot_lim': (0.0, 0.0),
'H': 900, 'W': 1600,
'rand_flip': False,
}
train_pipeline = [
dict(type='LoadMultiViewImageFromFiles', to_float32=False, color_type='color'),
dict(type='LoadMultiViewImageFromMultiSweeps', sweeps_num=_num_frames_ - 1),
dict(type='LoadOccGTFromFile', num_classes=len(occ_class_names)),
dict(type='RandomTransformImage', ida_aug_conf=ida_aug_conf, training=True),
dict(type='DefaultFormatBundle3D', class_names=det_class_names),
dict(type='Collect3D', keys=['img', 'voxel_semantics', 'voxel_instances', 'instance_class_ids'], # other keys: 'mask_camera'
meta_keys=('filename', 'ori_shape', 'img_shape', 'pad_shape', 'lidar2img', 'img_timestamp', 'ego2lidar'))
]
test_pipeline = [
dict(type='LoadMultiViewImageFromFiles', to_float32=False, color_type='color'),
dict(type='LoadMultiViewImageFromMultiSweeps', sweeps_num=_num_frames_ - 1, test_mode=True),
dict(type='LoadOccGTFromFile', num_classes=len(occ_class_names)),
dict(type='RandomTransformImage', ida_aug_conf=ida_aug_conf, training=False),
dict(type='DefaultFormatBundle3D', class_names=det_class_names),
dict(type='Collect3D', keys=['img', 'voxel_semantics', 'voxel_instances', 'instance_class_ids'],
meta_keys=('filename', 'ori_shape', 'img_shape', 'pad_shape', 'lidar2img', 'img_timestamp', 'ego2lidar'))
]
data = dict(
workers_per_gpu=8,
train=dict(
pipeline=train_pipeline,
occ_gt_root=occ_gt_root
),
val=dict(
pipeline=test_pipeline,
occ_gt_root=occ_gt_root
),
test=dict(
pipeline=test_pipeline,
occ_gt_root=occ_gt_root
),
)
================================================
FILE: configs/r50_nuimg_704x256_8f_pano.py
================================================
_base_ = ['./r50_nuimg_704x256_8f.py']
occ_gt_root = 'data/nuscenes/occ3d_panoptic'
# For nuScenes we usually do 10-class detection
det_class_names = [
'car', 'truck', 'construction_vehicle', 'bus', 'trailer', 'barrier',
'motorcycle', 'bicycle', 'pedestrian', 'traffic_cone'
]
occ_class_names = [
'others', 'barrier', 'bicycle', 'bus', 'car', 'construction_vehicle',
'motorcycle', 'pedestrian', 'traffic_cone', 'trailer', 'truck',
'driveable_surface', 'other_flat', 'sidewalk',
'terrain', 'manmade', 'vegetation', 'free'
]
_num_frames_ = 8
model = dict(
pts_bbox_head=dict(
panoptic=True
)
)
ida_aug_conf = {
'resize_lim': (0.38, 0.55),
'final_dim': (256, 704),
'bot_pct_lim': (0.0, 0.0),
'rot_lim': (0.0, 0.0),
'H': 900, 'W': 1600,
'rand_flip': True,
}
bda_aug_conf = dict(
rot_lim=(-22.5, 22.5),
scale_lim=(1., 1.),
flip_dx_ratio=0.5,
flip_dy_ratio=0.5
)
train_pipeline = [
dict(type='LoadMultiViewImageFromFiles', to_float32=False, color_type='color'),
dict(type='LoadMultiViewImageFromMultiSweeps', sweeps_num=_num_frames_ - 1),
dict(type='BEVAug', bda_aug_conf=bda_aug_conf, classes=det_class_names, is_train=True),
dict(type='LoadOccGTFromFile', num_classes=len(occ_class_names), inst_class_ids=[2, 3, 4, 5, 6, 7, 9, 10]),
dict(type='RandomTransformImage', ida_aug_conf=ida_aug_conf, training=True),
dict(type='DefaultFormatBundle3D', class_names=det_class_names),
dict(type='Collect3D', keys=['img', 'voxel_semantics', 'voxel_instances', 'instance_class_ids'], # other keys: 'mask_camera'
meta_keys=('filename', 'ori_shape', 'img_shape', 'pad_shape', 'lidar2img', 'img_timestamp', 'ego2lidar'))
]
test_pipeline = [
dict(type='LoadMultiViewImageFromFiles', to_float32=False, color_type='color'),
dict(type='LoadMultiViewImageFromMultiSweeps', sweeps_num=_num_frames_ - 1, test_mode=True),
dict(type='BEVAug', bda_aug_conf=bda_aug_conf, classes=det_class_names, is_train=False),
dict(type='LoadOccGTFromFile', num_classes=len(occ_class_names), inst_class_ids=[2, 3, 4, 5, 6, 7, 9, 10]),
dict(type='RandomTransformImage', ida_aug_conf=ida_aug_conf, training=False),
dict(type='DefaultFormatBundle3D', class_names=det_class_names),
dict(type='Collect3D', keys=['img', 'voxel_semantics', 'voxel_instances', 'instance_class_ids'],
meta_keys=('filename', 'ori_shape', 'img_shape', 'pad_shape', 'lidar2img', 'img_timestamp', 'ego2lidar'))
]
data = dict(
workers_per_gpu=8,
train=dict(
pipeline=train_pipeline,
occ_gt_root=occ_gt_root
),
val=dict(
pipeline=test_pipeline,
occ_gt_root=occ_gt_root
),
test=dict(
pipeline=test_pipeline,
occ_gt_root=occ_gt_root
),
)
================================================
FILE: gen_instance_info.py
================================================
import os
import tqdm
import glob
import pickle
import argparse
import numpy as np
import torch
import multiprocessing
from pyquaternion import Quaternion
from nuscenes.utils.data_classes import Box
from nuscenes.utils.geometry_utils import points_in_box
parser = argparse.ArgumentParser()
parser.add_argument('--nusc-root', default='data/nuscenes')
parser.add_argument('--occ3d-root', default='data/nuscenes/occ3d')
parser.add_argument('--output-dir', default='data/nuscenes/occ3d_panoptic')
parser.add_argument('--version', default='v1.0-trainval')
args = parser.parse_args()
token2path = {}
for gt_path in glob.glob(os.path.join(args.occ3d_root, '*/*/*.npz')):
token = gt_path.split('/')[-2]
token2path[token] = gt_path
occ_class_names = [
'others', 'barrier', 'bicycle', 'bus', 'car', 'construction_vehicle',
'motorcycle', 'pedestrian', 'traffic_cone', 'trailer', 'truck',
'driveable_surface', 'other_flat', 'sidewalk',
'terrain', 'manmade', 'vegetation', 'free'
]
det_class_names = [
'car', 'truck', 'trailer', 'bus', 'construction_vehicle',
'bicycle', 'motorcycle', 'pedestrian', 'traffic_cone', 'barrier'
]
def convert_to_nusc_box(bboxes, lift_center=False, wlh_margin=0.0):
results = []
for q in range(bboxes.shape[0]):
bbox = bboxes[q].copy()
if lift_center:
bbox[2] += bbox[5] * 0.5
bbox_yaw = -bbox[6] - np.pi / 2
orientation = Quaternion(axis=[0, 0, 1], radians=bbox_yaw).inverse
box = Box(
center=[bbox[0], bbox[1], bbox[2]],
# 0.8 in pc range is roughly 2 voxels in the occ grid;
# enlarge the bbox to include voxels on the edge
size=[bbox[3]+wlh_margin, bbox[4]+wlh_margin, bbox[5]+wlh_margin],
orientation=orientation,
)
results.append(box)
return results
def meshgrid3d(occ_size, pc_range): # points in ego coord
W, H, D = occ_size
xs = torch.linspace(0.5, W - 0.5, W).view(W, 1, 1).expand(W, H, D) / W
ys = torch.linspace(0.5, H - 0.5, H).view(1, H, 1).expand(W, H, D) / H
zs = torch.linspace(0.5, D - 0.5, D).view(1, 1, D).expand(W, H, D) / D
xs = xs * (pc_range[3] - pc_range[0]) + pc_range[0]
ys = ys * (pc_range[4] - pc_range[1]) + pc_range[1]
zs = zs * (pc_range[5] - pc_range[2]) + pc_range[2]
xyz = torch.stack((xs, ys, zs), -1)
return xyz
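The normalization in `meshgrid3d` places each voxel center half a voxel above the minimum of the point-cloud range. A minimal pure-Python sanity check, using the `pc_range` and `occ_size` values from `process_add_instance_info`:

```python
# Same normalization as meshgrid3d: the center of voxel index 0 along an axis
# sits half a voxel above the minimum of the point-cloud range.
pc_range = [-40, -40, -1.0, 40, 40, 5.4]   # ranges used in process_add_instance_info
W, H, D = 200, 200, 16                     # occ_size used in process_add_instance_info

x0 = 0.5 / W * (pc_range[3] - pc_range[0]) + pc_range[0]
z0 = 0.5 / D * (pc_range[5] - pc_range[2]) + pc_range[2]
print(x0, z0)  # first voxel center: x = -39.8 m, z = -0.8 m (0.4 m voxels on both axes)
```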
def process_add_instance_info(sample):
point_cloud_range = [-40, -40, -1.0, 40, 40, 5.4]
occ_size = [200, 200, 16]
num_classes = 18
occ_gt_path = token2path[sample['token']]
occ_labels = np.load(occ_gt_path)
occ_gt = occ_labels['semantics']
gt_boxes = sample['gt_boxes']
gt_names = sample['gt_names']
bboxes = convert_to_nusc_box(gt_boxes)
instance_gt = np.zeros(occ_gt.shape).astype(np.uint8)
instance_id = 1
pts = meshgrid3d(occ_size, point_cloud_range).numpy()
# filter out free voxels to accelerate
valid_idx = np.where(occ_gt < num_classes - 1)
flatten_occ_gt = occ_gt[valid_idx]
flatten_inst_gt = instance_gt[valid_idx]
flatten_pts = pts[valid_idx]
instance_boxes = []
instance_class_ids = []
for i in range(len(gt_names)):
if gt_names[i] not in occ_class_names:
continue
occ_tag_id = occ_class_names.index(gt_names[i])
# Move box to ego vehicle coord system
bbox = bboxes[i]
bbox.rotate(Quaternion(sample['lidar2ego_rotation']))
bbox.translate(np.array(sample['lidar2ego_translation']))
mask = points_in_box(bbox, flatten_pts.transpose(1, 0))
# ignore voxels not belonging to this class
mask[mask] = (flatten_occ_gt[mask] == occ_tag_id)
# ignore voxels already occupied
mask[mask] = (flatten_inst_gt[mask] == 0)
# only instances with at least 1 voxel will be recorded
if mask.sum() > 0:
flatten_inst_gt[mask] = instance_id
instance_id += 1
# enlarge boxes to include voxels on the edge
new_box = bbox.copy()
new_box.wlh = new_box.wlh + 0.8
instance_boxes.append(new_box)
instance_class_ids.append(occ_tag_id)
# classes that should be viewed as one instance
all_class_ids_unique = np.unique(occ_gt)
for i, class_name in enumerate(occ_class_names):
if class_name in det_class_names or class_name == 'free' or i not in all_class_ids_unique:
continue
flatten_inst_gt[flatten_occ_gt == i] = instance_id
instance_id += 1
# post-process occupied voxels not yet covered by any instance
uncover_idx = np.where(flatten_inst_gt == 0)
uncover_pts = flatten_pts[uncover_idx]
uncover_inst_gt = np.zeros_like(uncover_pts[..., 0]).astype(np.uint8)
uncover_occ_gt = flatten_occ_gt[uncover_idx]
# uncover_inst_dist records the distance between each voxel and the center of its current nearest bbox
uncover_inst_dist = np.ones_like(uncover_pts[..., 0]) * 1e8
for i, box in enumerate(instance_boxes):
# important: non-background instance ids start from 1
inst_id = i + 1
class_id = instance_class_ids[i]
mask = points_in_box(box, uncover_pts.transpose(1, 0))
# mask out voxels not belonging to this class
mask[uncover_occ_gt != class_id] = False
dist = np.sum((box.center - uncover_pts) ** 2, axis=-1)
# ignore voxels already assigned to a closer box's instance, and voxels outside this box;
# `mask[dist >= uncover_inst_dist] = False` is correct because it only flips True to False,
# equivalent to the more explicit `mask[mask & (dist >= uncover_inst_dist)] = False`
mask[dist >= uncover_inst_dist] = False
# important: only voxels inside the box (mask = True) with no closer identical-class box update their dist
uncover_inst_dist[mask] = dist[mask]
uncover_inst_gt[mask] = inst_id
flatten_inst_gt[uncover_idx] = uncover_inst_gt
instance_gt[valid_idx] = flatten_inst_gt
# not using this checking function yet
# assert (instance_gt == 0).sum() - (occ_gt == num_classes-1).sum() < 100, "too many non-free voxels are not assigned to any instance in %s"%(occ_gt_path)
# global max_margin
# if max_margin < (instance_gt == 0).sum() - (occ_gt == num_classes-1).sum():
# print("###### new max margin: ", max(max_margin, (instance_gt == 0).sum() - (occ_gt == num_classes-1).sum()))
# max_margin = max(max_margin, (instance_gt == 0).sum() - (occ_gt == num_classes-1).sum())
# save to original path
data_split = occ_gt_path.split(os.path.sep)[-3:]
data_path = os.path.sep.join(data_split)
##### Warning: using args.xxx (a global variable) here is strongly discouraged
save_path = os.path.join(args.output_dir, data_path)
save_dir = os.path.split(save_path)[0]
if not os.path.exists(save_dir):
os.makedirs(save_dir)
if np.unique(instance_gt).shape[0] != instance_gt.max()+1:
print('warning: some instance masks are fully covered by later ones in %s' % save_dir)
# only semantic and mask information needs to be retained
retain_keys = ['semantics', 'mask_lidar', 'mask_camera']
new_occ_labels = {k: occ_labels[k] for k in retain_keys}
new_occ_labels['instances'] = instance_gt
np.savez_compressed(save_path, **new_occ_labels)
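The distance-based tie-breaking in `process_add_instance_info` can be seen in isolation: an uncovered voxel that falls inside two same-class boxes is assigned to the box whose center is closer. A minimal numpy sketch with hypothetical voxel positions, box centers, and a precomputed inside-box table standing in for `points_in_box`:

```python
import numpy as np

pts = np.array([[0.0, 0.0, 0.0], [3.0, 0.0, 0.0]])      # two uncovered voxel centers
centers = np.array([[1.0, 0.0, 0.0], [2.5, 0.0, 0.0]])  # two same-class box centers
inside = np.array([[True, True], [True, True]])          # inside[i][j]: voxel j inside box i

inst_gt = np.zeros(len(pts), dtype=np.uint8)
inst_dist = np.full(len(pts), 1e8)  # squared distance to current nearest box center
for i, center in enumerate(centers):
    mask = inside[i].copy()
    dist = np.sum((center - pts) ** 2, axis=-1)
    # keep only voxels with no closer box assigned so far (flips True to False only)
    mask[dist >= inst_dist] = False
    inst_dist[mask] = dist[mask]
    inst_gt[mask] = i + 1  # non-background instance ids start from 1

print(inst_gt)  # voxel 0 -> instance 1 (closer center), voxel 1 -> instance 2
```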
def add_instance_info(sample_infos):
if not os.path.exists(args.output_dir):
os.makedirs(args.output_dir)
# use all CPU cores for multiprocessing
pool = multiprocessing.Pool(multiprocessing.cpu_count())
with tqdm.tqdm(total=len(sample_infos['infos'])) as pbar:
for _ in pool.imap(process_add_instance_info, sample_infos['infos']):
pbar.update(1)
pool.close()
pool.join()
if __name__ == '__main__':
if args.version == 'v1.0-trainval':
sample_infos = pickle.load(open(os.path.join(args.nusc_root, 'nuscenes_infos_train_sweep.pkl'), 'rb'))
add_instance_info(sample_infos)
sample_infos = pickle.load(open(os.path.join(args.nusc_root, 'nuscenes_infos_val_sweep.pkl'), 'rb'))
add_instance_info(sample_infos)
elif args.version == 'v1.0-test':
sample_infos = pickle.load(open(os.path.join(args.nusc_root, 'nuscenes_infos_test_sweep.pkl'), 'rb'))
add_instance_info(sample_infos)
else:
raise ValueError
================================================
FILE: gen_sweep_info.py
================================================
# Generate info files manually
import os
import mmcv
import tqdm
import pickle
import argparse
import numpy as np
from nuscenes import NuScenes
from pyquaternion import Quaternion
parser = argparse.ArgumentParser()
parser.add_argument('--data-root', default='data/nuscenes')
parser.add_argument('--version', default='v1.0-trainval')
args = parser.parse_args()
def get_cam_info(nusc, sample_data):
pose_record = nusc.get('ego_pose', sample_data['ego_pose_token'])
cs_record = nusc.get('calibrated_sensor', sample_data['calibrated_sensor_token'])
sensor2ego_translation = cs_record['translation']
ego2global_translation = pose_record['translation']
sensor2ego_rotation = Quaternion(cs_record['rotation']).rotation_matrix
ego2global_rotation = Quaternion(pose_record['rotation']).rotation_matrix
cam_intrinsic = np.array(cs_record['camera_intrinsic'])
sensor2global_rotation = sensor2ego_rotation.T @ ego2global_rotation.T
sensor2global_translation = sensor2ego_translation @ ego2global_rotation.T + ego2global_translation
return {
'data_path': os.path.join(args.data_root, sample_data['filename']),
'sensor2global_rotation': sensor2global_rotation,
'sensor2global_translation': sensor2global_translation,
'cam_intrinsic': cam_intrinsic,
'timestamp': sample_data['timestamp'],
}
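`get_cam_info` composes the two transforms in the row-vector convention, i.e. `p_global = p_sensor @ sensor2global_rotation + sensor2global_translation`. A minimal sketch (with hypothetical calibration values, not real nuScenes data) checking that this matches chaining sensor-to-ego and ego-to-global with column vectors:

```python
import numpy as np

def rot_z(a):
    """Rotation matrix about the z axis (column-vector convention)."""
    return np.array([[np.cos(a), -np.sin(a), 0.0],
                     [np.sin(a),  np.cos(a), 0.0],
                     [0.0, 0.0, 1.0]])

# hypothetical calibration values
R_s2e, t_s2e = rot_z(np.pi / 2), np.array([1.0, 0.0, 0.5])
R_e2g, t_e2g = rot_z(np.pi / 4), np.array([10.0, 2.0, 0.0])

# composition as written in get_cam_info (row-vector convention)
R_s2g = R_s2e.T @ R_e2g.T
t_s2g = t_s2e @ R_e2g.T + t_e2g

p_sensor = np.array([1.0, 2.0, 3.0])
p_row = p_sensor @ R_s2g + t_s2g                    # row-vector form
p_col = R_e2g @ (R_s2e @ p_sensor + t_s2e) + t_e2g  # column-vector chaining
print(np.allclose(p_row, p_col))  # True: the two conventions agree
```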
def add_sweep_info(nusc, sample_infos):
for curr_id in tqdm.tqdm(range(len(sample_infos['infos']))):
sample = nusc.get('sample', sample_infos['infos'][curr_id]['token'])
cam_types = [
'CAM_FRONT', 'CAM_FRONT_RIGHT', 'CAM_BACK_RIGHT',
'CAM_BACK', 'CAM_BACK_LEFT', 'CAM_FRONT_LEFT'
]
curr_cams = dict()
for cam in cam_types:
curr_cams[cam] = nusc.get('sample_data', sample['data'][cam])
for cam in cam_types:
sample_data = nusc.get('sample_data', sample['data'][cam])
sweep_cam = get_cam_info(nusc, sample_data)
sample_infos['infos'][curr_id]['cams'][cam].update(sweep_cam)
# remove unnecessary keys
for cam in cam_types:
del sample_infos['infos'][curr_id]['cams'][cam]['sensor2ego_translation']
del sample_infos['infos'][curr_id]['cams'][cam]['sensor2ego_rotation']
del sample_infos['infos'][curr_id]['cams'][cam]['ego2global_translation']
del sample_infos['infos'][curr_id]['cams'][cam]['ego2global_rotation']
sweep_infos = []
if sample['prev'] != '': # add sweep frames between two key frames
for _ in range(5):
sweep_info = dict()
for cam in cam_types:
if curr_cams[cam]['prev'] == '':
sweep_info = sweep_infos[-1]
break
sample_data = nusc.get('sample_data', curr_cams[cam]['prev'])
sweep_cam = get_cam_info(nusc, sample_data)
curr_cams[cam] = sample_data
sweep_info[cam] = sweep_cam
sweep_infos.append(sweep_info)
sample_infos['infos'][curr_id]['sweeps'] = sweep_infos
return sample_infos
if __name__ == '__main__':
nusc = NuScenes(args.version, args.data_root)
if args.version == 'v1.0-trainval':
sample_infos = pickle.load(open(os.path.join(args.data_root, 'nuscenes_infos_train.pkl'), 'rb'))
sample_infos = add_sweep_info(nusc, sample_infos)
mmcv.dump(sample_infos, os.path.join(args.data_root, 'nuscenes_infos_train_sweep.pkl'))
sample_infos = pickle.load(open(os.path.join(args.data_root, 'nuscenes_infos_val.pkl'), 'rb'))
sample_infos = add_sweep_info(nusc, sample_infos)
mmcv.dump(sample_infos, os.path.join(args.data_root, 'nuscenes_infos_val_sweep.pkl'))
elif args.version == 'v1.0-test':
sample_infos = pickle.load(open(os.path.join(args.data_root, 'nuscenes_infos_test.pkl'), 'rb'))
sample_infos = add_sweep_info(nusc, sample_infos)
mmcv.dump(sample_infos, os.path.join(args.data_root, 'nuscenes_infos_test_sweep.pkl'))
else:
raise ValueError
================================================
FILE: lib/dvr/dvr.cpp
================================================
// Acknowledgments: https://github.com/tarashakhurana/4d-occ-forecasting
// Modified by Haisong Liu
#include <string>
#include <torch/extension.h>
#include <vector>
/*
* CUDA forward declarations
*/
std::vector<torch::Tensor> render_forward_cuda(torch::Tensor sigma,
torch::Tensor origin,
torch::Tensor points,
torch::Tensor tindex,
const std::vector<int> grid,
std::string phase_name);
std::vector<torch::Tensor>
render_cuda(torch::Tensor sigma, torch::Tensor origin, torch::Tensor points,
torch::Tensor tindex, std::string loss_name);
torch::Tensor init_cuda(torch::Tensor points, torch::Tensor tindex,
const std::vector<int> grid);
/*
* C++ interface
*/
#define CHECK_CUDA(x) \
TORCH_CHECK(x.type().is_cuda(), #x " must be a CUDA tensor")
#define CHECK_CONTIGUOUS(x) \
TORCH_CHECK(x.is_contiguous(), #x " must be contiguous")
#define CHECK_INPUT(x) \
CHECK_CUDA(x); \
CHECK_CONTIGUOUS(x)
std::vector<torch::Tensor>
render_forward(torch::Tensor sigma, torch::Tensor origin, torch::Tensor points,
torch::Tensor tindex, const std::vector<int> grid,
std::string phase_name) {
CHECK_INPUT(sigma);
CHECK_INPUT(origin);
CHECK_INPUT(points);
CHECK_INPUT(tindex);
return render_forward_cuda(sigma, origin, points, tindex, grid, phase_name);
}
std::vector<torch::Tensor> render(torch::Tensor sigma, torch::Tensor origin,
torch::Tensor points, torch::Tensor tindex,
std::string loss_name) {
CHECK_INPUT(sigma);
CHECK_INPUT(origin);
CHECK_INPUT(points);
CHECK_INPUT(tindex);
return render_cuda(sigma, origin, points, tindex, loss_name);
}
torch::Tensor init(torch::Tensor points, torch::Tensor tindex,
const std::vector<int> grid) {
CHECK_INPUT(points);
CHECK_INPUT(tindex);
return init_cuda(points, tindex, grid);
}
PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
m.def("init", &init, "Initialize");
m.def("render", &render, "Render");
m.def("render_forward", &render_forward, "Render (forward pass only)");
}
================================================
FILE: lib/dvr/dvr.cu
================================================
// Acknowledgments: https://github.com/tarashakhurana/4d-occ-forecasting
// Modified by Haisong Liu
#include <torch/extension.h>
#include <stdio.h>
#include <cuda.h>
#include <cuda_runtime.h>
#include <vector>
#include <string>
#include <iostream>
#define MAX_D 1446 // 700 + 700 + 45 + 1
#define MAX_STEP 1000
enum LossType {L1, L2, ABSREL};
enum PhaseName {TEST, TRAIN};
template <typename scalar_t>
__global__ void init_cuda_kernel(
const torch::PackedTensorAccessor32<scalar_t,3,torch::RestrictPtrTraits> points,
const torch::PackedTensorAccessor32<scalar_t,2,torch::RestrictPtrTraits> tindex,
torch::PackedTensorAccessor32<scalar_t,5,torch::RestrictPtrTraits> occupancy) {
// batch index
const auto n = blockIdx.y;
// ray index
const auto c = blockIdx.x * blockDim.x + threadIdx.x;
// num of rays
const auto M = points.size(1);
const auto T = occupancy.size(1);
// we allocated more threads than num_rays
if (c < M) {
// ray end point
const auto t = tindex[n][c];
// invalid points
assert(T == 1 || t < T);
// if t < 0, it is a padded point
if (t < 0) return;
// time index for sigma
// when T = 1, we have a static sigma
const auto ts = (T == 1) ? 0 : t;
// grid shape
const int vzsize = occupancy.size(2);
const int vysize = occupancy.size(3);
const int vxsize = occupancy.size(4);
// assert(vzsize + vysize + vxsize <= MAX_D);
// end point
const int vx = int(points[n][c][0]);
const int vy = int(points[n][c][1]);
const int vz = int(points[n][c][2]);
//
if (0 <= vx && vx < vxsize &&
0 <= vy && vy < vysize &&
0 <= vz && vz < vzsize) {
occupancy[n][ts][vz][vy][vx] = 1;
}
}
}
template <typename scalar_t>
__global__ void render_forward_cuda_kernel(
const torch::PackedTensorAccessor32<scalar_t,5,torch::RestrictPtrTraits> sigma,
const torch::PackedTensorAccessor32<scalar_t,3,torch::RestrictPtrTraits> origin,
const torch::PackedTensorAccessor32<scalar_t,3,torch::RestrictPtrTraits> points,
const torch::PackedTensorAccessor32<scalar_t,2,torch::RestrictPtrTraits> tindex,
// torch::PackedTensorAccessor32<scalar_t,5,torch::RestrictPtrTraits> pog,
torch::PackedTensorAccessor32<scalar_t,2,torch::RestrictPtrTraits> pred_dist,
torch::PackedTensorAccessor32<scalar_t,2,torch::RestrictPtrTraits> gt_dist,
torch::PackedTensorAccessor32<scalar_t,3,torch::RestrictPtrTraits> coord_index,
PhaseName train_phase) {
// batch index
const auto n = blockIdx.y;
// ray index
const auto c = blockIdx.x * blockDim.x + threadIdx.x;
// num of rays
const auto M = points.size(1);
const auto T = sigma.size(1);
// we allocated more threads than num_rays
if (c < M) {
// ray end point
const auto t = tindex[n][c];
// invalid points
// assert(t < T);
assert(T == 1 || t < T);
// time index for sigma
// when T = 1, we have a static sigma
const auto ts = (T == 1) ? 0 : t;
// if t < 0, it is a padded point
if (t < 0) return;
// grid shape
const int vzsize = sigma.size(2);
const int vysize = sigma.size(3);
const int vxsize = sigma.size(4);
// assert(vzsize + vysize + vxsize <= MAX_D);
// origin
const double xo = origin[n][t][0];
const double yo = origin[n][t][1];
const double zo = origin[n][t][2];
// end point
const double xe = points[n][c][0];
const double ye = points[n][c][1];
const double ze = points[n][c][2];
// locate the voxel where the origin resides
const int vxo = int(xo);
const int vyo = int(yo);
const int vzo = int(zo);
const int vxe = int(xe);
const int vye = int(ye);
const int vze = int(ze);
// NOTE: new
int vx = vxo;
int vy = vyo;
int vz = vzo;
// origin to end
const double rx = xe - xo;
const double ry = ye - yo;
const double rz = ze - zo;
double gt_d = sqrt(rx * rx + ry * ry + rz * rz);
// directional vector
const double dx = rx / gt_d;
const double dy = ry / gt_d;
const double dz = rz / gt_d;
// In which direction the voxel ids are incremented.
const int stepX = (dx >= 0) ? 1 : -1;
const int stepY = (dy >= 0) ? 1 : -1;
const int stepZ = (dz >= 0) ? 1 : -1;
// Distance along the ray to the next voxel border from the current position (tMaxX, tMaxY, tMaxZ).
const double next_voxel_boundary_x = vx + (stepX < 0 ? 0 : 1);
const double next_voxel_boundary_y = vy + (stepY < 0 ? 0 : 1);
const double next_voxel_boundary_z = vz + (stepZ < 0 ? 0 : 1);
// tMaxX, tMaxY, tMaxZ -- distance until next intersection with voxel-border
// the value of t at which the ray crosses the first vertical voxel boundary
double tMaxX = (dx!=0) ? (next_voxel_boundary_x - xo)/dx : DBL_MAX; //
double tMaxY = (dy!=0) ? (next_voxel_boundary_y - yo)/dy : DBL_MAX; //
double tMaxZ = (dz!=0) ? (next_voxel_boundary_z - zo)/dz : DBL_MAX; //
// tDeltaX, tDeltaY, tDeltaZ --
// how far along the ray we must move for the horizontal component to equal the width of a voxel
// the direction in which we traverse the grid
// can only be FLT_MAX if we never go in that direction
const double tDeltaX = (dx!=0) ? stepX/dx : DBL_MAX;
const double tDeltaY = (dy!=0) ? stepY/dy : DBL_MAX;
const double tDeltaZ = (dz!=0) ? stepZ/dz : DBL_MAX;
int3 path[MAX_D];
double csd[MAX_D]; // cumulative sum of sigma times delta
double p[MAX_D]; // alpha
double d[MAX_D];
// forward raymarching with voxel traversal
int step = 0; // total number of voxels traversed
int count = 0; // number of voxels traversed inside the voxel grid
double last_d = 0.0; // correct initialization
// voxel traversal raycasting
bool was_inside = false;
while (true) {
bool inside = (0 <= vx && vx < vxsize) &&
(0 <= vy && vy < vysize) &&
(0 <= vz && vz < vzsize);
if (inside) {
was_inside = true;
path[count] = make_int3(vx, vy, vz);
} else if (was_inside) { // was but no longer inside
// we know we are not coming back so terminate
break;
} /*else if (last_d > gt_d) {
break;
} */
/*else { // has not gone inside yet
// assert(count == 0);
// (1) when we have hit the destination but haven't gone inside the voxel grid
// (2) when we have traveled MAX_D voxels but haven't found one valid voxel
// handle intersection corner cases in case of infinite loop
bool hit = (vx == vxe && vy == vye && vz == vze); // this test seems brittle with corner cases
if (hit || step >= MAX_D)
break;
//if (last_d >= gt_d || step >= MAX_D) break;
} */
// _d is the distance the ray has traveled before escaping the current voxel cell
double _d = 0.0;
// voxel traversal
if (tMaxX < tMaxY) {
if (tMaxX < tMaxZ) {
_d = tMaxX;
vx += stepX;
tMaxX += tDeltaX;
} else {
_d = tMaxZ;
vz += stepZ;
tMaxZ += tDeltaZ;
}
} else {
if (tMaxY < tMaxZ) {
_d = tMaxY;
vy += stepY;
tMaxY += tDeltaY;
} else {
_d = tMaxZ;
vz += stepZ;
tMaxZ += tDeltaZ;
}
}
if (inside) {
// get sigma at the current voxel
const int3 &v = path[count]; // use the recorded index
const double _sigma = sigma[n][ts][v.z][v.y][v.x];
const double _delta = max(0.0, _d - last_d); // THIS TURNS OUT IMPORTANT
const double sd = _sigma * _delta;
if (count == 0) { // the first voxel inside
csd[count] = sd;
p[count] = 1 - exp(-sd);
} else {
csd[count] = csd[count-1] + sd;
p[count] = exp(-csd[count-1]) - exp(-csd[count]);
}
// record the traveled distance
d[count] = _d;
// count the number of voxels we have escaped
count ++;
}
last_d = _d;
step ++;
if (step > MAX_STEP) {
break;
}
}
// the total number of voxels visited should not exceed this number
assert(count <= MAX_D);
if (count > 0) {
// compute the expected ray distance
//double exp_d = 0.0;
double exp_d = d[count-1];
const int3 &v_init = path[count-1];
int x = v_init.x;
int y = v_init.y;
int z = v_init.z;
for (int i = 0; i < count; i++) {
//printf("%f\t%f\n",p[i], d[i]);
//exp_d += p[i] * d[i];
const int3 &v = path[i];
const double occ = sigma[n][ts][v.z][v.y][v.x];
if (occ > 0.5) {
exp_d = d[i];
x = v.x;
y = v.y;
z = v.z;
break;
}
}
//printf("%f\n",exp_d);
// add an imaginary sample at the end point, should gt_d exceed max_d
double p_out = exp(-csd[count-1]);
double max_d = d[count-1];
// if (gt_d > max_d)
// exp_d += (p_out * gt_d);
// p_out is the probability the ray escapes the voxel grid
//exp_d += (p_out * max_d);
if (train_phase == 1) {
gt_d = min(gt_d, max_d);
}
// write the rendered ray distance (exp_d)
pred_dist[n][c] = exp_d;
gt_dist[n][c] = gt_d;
coord_index[n][c][0] = double(x);
coord_index[n][c][1] = double(y);
coord_index[n][c][2] = double(z);
// // write occupancy
// for (int i = 0; i < count; i ++) {
// const int3 &v = path[i];
// auto & occ = pog[n][t][v.z][v.y][v.x];
// if (p[i] >= occ) {
// occ = p[i];
// }
// }
}
}
}
/*
* input shape
* sigma : N x T x H x L x W
* origin : N x T x 3
* points : N x M x 4
* output shape
* dist : N x M
*/
std::vector<torch::Tensor> render_forward_cuda(
torch::Tensor sigma,
torch::Tensor origin,
torch::Tensor points,
torch::Tensor tindex,
const std::vector<int> grid,
std::string phase_name) {
const auto N = points.size(0); // batch size
const auto M = points.size(1); // num of rays
const auto T = grid[0];
const auto H = grid[1];
const auto L = grid[2];
const auto W = grid[3];
const auto device = sigma.device();
const int threads = 1024;
const dim3 blocks((M + threads - 1) / threads, N);
//
// const auto dtype = points.dtype();
// const auto options = torch::TensorOptions().dtype(dtype).device(device).requires_grad(false);
// auto pog = torch::zeros({N, T, H, L, W}, options);
// perform rendering
auto gt_dist = -torch::ones({N, M}, device);
auto pred_dist = -torch::ones({N, M}, device);
auto coord_index = torch::zeros({N, M, 3}, device);
PhaseName train_phase;
if (phase_name.compare("test") == 0) {
train_phase = TEST;
} else if (phase_name.compare("train") == 0){
train_phase = TRAIN;
} else {
std::cout << "UNKNOWN PHASE NAME: " << phase_name << std::endl;
exit(1);
}
AT_DISPATCH_FLOATING_TYPES(sigma.type(), "render_forward_cuda", ([&] {
render_forward_cuda_kernel<scalar_t><<<blocks, threads>>>(
sigma.packed_accessor32<scalar_t,5,torch::RestrictPtrTraits>(),
origin.packed_accessor32<scalar_t,3,torch::RestrictPtrTraits>(),
points.packed_accessor32<scalar_t,3,torch::RestrictPtrTraits>(),
tindex.packed_accessor32<scalar_t,2,torch::RestrictPtrTraits>(),
// pog.packed_accessor32<scalar_t,5,torch::RestrictPtrTraits>(),
pred_dist.packed_accessor32<scalar_t,2,torch::RestrictPtrTraits>(),
gt_dist.packed_accessor32<scalar_t,2,torch::RestrictPtrTraits>(),
coord_index.packed_accessor32<scalar_t,3,torch::RestrictPtrTraits>(),
train_phase);
}));
cudaDeviceSynchronize();
// return {pog, pred_dist, gt_dist};
return {pred_dist, gt_dist, coord_index};
}
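Both `render_forward_cuda_kernel` and `render_cuda_kernel` walk each ray with the classic voxel-traversal scheme: at every step, advance along whichever axis has the smallest `tMax`, then bump that axis's `tMax` by its `tDelta`. A minimal 2D Python sketch of the same loop (hypothetical square grid, no densities, ray origin assumed to start inside the grid):

```python
import math

def traverse(xo, yo, xe, ye, size):
    """Return the grid cells a ray visits, using per-axis tMax/tDelta stepping (2D)."""
    gt_d = math.hypot(xe - xo, ye - yo)
    dx, dy = (xe - xo) / gt_d, (ye - yo) / gt_d  # unit direction
    vx, vy = int(xo), int(yo)                    # starting cell
    step_x = 1 if dx >= 0 else -1
    step_y = 1 if dy >= 0 else -1
    # distance along the ray to the next voxel border on each axis
    t_max_x = ((vx + (0 if step_x < 0 else 1)) - xo) / dx if dx != 0 else math.inf
    t_max_y = ((vy + (0 if step_y < 0 else 1)) - yo) / dy if dy != 0 else math.inf
    # distance along the ray between two consecutive borders on each axis
    t_delta_x = step_x / dx if dx != 0 else math.inf
    t_delta_y = step_y / dy if dy != 0 else math.inf
    cells = []
    while 0 <= vx < size and 0 <= vy < size:
        cells.append((vx, vy))
        if t_max_x < t_max_y:  # next border crossed is vertical
            vx += step_x
            t_max_x += t_delta_x
        else:                  # next border crossed is horizontal
            vy += step_y
            t_max_y += t_delta_y
    return cells

print(traverse(0.5, 0.5, 3.5, 0.5, 4))  # [(0, 0), (1, 0), (2, 0), (3, 0)]
```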
template <typename scalar_t>
__global__ void render_cuda_kernel(
const torch::PackedTensorAccessor32<scalar_t,5,torch::RestrictPtrTraits> sigma,
const torch::PackedTensorAccessor32<scalar_t,3,torch::RestrictPtrTraits> origin,
const torch::PackedTensorAccessor32<scalar_t,3,torch::RestrictPtrTraits> points,
const torch::PackedTensorAccessor32<scalar_t,2,torch::RestrictPtrTraits> tindex,
// const torch::PackedTensorAccessor32<scalar_t,5,torch::RestrictPtrTraits> occupancy,
torch::PackedTensorAccessor32<scalar_t,2,torch::RestrictPtrTraits> pred_dist,
torch::PackedTensorAccessor32<scalar_t,2,torch::RestrictPtrTraits> gt_dist,
torch::PackedTensorAccessor32<scalar_t,5,torch::RestrictPtrTraits> grad_sigma,
// torch::PackedTensorAccessor32<scalar_t,5,torch::RestrictPtrTraits> grad_sigma_count,
LossType loss_type) {
// batch index
const auto n = blockIdx.y;
// ray index
const auto c = blockIdx.x * blockDim.x + threadIdx.x;
// num of rays
const auto M = points.size(1);
const auto T = sigma.size(1);
// we allocated more threads than num_rays
if (c < M) {
// ray end point
const auto t = tindex[n][c];
// invalid points
// assert(t < T);
assert(T == 1 || t < T);
// time index for sigma
// when T = 1, we have a static sigma
const auto ts = (T == 1) ? 0 : t;
// if t < 0, it is a padded point
if (t < 0) return;
// grid shape
const int vzsize = sigma.size(2);
const int vysize = sigma.size(3);
const int vxsize = sigma.size(4);
// assert(vzsize + vysize + vxsize <= MAX_D);
// origin
const double xo = origin[n][t][0];
const double yo = origin[n][t][1];
const double zo = origin[n][t][2];
// end point
const double xe = points[n][c][0];
const double ye = points[n][c][1];
const double ze = points[n][c][2];
// locate the voxel where the origin resides
const int vxo = int(xo);
const int vyo = int(yo);
const int vzo = int(zo);
//
const int vxe = int(xe);
const int vye = int(ye);
const int vze = int(ze);
// NOTE: new
int vx = vxo;
int vy = vyo;
int vz = vzo;
// origin to end
const double rx = xe - xo;
const double ry = ye - yo;
const double rz = ze - zo;
double gt_d = sqrt(rx * rx + ry * ry + rz * rz);
// directional vector
const double dx = rx / gt_d;
const double dy = ry / gt_d;
const double dz = rz / gt_d;
// In which direction the voxel ids are incremented.
const int stepX = (dx >= 0) ? 1 : -1;
const int stepY = (dy >= 0) ? 1 : -1;
const int stepZ = (dz >= 0) ? 1 : -1;
// Distance along the ray to the next voxel border from the current position (tMaxX, tMaxY, tMaxZ).
const double next_voxel_boundary_x = vx + (stepX < 0 ? 0 : 1);
const double next_voxel_boundary_y = vy + (stepY < 0 ? 0 : 1);
const double next_voxel_boundary_z = vz + (stepZ < 0 ? 0 : 1);
// tMaxX, tMaxY, tMaxZ -- distance until next intersection with voxel-border
// the value of t at which the ray crosses the first vertical voxel boundary
double tMaxX = (dx!=0) ? (next_voxel_boundary_x - xo)/dx : DBL_MAX; //
double tMaxY = (dy!=0) ? (next_voxel_boundary_y - yo)/dy : DBL_MAX; //
double tMaxZ = (dz!=0) ? (next_voxel_boundary_z - zo)/dz : DBL_MAX; //
// tDeltaX, tDeltaY, tDeltaZ --
// how far along the ray we must move for the horizontal component to equal the width of a voxel
// the direction in which we traverse the grid
// can only be FLT_MAX if we never go in that direction
const double tDeltaX = (dx!=0) ? stepX/dx : DBL_MAX;
const double tDeltaY = (dy!=0) ? stepY/dy : DBL_MAX;
const double tDeltaZ = (dz!=0) ? stepZ/dz : DBL_MAX;
int3 path[MAX_D];
double csd[MAX_D]; // cumulative sum of sigma times delta
double p[MAX_D]; // alpha
double d[MAX_D];
double dt[MAX_D];
// forward raymarching with voxel traversal
int step = 0; // total number of voxels traversed
int count = 0; // number of voxels traversed inside the voxel grid
double last_d = 0.0; // correct initialization
// voxel traversal raycasting
bool was_inside = false;
while (true) {
bool inside = (0 <= vx && vx < vxsize) &&
(0 <= vy && vy < vysize) &&
(0 <= vz && vz < vzsize);
if (inside) { // now inside
was_inside = true;
path[count] = make_int3(vx, vy, vz);
} else if (was_inside) { // was inside but no longer
// we know we are not coming back so terminate
break;
} else if (last_d > gt_d) {
break;
} /* else { // has not gone inside yet
// assert(count == 0);
// (1) when we have hit the destination but haven't gone inside the voxel grid
// (2) when we have traveled MAX_D voxels but haven't found one valid voxel
// handle intersection corner cases in case of infinite loop
// bool hit = (vx == vxe && vy == vye && vz == vze);
// if (hit || step >= MAX_D)
// break;
if (last_d >= gt_d || step >= MAX_D) break;
} */
// _d is the distance the ray has traveled before escaping the current voxel cell
double _d = 0.0;
// voxel traversal
if (tMaxX < tMaxY) {
if (tMaxX < tMaxZ) {
_d = tMaxX;
vx += stepX;
tMaxX += tDeltaX;
} else {
_d = tMaxZ;
vz += stepZ;
tMaxZ += tDeltaZ;
}
} else {
if (tMaxY < tMaxZ) {
_d = tMaxY;
vy += stepY;
tMaxY += tDeltaY;
} else {
_d = tMaxZ;
vz += stepZ;
tMaxZ += tDeltaZ;
}
}
if (inside) {
// get sigma at the current voxel
const int3 &v = path[count]; // use the recorded index
const double _sigma = sigma[n][ts][v.z][v.y][v.x];
const double _delta = max(0.0, _d - last_d); // THIS TURNS OUT IMPORTANT
const double sd = _sigma * _delta;
if (count == 0) { // the first voxel inside
csd[count] = sd;
p[count] = 1 - exp(-sd);
} else {
csd[count] = csd[count-1] + sd;
p[count] = exp(-csd[count-1]) - exp(-csd[count]);
}
// record the traveled distance
d[count] = _d;
dt[count] = _delta;
// count the number of voxels we have escaped
count ++;
}
last_d = _d;
step ++;
if (step > MAX_STEP) {
break;
}
}
// the total number of voxels visited should not exceed this number
assert(count <= MAX_D);
// WHEN THERE IS AN INTERSECTION BETWEEN THE RAY AND THE VOXEL GRID
if (count > 0) {
// compute the expected ray distance
double exp_d = 0.0;
for (int i = 0; i < count; i ++)
exp_d += p[i] * d[i];
// add an imaginary sample at the end point in case gt_d exceeds max_d
double p_out = exp(-csd[count-1]);
double max_d = d[count-1];
exp_d += (p_out * max_d);
gt_d = min(gt_d, max_d);
// write the rendered (expected) ray distance and the clamped ground truth
pred_dist[n][c] = exp_d;
gt_dist[n][c] = gt_d;
/* backward raymarching */
double dd_dsigma[MAX_D];
for (int i = count - 1; i >= 0; i --) {
// NOTE: probably need to double check again
if (i == count - 1)
dd_dsigma[i] = p_out * max_d;
else
dd_dsigma[i] = dd_dsigma[i+1] - exp(-csd[i]) * (d[i+1] - d[i]);
}
for (int i = count - 1; i >= 0; i --)
dd_dsigma[i] *= dt[i];
// option 2: cap at the boundary
for (int i = count - 1; i >= 0; i --)
dd_dsigma[i] -= dt[i] * p_out * max_d;
double dl_dd = 1.0;
if (loss_type == L1)
dl_dd = (exp_d >= gt_d) ? 1 : -1;
else if (loss_type == L2)
dl_dd = (exp_d - gt_d);
else if (loss_type == ABSREL)
dl_dd = (exp_d >= gt_d) ? (1.0/gt_d) : -(1.0/gt_d);
// apply chain rule
for (int i = 0; i < count; i ++) {
const int3 &v = path[i];
// NOTE: potential race conditions when writing gradients
grad_sigma[n][ts][v.z][v.y][v.x] += dl_dd * dd_dsigma[i];
// grad_sigma_count[n][ts][v.z][v.y][v.x] += 1;
}
}
}
}
/*
* input shape
* sigma : N x T x H x L x W
* origin : N x T x 3
* points : N x M x 4
* output shape
* dist : N x M
* loss : N x M
* grad_sigma : N x T x H x L x W
*/
std::vector<torch::Tensor> render_cuda(
torch::Tensor sigma,
torch::Tensor origin,
torch::Tensor points,
torch::Tensor tindex,
std::string loss_name) {
const auto N = points.size(0); // batch size
const auto M = points.size(1); // num of rays
const auto device = sigma.device();
const int threads = 1024;
const dim3 blocks((M + threads - 1) / threads, N);
// perform rendering
auto gt_dist = -torch::ones({N, M}, device);
auto pred_dist = -torch::ones({N, M}, device);
auto grad_sigma = torch::zeros_like(sigma);
// auto grad_sigma_count = torch::zeros_like(sigma);
LossType loss_type;
if (loss_name.compare("l1") == 0) {
loss_type = L1;
} else if (loss_name.compare("l2") == 0) {
loss_type = L2;
} else if (loss_name.compare("absrel") == 0) {
loss_type = ABSREL;
} else if (loss_name.compare("bce") == 0){
loss_type = L1; // bce is not implemented here; fall back to the L1 gradient
} else {
std::cout << "UNKNOWN LOSS TYPE: " << loss_name << std::endl;
exit(1);
}
AT_DISPATCH_FLOATING_TYPES(sigma.type(), "render_cuda", ([&] {
render_cuda_kernel<scalar_t><<<blocks, threads>>>(
sigma.packed_accessor32<scalar_t,5,torch::RestrictPtrTraits>(),
origin.packed_accessor32<scalar_t,3,torch::RestrictPtrTraits>(),
points.packed_accessor32<scalar_t,3,torch::RestrictPtrTraits>(),
tindex.packed_accessor32<scalar_t,2,torch::RestrictPtrTraits>(),
// occupancy.packed_accessor32<scalar_t,5,torch::RestrictPtrTraits>(),
pred_dist.packed_accessor32<scalar_t,2,torch::RestrictPtrTraits>(),
gt_dist.packed_accessor32<scalar_t,2,torch::RestrictPtrTraits>(),
grad_sigma.packed_accessor32<scalar_t,5,torch::RestrictPtrTraits>(),
// grad_sigma_count.packed_accessor32<scalar_t,5,torch::RestrictPtrTraits>(),
loss_type);
}));
cudaDeviceSynchronize();
// grad_sigma_count += (grad_sigma_count == 0);
// grad_sigma /= grad_sigma_count;
return {pred_dist, gt_dist, grad_sigma};
}
/*
* input shape
* origin : N x T x 3
* points : N x M x 3
* tindex : N x M
* output shape
* occupancy: N x T x H x L x W
*/
torch::Tensor init_cuda(
torch::Tensor points,
torch::Tensor tindex,
const std::vector<int> grid) {
const auto N = points.size(0); // batch size
const auto M = points.size(1); // num of rays
const auto T = grid[0];
const auto H = grid[1];
const auto L = grid[2];
const auto W = grid[3];
const auto dtype = points.dtype();
const auto device = points.device();
const auto options = torch::TensorOptions().dtype(dtype).device(device).requires_grad(false);
auto occupancy = torch::zeros({N, T, H, L, W}, options);
const int threads = 1024;
const dim3 blocks((M + threads - 1) / threads, N);
// initialize occupancy such that every voxel with one or more points is occupied
AT_DISPATCH_FLOATING_TYPES(points.type(), "init_cuda", ([&] {
init_cuda_kernel<scalar_t><<<blocks, threads>>>(
points.packed_accessor32<scalar_t,3,torch::RestrictPtrTraits>(),
tindex.packed_accessor32<scalar_t,2,torch::RestrictPtrTraits>(),
occupancy.packed_accessor32<scalar_t,5,torch::RestrictPtrTraits>());
}));
// synchronize
cudaDeviceSynchronize();
return occupancy;
}
================================================
FILE: loaders/__init__.py
================================================
from .pipelines import __all__
from .nuscenes_dataset import CustomNuScenesDataset
from .nuscenes_occ_dataset import NuSceneOcc
__all__ = [
'CustomNuScenesDataset', 'NuSceneOcc'
]
================================================
FILE: loaders/builder.py
================================================
from functools import partial
from mmcv.parallel import collate
from mmcv.runner import get_dist_info
from torch.utils.data import DataLoader
from mmdet.datasets.builder import worker_init_fn
from mmdet.datasets.samplers import DistributedGroupSampler, DistributedSampler, GroupSampler
def build_dataloader(dataset,
samples_per_gpu,
workers_per_gpu,
num_gpus=1,
dist=True,
shuffle=True,
seed=None,
**kwargs):
rank, world_size = get_dist_info()
if dist:
# DistributedGroupSampler will definitely shuffle the data to satisfy
# that images on each GPU are in the same group
if shuffle:
sampler = DistributedGroupSampler(
dataset, samples_per_gpu, world_size, rank, seed=seed)
else:
sampler = DistributedSampler(
dataset, world_size, rank, shuffle=False, seed=seed)
batch_size = samples_per_gpu
num_workers = workers_per_gpu
else:
sampler = GroupSampler(dataset, samples_per_gpu) if shuffle else None
batch_size = num_gpus * samples_per_gpu
num_workers = num_gpus * workers_per_gpu
init_fn = partial(
worker_init_fn, num_workers=num_workers, rank=rank,
seed=seed) if seed is not None else None
data_loader = DataLoader(
dataset,
batch_size=batch_size,
sampler=sampler,
num_workers=num_workers,
collate_fn=partial(collate, samples_per_gpu=samples_per_gpu),
pin_memory=False,
worker_init_fn=init_fn,
**kwargs)
return data_loader
================================================
FILE: loaders/ego_pose_dataset.py
================================================
import torch
import numpy as np
from pyquaternion import Quaternion
from torch.utils.data import Dataset
np.set_printoptions(precision=3, suppress=True)
def trans_matrix(T, R):
tm = np.eye(4)
tm[:3, :3] = R.rotation_matrix
tm[:3, 3] = T
return tm
# A helper dataset for RayIoU. It is NOT used during training.
class EgoPoseDataset(Dataset):
def __init__(self, data_infos):
super(EgoPoseDataset, self).__init__()
self.data_infos = data_infos
self.scene_frames = {}
for info in data_infos:
scene_name = info['scene_name']
if scene_name not in self.scene_frames:
self.scene_frames[scene_name] = []
self.scene_frames[scene_name].append(info)
def __len__(self):
return len(self.data_infos)
def get_ego_from_lidar(self, info):
ego_from_lidar = trans_matrix(
np.array(info['lidar2ego_translation']),
Quaternion(info['lidar2ego_rotation']))
return ego_from_lidar
def get_global_pose(self, info, inverse=False):
global_from_ego = trans_matrix(
np.array(info['ego2global_translation']),
Quaternion(info['ego2global_rotation']))
ego_from_lidar = trans_matrix(
np.array(info['lidar2ego_translation']),
Quaternion(info['lidar2ego_rotation']))
pose = global_from_ego.dot(ego_from_lidar)
if inverse:
pose = np.linalg.inv(pose)
return pose
def __getitem__(self, idx):
info = self.data_infos[idx]
ref_sample_token = info['token']
ref_lidar_from_global = self.get_global_pose(info, inverse=True)
ref_ego_from_lidar = self.get_ego_from_lidar(info)
scene_frame = self.scene_frames[info['scene_name']]
ref_index = scene_frame.index(info)
# NOTE: getting output frames
output_origin_list = []
for curr_index in range(len(scene_frame)):
# the reference frame itself has a zero offset
if curr_index == ref_index:
origin_tf = np.array([0.0, 0.0, 0.0], dtype=np.float32)
else:
# transform from the current lidar frame to global and then to the reference lidar frame
global_from_curr = self.get_global_pose(scene_frame[curr_index], inverse=False)
ref_from_curr = ref_lidar_from_global.dot(global_from_curr)
origin_tf = np.array(ref_from_curr[:3, 3], dtype=np.float32)
origin_tf_pad = np.ones([4])
origin_tf_pad[:3] = origin_tf # pad to [4]
origin_tf = np.dot(ref_ego_from_lidar[:3], origin_tf_pad.T).T # [3]
# origin
if np.abs(origin_tf[0]) < 39 and np.abs(origin_tf[1]) < 39:
output_origin_list.append(origin_tf)
# select 8 origins
if len(output_origin_list) > 8:
select_idx = np.round(np.linspace(0, len(output_origin_list) - 1, 8)).astype(np.int64)
output_origin_list = [output_origin_list[i] for i in select_idx]
output_origin_tensor = torch.from_numpy(np.stack(output_origin_list)) # [T, 3]
return (ref_sample_token, output_origin_tensor)
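`get_global_pose` builds `global_from_ego` and `ego_from_lidar` as 4x4 homogeneous transforms, composes them, and optionally inverts the product. A minimal numpy sketch of that pattern (a hand-written rotation matrix stands in for pyquaternion's `Quaternion`):

```python
import numpy as np

# hypothetical sketch of the pose composition in EgoPoseDataset.get_global_pose
def trans_matrix(T, R):
    tm = np.eye(4)
    tm[:3, :3] = R      # 3x3 rotation block
    tm[:3, 3] = T       # translation column
    return tm

theta = np.pi / 4
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])
global_from_ego = trans_matrix(np.array([10.0, -4.0, 1.5]), R)
# inverse=True in get_global_pose corresponds to np.linalg.inv(pose)
ego_from_global = np.linalg.inv(global_from_ego)
assert np.allclose(ego_from_global @ global_from_ego, np.eye(4))
```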
================================================
FILE: loaders/nuscenes_dataset.py
================================================
import os
import numpy as np
from mmdet.datasets import DATASETS
from mmdet3d.datasets import NuScenesDataset
from pyquaternion import Quaternion
@DATASETS.register_module()
class CustomNuScenesDataset(NuScenesDataset):
def collect_sweeps(self, index, into_past=60, into_future=0):
all_sweeps_prev = []
curr_index = index
while len(all_sweeps_prev) < into_past:
curr_sweeps = self.data_infos[curr_index]['sweeps']
if len(curr_sweeps) == 0:
break
all_sweeps_prev.extend(curr_sweeps)
all_sweeps_prev.append(self.data_infos[curr_index - 1]['cams'])
curr_index = curr_index - 1
all_sweeps_next = []
curr_index = index + 1
while len(all_sweeps_next) < into_future:
if curr_index >= len(self.data_infos):
break
curr_sweeps = self.data_infos[curr_index]['sweeps']
all_sweeps_next.extend(curr_sweeps[::-1])
all_sweeps_next.append(self.data_infos[curr_index]['cams'])
curr_index = curr_index + 1
return all_sweeps_prev, all_sweeps_next
def get_data_info(self, index):
info = self.data_infos[index]
sweeps_prev, sweeps_next = self.collect_sweeps(index)
ego2global_translation = info['ego2global_translation']
ego2global_rotation = info['ego2global_rotation']
lidar2ego_translation = info['lidar2ego_translation']
lidar2ego_rotation = info['lidar2ego_rotation']
ego2global_rotation = Quaternion(ego2global_rotation).rotation_matrix
lidar2ego_rotation = Quaternion(lidar2ego_rotation).rotation_matrix
input_dict = dict(
sample_idx=info['token'],
sweeps={'prev': sweeps_prev, 'next': sweeps_next},
timestamp=info['timestamp'] / 1e6,
ego2global_translation=ego2global_translation,
ego2global_rotation=ego2global_rotation,
lidar2ego_translation=lidar2ego_translation,
lidar2ego_rotation=lidar2ego_rotation,
)
if self.modality['use_camera']:
img_paths = []
img_timestamps = []
lidar2img_rts = []
for _, cam_info in info['cams'].items():
img_paths.append(os.path.relpath(cam_info['data_path']))
img_timestamps.append(cam_info['timestamp'] / 1e6)
# obtain lidar to image transformation matrix
lidar2cam_r = np.linalg.inv(cam_info['sensor2lidar_rotation'])
lidar2cam_t = cam_info['sensor2lidar_translation'] @ lidar2cam_r.T
lidar2cam_rt = np.eye(4)
lidar2cam_rt[:3, :3] = lidar2cam_r.T
lidar2cam_rt[3, :3] = -lidar2cam_t
intrinsic = cam_info['cam_intrinsic']
viewpad = np.eye(4)
viewpad[:intrinsic.shape[0], :intrinsic.shape[1]] = intrinsic
lidar2img_rt = (viewpad @ lidar2cam_rt.T)
lidar2img_rts.append(lidar2img_rt)
input_dict.update(dict(
img_filename=img_paths,
img_timestamp=img_timestamps,
lidar2img=lidar2img_rts,
))
if not self.test_mode:
annos = self.get_ann_info(index)
input_dict['ann_info'] = annos
return input_dict
================================================
FILE: loaders/nuscenes_occ_dataset.py
================================================
import os
import mmcv
import glob
import torch
import numpy as np
from tqdm import tqdm
from mmdet.datasets import DATASETS
from mmdet3d.datasets import NuScenesDataset
from nuscenes.eval.common.utils import Quaternion
from nuscenes.utils.geometry_utils import transform_matrix
from torch.utils.data import DataLoader
from models.utils import sparse2dense
from .ray_metrics import main_rayiou, main_raypq
from .ego_pose_dataset import EgoPoseDataset
from configs.r50_nuimg_704x256_8f import occ_class_names as occ3d_class_names
from configs.r50_nuimg_704x256_8f_openocc import occ_class_names as openocc_class_names
@DATASETS.register_module()
class NuSceneOcc(NuScenesDataset):
def __init__(self, occ_gt_root, *args, **kwargs):
super().__init__(filter_empty_gt=False, *args, **kwargs)
self.occ_gt_root = occ_gt_root
self.data_infos = self.load_annotations(self.ann_file)
self.token2scene = {}
for gt_path in glob.glob(os.path.join(self.occ_gt_root, '*/*/*.npz')):
token = gt_path.split('/')[-2]
scene_name = gt_path.split('/')[-3]
self.token2scene[token] = scene_name
for i in range(len(self.data_infos)):
scene_name = self.token2scene[self.data_infos[i]['token']]
self.data_infos[i]['scene_name'] = scene_name
def collect_sweeps(self, index, into_past=150, into_future=0):
all_sweeps_prev = []
curr_index = index
while len(all_sweeps_prev) < into_past:
curr_sweeps = self.data_infos[curr_index]['sweeps']
if len(curr_sweeps) == 0:
break
all_sweeps_prev.extend(curr_sweeps)
all_sweeps_prev.append(self.data_infos[curr_index - 1]['cams'])
curr_index = curr_index - 1
all_sweeps_next = []
curr_index = index + 1
while len(all_sweeps_next) < into_future:
if curr_index >= len(self.data_infos):
break
curr_sweeps = self.data_infos[curr_index]['sweeps']
all_sweeps_next.extend(curr_sweeps[::-1])
all_sweeps_next.append(self.data_infos[curr_index]['cams'])
curr_index = curr_index + 1
return all_sweeps_prev, all_sweeps_next
def get_data_info(self, index):
info = self.data_infos[index]
sweeps_prev, sweeps_next = self.collect_sweeps(index)
ego2global_translation = info['ego2global_translation']
ego2global_rotation = info['ego2global_rotation']
lidar2ego_translation = info['lidar2ego_translation']
lidar2ego_rotation = info['lidar2ego_rotation']
ego2global_rotation_mat = Quaternion(ego2global_rotation).rotation_matrix
lidar2ego_rotation_mat = Quaternion(lidar2ego_rotation).rotation_matrix
input_dict = dict(
sample_idx=info['token'],
sweeps={'prev': sweeps_prev, 'next': sweeps_next},
timestamp=info['timestamp'] / 1e6,
ego2global_translation=ego2global_translation,
ego2global_rotation=ego2global_rotation_mat,
lidar2ego_translation=lidar2ego_translation,
lidar2ego_rotation=lidar2ego_rotation_mat,
)
ego2lidar = transform_matrix(lidar2ego_translation, Quaternion(lidar2ego_rotation), inverse=True)
input_dict['ego2lidar'] = [ego2lidar for _ in range(6)]
input_dict['occ_path'] = os.path.join(self.occ_gt_root, info['scene_name'], info['token'], 'labels.npz')
if self.modality['use_camera']:
img_paths = []
img_timestamps = []
lidar2img_rts = []
for _, cam_info in info['cams'].items():
img_paths.append(os.path.relpath(cam_info['data_path']))
img_timestamps.append(cam_info['timestamp'] / 1e6)
# obtain lidar to image transformation matrix
lidar2cam_r = np.linalg.inv(cam_info['sensor2lidar_rotation'])
lidar2cam_t = cam_info['sensor2lidar_translation'] @ lidar2cam_r.T
lidar2cam_rt = np.eye(4)
lidar2cam_rt[:3, :3] = lidar2cam_r.T
lidar2cam_rt[3, :3] = -lidar2cam_t
intrinsic = cam_info['cam_intrinsic']
viewpad = np.eye(4)
viewpad[:intrinsic.shape[0], :intrinsic.shape[1]] = intrinsic
lidar2img_rt = (viewpad @ lidar2cam_rt.T)
lidar2img_rts.append(lidar2img_rt)
input_dict.update(dict(
img_filename=img_paths,
img_timestamp=img_timestamps,
lidar2img=lidar2img_rts,
))
if not self.test_mode:
annos = self.get_ann_info(index)
input_dict['ann_info'] = annos
return input_dict
def evaluate(self, occ_results, runner=None, show_dir=None, **eval_kwargs):
occ_gts, occ_preds, inst_gts, inst_preds, lidar_origins = [], [], [], [], []
print('\nStarting Evaluation...')
sample_tokens = [info['token'] for info in self.data_infos]
for batch in DataLoader(EgoPoseDataset(self.data_infos), num_workers=8):
token = batch[0][0]
output_origin = batch[1]
data_id = sample_tokens.index(token)
info = self.data_infos[data_id]
occ_path = os.path.join(self.occ_gt_root, info['scene_name'], info['token'], 'labels.npz')
occ_gt = np.load(occ_path, allow_pickle=True)
gt_semantics = occ_gt['semantics']
occ_pred = occ_results[data_id]
sem_pred = torch.from_numpy(occ_pred['sem_pred']) # [B, N]
occ_loc = torch.from_numpy(occ_pred['occ_loc'].astype(np.int64)) # [B, N, 3]
data_type = self.occ_gt_root.split('/')[-1]
if data_type == 'occ3d' or data_type == 'occ3d_panoptic':
occ_class_names = occ3d_class_names
elif data_type == 'openocc_v2':
occ_class_names = openocc_class_names
else:
raise ValueError(f'unknown occupancy GT type: {data_type}')
free_id = len(occ_class_names) - 1
occ_size = list(gt_semantics.shape)
sem_pred, _ = sparse2dense(occ_loc, sem_pred, dense_shape=occ_size, empty_value=free_id)
sem_pred = sem_pred.squeeze(0).numpy()
if 'pano_inst' in occ_pred.keys():
pano_inst = torch.from_numpy(occ_pred['pano_inst'])
pano_sem = torch.from_numpy(occ_pred['pano_sem'])
pano_inst, _ = sparse2dense(occ_loc, pano_inst, dense_shape=occ_size, empty_value=0)
pano_sem, _ = sparse2dense(occ_loc, pano_sem, dense_shape=occ_size, empty_value=free_id)
pano_inst = pano_inst.squeeze(0).numpy()
pano_sem = pano_sem.squeeze(0).numpy()
sem_pred = pano_sem
gt_instances = occ_gt['instances']
inst_gts.append(gt_instances)
inst_preds.append(pano_inst)
lidar_origins.append(output_origin)
occ_gts.append(gt_semantics)
occ_preds.append(sem_pred)
if len(inst_preds) > 0:
results = main_raypq(occ_preds, occ_gts, inst_preds, inst_gts, lidar_origins, occ_class_names=occ_class_names)
results.update(main_rayiou(occ_preds, occ_gts, lidar_origins, occ_class_names=occ_class_names))
return results
else:
return main_rayiou(occ_preds, occ_gts, lidar_origins, occ_class_names=occ_class_names)
def format_results(self, occ_results, submission_prefix, **kwargs):
if submission_prefix is not None:
mmcv.mkdir_or_exist(submission_prefix)
for index, occ_pred in enumerate(tqdm(occ_results)):
info = self.data_infos[index]
sample_token = info['token']
save_path = os.path.join(submission_prefix, '{}.npz'.format(sample_token))
np.savez_compressed(save_path, occ_pred.astype(np.uint8))
print('\nFinished.')
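`evaluate()` densifies the sparse predictions with `models.utils.sparse2dense` before scoring. A hypothetical minimal version of that densification (the repo's implementation may differ, e.g. in batching and mask handling):

```python
import numpy as np

# hypothetical minimal sparse-to-dense scatter: unset voxels get the "free" class id
def sparse2dense_sketch(occ_loc, sem_pred, dense_shape, empty_value):
    dense = np.full(dense_shape, empty_value, dtype=sem_pred.dtype)
    dense[occ_loc[:, 0], occ_loc[:, 1], occ_loc[:, 2]] = sem_pred
    return dense

occ_loc = np.array([[0, 1, 2], [3, 3, 3]])   # [N, 3] voxel coordinates
sem_pred = np.array([4, 9])                  # [N] class labels
dense = sparse2dense_sketch(occ_loc, sem_pred, (4, 4, 4), empty_value=17)
# dense[0, 1, 2] == 4, dense[3, 3, 3] == 9, all other voxels == 17
```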
================================================
FILE: loaders/old_metrics.py
================================================
import os
import numpy as np
from sklearn.neighbors import KDTree
from termcolor import colored
from functools import reduce
from typing import Iterable
np.seterr(divide='ignore', invalid='ignore')
os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"
def pcolor(string, color, on_color=None, attrs=None):
"""
Produces a colored string for printing
Parameters
----------
string : str
String that will be colored
color : str
Color to use
on_color : str
Background color to use
attrs : list of str
Different attributes for the string
Returns
-------
string: str
Colored string
"""
return colored(string, color, on_color, attrs)
def getCellCoordinates(points, voxelSize):
return (points / voxelSize).astype(int)  # np.int was removed in NumPy 1.24
def getNumUniqueCells(cells):
M = cells.max() + 1
return np.unique(cells[:, 0] + M * cells[:, 1] + M ** 2 * cells[:, 2]).shape[0]
class Metric_mIoU():
def __init__(self,
save_dir='.',
num_classes=18,
use_lidar_mask=False,
use_image_mask=False,
):
if num_classes == 18:
self.class_names = [
'others','barrier', 'bicycle', 'bus', 'car', 'construction_vehicle',
'motorcycle', 'pedestrian', 'traffic_cone', 'trailer', 'truck',
'driveable_surface', 'other_flat', 'sidewalk',
'terrain', 'manmade', 'vegetation','free'
]
elif num_classes == 2:
self.class_names = ['non-free', 'free']
self.save_dir = save_dir
self.use_lidar_mask = use_lidar_mask
self.use_image_mask = use_image_mask
self.num_classes = num_classes
self.point_cloud_range = [-40.0, -40.0, -1.0, 40.0, 40.0, 5.4]
self.occupancy_size = [0.4, 0.4, 0.4]
self.voxel_size = 0.4
self.occ_xdim = int((self.point_cloud_range[3] - self.point_cloud_range[0]) / self.occupancy_size[0])
self.occ_ydim = int((self.point_cloud_range[4] - self.point_cloud_range[1]) / self.occupancy_size[1])
self.occ_zdim = int((self.point_cloud_range[5] - self.point_cloud_range[2]) / self.occupancy_size[2])
self.voxel_num = self.occ_xdim * self.occ_ydim * self.occ_zdim
self.hist = np.zeros((self.num_classes, self.num_classes))
self.cnt = 0
def hist_info(self, n_cl, pred, gt):
"""
Build the confusion matrix.
non-empty classes: 0-16; free voxel class: 17
Args:
n_cl (int): number of occupancy classes
pred (1-d array): predicted occupancy labels
gt (1-d array): ground-truth occupancy labels
Returns:
tuple: (hist, number of correctly predicted labels, number of labeled samples)
"""
assert pred.shape == gt.shape
k = (gt >= 0) & (gt < n_cl) # exclude 255
labeled = np.sum(k)
correct = np.sum((pred[k] == gt[k]))
return (
np.bincount(
n_cl * gt[k].astype(int) + pred[k].astype(int), minlength=n_cl ** 2
).reshape(n_cl, n_cl),
correct,
labeled,
)
def per_class_iu(self, hist):
#return np.diag(hist) / (hist.sum(1) + hist.sum(0) - np.diag(hist))
result = np.diag(hist) / (hist.sum(1) + hist.sum(0) - np.diag(hist))
result[hist.sum(1) == 0] = float('nan')
return result
def compute_mIoU(self, pred, label, n_classes):
hist = np.zeros((n_classes, n_classes))
new_hist, correct, labeled = self.hist_info(n_classes, pred.flatten(), label.flatten())
hist += new_hist
mIoUs = self.per_class_iu(hist)
# for ind_class in range(n_classes):
# print(str(round(mIoUs[ind_class] * 100, 2)))
# print('===> mIoU: ' + str(round(np.nanmean(mIoUs) * 100, 2)))
return round(np.nanmean(mIoUs) * 100, 2), hist
def add_batch(self,semantics_pred,semantics_gt,mask_lidar,mask_camera):
self.cnt += 1
if self.use_image_mask:
masked_semantics_gt = semantics_gt[mask_camera]
masked_semantics_pred = semantics_pred[mask_camera]
elif self.use_lidar_mask:
masked_semantics_gt = semantics_gt[mask_lidar]
masked_semantics_pred = semantics_pred[mask_lidar]
else:
masked_semantics_gt = semantics_gt
masked_semantics_pred = semantics_pred
if self.num_classes == 2:
masked_semantics_pred = np.copy(masked_semantics_pred)
masked_semantics_gt = np.copy(masked_semantics_gt)
masked_semantics_pred[masked_semantics_pred < 17] = 0
masked_semantics_pred[masked_semantics_pred == 17] = 1
masked_semantics_gt[masked_semantics_gt < 17] = 0
masked_semantics_gt[masked_semantics_gt == 17] = 1
_, _hist = self.compute_mIoU(masked_semantics_pred, masked_semantics_gt, self.num_classes)
self.hist += _hist
def count_miou(self):
mIoU = self.per_class_iu(self.hist)
# assert cnt == num_samples, 'some samples are not included in the miou calculation'
print(f'===> per class IoU of {self.cnt} samples:')
for ind_class in range(self.num_classes-1):
print(f'===> {self.class_names[ind_class]} - IoU = ' + str(round(mIoU[ind_class] * 100, 2)))
print(f'===> mIoU of {self.cnt} samples: ' + str(round(np.nanmean(mIoU[:self.num_classes-1]) * 100, 2)))
# print(f'===> sample-wise averaged mIoU of {cnt} samples: ' + str(round(np.nanmean(mIoU_avg), 2)))
return round(np.nanmean(mIoU[:self.num_classes-1]) * 100, 2)
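`per_class_iu` reads each class IoU directly off the confusion matrix: the diagonal entry over the union of the corresponding row and column sums. A small numpy check of the formula (hypothetical counts):

```python
import numpy as np

# hypothetical 2-class confusion matrix; rows = ground truth, cols = prediction
hist = np.array([[3.0, 1.0],
                 [2.0, 4.0]])
iou = np.diag(hist) / (hist.sum(1) + hist.sum(0) - np.diag(hist))
# class 0: 3 / (4 + 5 - 3) = 0.5; class 1: 4 / (6 + 5 - 4) = 4/7
```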
class Metric_FScore():
def __init__(self,
leaf_size=10,
threshold_acc=0.6,
threshold_complete=0.6,
voxel_size=[0.4, 0.4, 0.4],
range=[-40, -40, -1, 40, 40, 5.4],
void=[17, 255],
use_lidar_mask=False,
use_image_mask=False, ) -> None:
self.leaf_size = leaf_size
self.threshold_acc = threshold_acc
self.threshold_complete = threshold_complete
self.voxel_size = voxel_size
self.range = range
self.void = void
self.use_lidar_mask = use_lidar_mask
self.use_image_mask = use_image_mask
self.cnt=0
self.tot_acc = 0.
self.tot_cmpl = 0.
self.tot_f1_mean = 0.
self.eps = 1e-8
def voxel2points(self, voxel):
# occIdx = torch.where(torch.logical_and(voxel != FREE, voxel != NOT_OBSERVED))
# if isinstance(voxel, np.ndarray): voxel = torch.from_numpy(voxel)
mask = np.logical_not(reduce(np.logical_or, [voxel == self.void[i] for i in range(len(self.void))]))
occIdx = np.where(mask)
points = np.concatenate((occIdx[0][:, None] * self.voxel_size[0] + self.voxel_size[0] / 2 + self.range[0], \
occIdx[1][:, None] * self.voxel_size[1] + self.voxel_size[1] / 2 + self.range[1], \
occIdx[2][:, None] * self.voxel_size[2] + self.voxel_size[2] / 2 + self.range[2]),
axis=1)
return points
def add_batch(self,semantics_pred,semantics_gt,mask_lidar,mask_camera ):
# for scene_token in tqdm(preds_dict.keys()):
self.cnt += 1
if self.use_image_mask:
semantics_gt[mask_camera == False] = 255
semantics_pred[mask_camera == False] = 255
elif self.use_lidar_mask:
semantics_gt[mask_lidar == False] = 255
semantics_pred[mask_lidar == False] = 255
else:
pass
ground_truth = self.voxel2points(semantics_gt)
prediction = self.voxel2points(semantics_pred)
if prediction.shape[0] == 0:
accuracy=0
completeness=0
fmean=0
else:
prediction_tree = KDTree(prediction, leaf_size=self.leaf_size)
ground_truth_tree = KDTree(ground_truth, leaf_size=self.leaf_size)
complete_distance, _ = prediction_tree.query(ground_truth)
complete_distance = complete_distance.flatten()
accuracy_distance, _ = ground_truth_tree.query(prediction)
accuracy_distance = accuracy_distance.flatten()
# evaluate completeness
complete_mask = complete_distance < self.threshold_complete
completeness = complete_mask.mean()
# evaluate accuracy
accuracy_mask = accuracy_distance < self.threshold_acc
accuracy = accuracy_mask.mean()
fmean = 2.0 / (1 / (accuracy+self.eps) + 1 / (completeness+self.eps))
self.tot_acc += accuracy
self.tot_cmpl += completeness
self.tot_f1_mean += fmean
def count_fscore(self,):
base_color, attrs = 'red', ['bold', 'dark']
print(pcolor('\n######## F score: {} #######'.format(self.tot_f1_mean / self.cnt), base_color, attrs=attrs))
return self.tot_f1_mean / self.cnt
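`Metric_FScore` combines accuracy and completeness with an eps-stabilized harmonic mean; the sketch below (hypothetical values) shows this matches the usual 2ac/(a+c) form up to eps:

```python
# hypothetical values for the harmonic-mean F-score used in Metric_FScore.add_batch
eps = 1e-8
accuracy, completeness = 0.8, 0.6
fmean = 2.0 / (1 / (accuracy + eps) + 1 / (completeness + eps))
# equivalent (up to eps) to 2 * accuracy * completeness / (accuracy + completeness)
```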
class Metric_mRecall():
def __init__(self,
save_dir='.',
num_classes=18,
pred_classes=2,
use_lidar_mask=False,
use_image_mask=False,
):
if num_classes == 18:
self.class_names = [
'others','barrier', 'bicycle', 'bus', 'car', 'construction_vehicle',
'motorcycle', 'pedestrian', 'traffic_cone', 'trailer', 'truck',
'driveable_surface', 'other_flat', 'sidewalk',
'terrain', 'manmade', 'vegetation','free'
]
elif num_classes == 2:
self.class_names = ['non-free', 'free']
self.pred_classes = pred_classes
self.save_dir = save_dir
self.use_lidar_mask = use_lidar_mask
self.use_image_mask = use_image_mask
self.num_classes = num_classes
self.point_cloud_range = [-40.0, -40.0, -1.0, 40.0, 40.0, 5.4]
self.occupancy_size = [0.4, 0.4, 0.4]
self.voxel_size = 0.4
self.occ_xdim = int((self.point_cloud_range[3] - self.point_cloud_range[0]) / self.occupancy_size[0])
self.occ_ydim = int((self.point_cloud_range[4] - self.point_cloud_range[1]) / self.occupancy_size[1])
self.occ_zdim = int((self.point_cloud_range[5] - self.point_cloud_range[2]) / self.occupancy_size[2])
self.voxel_num = self.occ_xdim * self.occ_ydim * self.occ_zdim
self.hist = np.zeros((self.num_classes, self.pred_classes)) # n_cl, p_cl
self.cnt = 0
def hist_info(self, n_cl, p_cl, pred, gt):
"""
Build the confusion matrix.
non-empty classes: 0-16; free voxel class: 17
Args:
n_cl (int): number of ground-truth classes
p_cl (int): number of predicted classes
pred (1-d array): predicted occupancy labels
gt (1-d array): ground-truth occupancy labels
Returns:
tuple: (hist, number of correctly predicted labels, number of labeled samples)
"""
assert pred.shape == gt.shape
k = (gt >= 0) & (gt < n_cl) # exclude 255
labeled = np.sum(k)
correct = np.sum((pred[k] == gt[k]))
return (
np.bincount(
p_cl * gt[k].astype(int) + pred[k].astype(int), minlength=n_cl * p_cl
).reshape(n_cl, p_cl), # 18, 2
correct,
labeled,
)
def per_class_recall(self, hist):
return hist[:, 1] / hist.sum(1) ## recall
def compute_mRecall(self, pred, label, n_classes, p_classes):
hist = np.zeros((n_classes, p_classes))
new_hist, correct, labeled = self.hist_info(n_classes, p_classes, pred.flatten(), label.flatten())
hist += new_hist
mRecalls = self.per_class_recall(hist)
# for ind_class in range(n_classes):
# print(str(round(mIoUs[ind_class] * 100, 2)))
# print('===> mIoU: ' + str(round(np.nanmean(mIoUs) * 100, 2)))
return round(np.nanmean(mRecalls) * 100, 2), hist
def add_batch(self,semantics_pred,semantics_gt,mask_lidar,mask_camera):
self.cnt += 1
if self.use_image_mask:
masked_semantics_gt = semantics_gt[mask_camera]
masked_semantics_pred = semantics_pred[mask_camera]
elif self.use_lidar_mask:
masked_semantics_gt = semantics_gt[mask_lidar]
masked_semantics_pred = semantics_pred[mask_lidar]
else:
masked_semantics_gt = semantics_gt
masked_semantics_pred = semantics_pred
if self.pred_classes == 2:
masked_semantics_pred = np.copy(masked_semantics_pred)
masked_semantics_gt = np.copy(masked_semantics_gt)
masked_semantics_pred[masked_semantics_pred < 17] = 1
masked_semantics_pred[masked_semantics_pred == 17] = 0 # 0 is free
_, _hist = self.compute_mRecall(masked_semantics_pred, masked_semantics_gt, self.num_classes, self.pred_classes)
self.hist += _hist
def count_mrecall(self):
mRecall = self.per_class_recall(self.hist)
# assert cnt == num_samples, 'some samples are not included in the miou calculation'
print(f'===> per class Recall of {self.cnt} samples:')
for ind_class in range(self.num_classes-1):
print(f'===> {self.class_names[ind_class]} - Recall = ' + str(round(mRecall[ind_class] * 100, 2)))
print(f'===> mRecall of {self.cnt} samples: ' + str(round(np.nanmean(mRecall[:self.num_classes-1]) * 100, 2)))
return round(np.nanmean(mRecall[:self.num_classes-1]) * 100, 2)
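`per_class_recall` takes the positive-prediction column over each row's total, i.e. TP / (TP + FN) per ground-truth class when `pred_classes == 2`. A quick numpy check (hypothetical counts):

```python
import numpy as np

# hypothetical [num_classes, 2] histogram; column 1 counts positive predictions
hist = np.array([[5.0, 5.0],
                 [2.0, 8.0]])
recall = hist[:, 1] / hist.sum(1)
# class 0: 5 / 10 = 0.5; class 1: 8 / 10 = 0.8
```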
# modified from https://github.com/open-mmlab/mmdetection3d/blob/main/mmdet3d/evaluation/functional/panoptic_seg_eval.py#L10
class Metric_Panoptic():
def __init__(self,
save_dir='.',
num_classes=18,
use_lidar_mask=False,
use_image_mask=False,
ignore_index: Iterable[int]=[],
):
"""
Args:
ignore_index (list): Class ids that will not be considered in PQ counting.
"""
if num_classes == 18:
self.class_names = [
'others','barrier', 'bicycle', 'bus', 'car', 'construction_vehicle',
'motorcycle', 'pedestrian', 'traffic_cone', 'trailer', 'truck',
'driveable_surface', 'other_flat', 'sidewalk',
'terrain', 'manmade', 'vegetation','free'
]
else:
raise ValueError(f'unsupported num_classes: {num_classes}')
self.save_dir = save_dir
self.num_classes = num_classes
self.use_lidar_mask = use_lidar_mask
self.use_image_mask = use_image_mask
self.ignore_index = ignore_index
self.id_offset = 2 ** 16
self.eps = 1e-5
self.min_num_points = 20
self.include = np.array(
[n for n in range(self.num_classes - 1) if n not in self.ignore_index],
dtype=int)
self.cnt = 0
# panoptic stuff
self.pan_tp = np.zeros(self.num_classes, dtype=int)
self.pan_iou = np.zeros(self.num_classes, dtype=np.double)
self.pan_fp = np.zeros(self.num_classes, dtype=int)
self.pan_fn = np.zeros(self.num_classes, dtype=int)
def add_batch(self,semantics_pred,semantics_gt,instances_pred,instances_gt,mask_lidar,mask_camera):
self.cnt += 1
if self.use_image_mask:
masked_semantics_gt = semantics_gt[mask_camera]
masked_semantics_pred = semantics_pred[mask_camera]
masked_instances_gt = instances_gt[mask_camera]
masked_instances_pred = instances_pred[mask_camera]
elif self.use_lidar_mask:
masked_semantics_gt = semantics_gt[mask_lidar]
masked_semantics_pred = semantics_pred[mask_lidar]
masked_instances_gt = instances_gt[mask_lidar]
masked_instances_pred = instances_pred[mask_lidar]
else:
masked_semantics_gt = semantics_gt
masked_semantics_pred = semantics_pred
masked_instances_gt = instances_gt
masked_instances_pred = instances_pred
self.add_panoptic_sample(masked_semantics_pred, masked_semantics_gt, masked_instances_pred, masked_instances_gt)
def add_panoptic_sample(self, semantics_pred, semantics_gt, instances_pred, instances_gt):
"""Add one sample of panoptic predictions and ground truths for
evaluation.
Args:
semantics_pred (np.ndarray): Semantic predictions.
semantics_gt (np.ndarray): Semantic ground truths.
instances_pred (np.ndarray): Instance predictions.
instances_gt (np.ndarray): Instance ground truths.
"""
# get instance_class_id from instance_gt
instance_class_ids = [self.num_classes - 1]
for i in range(1, instances_gt.max() + 1):
class_id = np.unique(semantics_gt[instances_gt == i])
# assert class_id.shape[0] == 1, "each instance must belong to only one class"
if class_id.shape[0] == 1:
instance_class_ids.append(class_id[0])
else:
instance_class_ids.append(self.num_classes - 1)
instance_class_ids = np.array(instance_class_ids)
instance_count = 1
final_instance_class_ids = []
final_instances = np.zeros_like(instances_gt) # empty space has instance id "0"
for class_id in range(self.num_classes - 1):
if np.sum(semantics_gt == class_id) == 0:
continue
if self.class_names[class_id] in ['car', 'truck', 'construction_vehicle', 'bus', 'trailer', 'motorcycle', 'bicycle', 'pedestrian']:
# treat as instances
for instance_id in range(len(instance_class_ids)):
if instance_class_ids[instance_id] != class_id:
continue
final_instances[instances_gt == instance_id] = instance_count
instance_count += 1
final_instance_class_ids.append(class_id)
else:
# treat as semantics
final_instances[semantics_gt == class_id] = instance_count
instance_count += 1
final_instance_class_ids.append(class_id)
instances_gt = final_instances
# avoid zero (ignored label)
instances_pred = instances_pred + 1
instances_gt = instances_gt + 1
for cl in self.ignore_index:
# make a mask for this class
gt_not_in_excl_mask = semantics_gt != cl
# remove all other points
semantics_pred = semantics_pred[gt_not_in_excl_mask]
semantics_gt = semantics_gt[gt_not_in_excl_mask]
instances_pred = instances_pred[gt_not_in_excl_mask]
instances_gt = instances_gt[gt_not_in_excl_mask]
# for each class (except the ignored ones)
for cl in self.include:
# get a class mask
pred_inst_in_cl_mask = semantics_pred == cl
gt_inst_in_cl_mask = semantics_gt == cl
# get instance points in class (makes outside stuff 0)
pred_inst_in_cl = instances_pred * pred_inst_in_cl_mask.astype(int)
gt_inst_in_cl = instances_gt * gt_inst_in_cl_mask.astype(int)
# generate the areas for each unique instance prediction
unique_pred, counts_pred = np.unique(
pred_inst_in_cl[pred_inst_in_cl > 0], return_counts=True)
id2idx_pred = {id: idx for idx, id in enumerate(unique_pred)}
matched_pred = np.array([False] * unique_pred.shape[0])
# generate the areas for each unique instance gt_np
unique_gt, counts_gt = np.unique(
gt_inst_in_cl[gt_inst_in_cl > 0], return_counts=True)
id2idx_gt = {id: idx for idx, id in enumerate(unique_gt)}
matched_gt = np.array([False] * unique_gt.shape[0])
# generate intersection using offset
valid_combos = np.logical_and(pred_inst_in_cl > 0,
gt_inst_in_cl > 0)
id_offset_combo = pred_inst_in_cl[
valid_combos] + self.id_offset * gt_inst_in_cl[valid_combos]
unique_combo, counts_combo = np.unique(
id_offset_combo, return_counts=True)
# generate an intersection map
# count the intersections with over 0.5 IoU as TP
gt_labels = unique_combo // self.id_offset
pred_labels = unique_combo % self.id_offset
gt_areas = np.array([counts_gt[id2idx_gt[id]] for id in gt_labels])
pred_areas = np.array(
[counts_pred[id2idx_pred[id]] for id in pred_labels])
intersections = counts_combo
unions = gt_areas + pred_areas - intersections
ious = intersections.astype(float) / unions.astype(float)
tp_indexes = ious > 0.5
self.pan_tp[cl] += np.sum(tp_indexes)
self.pan_iou[cl] += np.sum(ious[tp_indexes])
matched_gt[[id2idx_gt[id] for id in gt_labels[tp_indexes]]] = True
matched_pred[[id2idx_pred[id]
for id in pred_labels[tp_indexes]]] = True
# count the FN
if len(counts_gt) > 0:
self.pan_fn[cl] += np.sum(
np.logical_and(counts_gt >= self.min_num_points,
~matched_gt))
# count the FP
if len(matched_pred) > 0:
self.pan_fp[cl] += np.sum(
np.logical_and(counts_pred >= self.min_num_points,
~matched_pred))
    def count_pq(self):
sq_all = self.pan_iou.astype(np.double) / np.maximum(
self.pan_tp.astype(np.double), self.eps)
rq_all = self.pan_tp.astype(np.double) / np.maximum(
self.pan_tp.astype(np.double) + 0.5 * self.pan_fp.astype(np.double)
+ 0.5 * self.pan_fn.astype(np.double), self.eps)
pq_all = sq_all * rq_all
# mask classes not occurring in dataset
mask = (self.pan_tp + self.pan_fp + self.pan_fn) > 0
sq_all[~mask] = float('nan')
rq_all[~mask] = float('nan')
pq_all[~mask] = float('nan')
# then do the REAL mean (no ignored classes)
sq = round(np.nanmean(sq_all[self.include]) * 100, 2)
rq = round(np.nanmean(rq_all[self.include]) * 100, 2)
pq = round(np.nanmean(pq_all[self.include]) * 100, 2)
print(f'===> per class sq, rq, pq of {self.cnt} samples:')
for ind_class in self.include:
print(f'===> {self.class_names[ind_class]} -' + \
f' sq = {round(sq_all[ind_class] * 100, 2)},' + \
f' rq = {round(rq_all[ind_class] * 100, 2)},' + \
f' pq = {round(pq_all[ind_class] * 100, 2)}')
print(f'===> sq of {self.cnt} samples: ' + str(sq))
print(f'===> rq of {self.cnt} samples: ' + str(rq))
print(f'===> pq of {self.cnt} samples: ' + str(pq))
return (pq, sq, rq)
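For reference, the SQ/RQ/PQ aggregation above follows the standard panoptic quality definition, PQ = SQ x RQ. A minimal standalone sketch with hypothetical counts for a single class (not tied to this evaluator):

```python
# Hypothetical accumulated counts for one class, for illustration only.
tp, fp, fn = 8, 2, 2      # matched pairs, unmatched predictions, unmatched GTs
iou_sum = 6.0             # summed IoU over the TP matches
eps = 1e-5

sq = iou_sum / max(tp, eps)                    # segmentation quality: mean TP IoU
rq = tp / max(tp + 0.5 * fp + 0.5 * fn, eps)   # recognition quality: F1-style score
pq = sq * rq
# sq = 0.75, rq = 0.8, pq = 0.6
```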
================================================
FILE: loaders/pipelines/__init__.py
================================================
from .loading import LoadMultiViewImageFromMultiSweeps, LoadOccGTFromFile
from .transforms import PadMultiViewImage, NormalizeMultiviewImage, PhotoMetricDistortionMultiViewImage
__all__ = [
'LoadMultiViewImageFromMultiSweeps', 'PadMultiViewImage', 'NormalizeMultiviewImage',
'PhotoMetricDistortionMultiViewImage', 'LoadOccGTFromFile'
]
================================================
FILE: loaders/pipelines/loading.py
================================================
import os
import mmcv
import torch
import numpy as np
from mmdet.datasets.builder import PIPELINES
from numpy.linalg import inv
from mmcv.runner import get_dist_info
from mmcv.parallel import DataContainer as DC
from mmdet.datasets.pipelines import to_tensor
from torchvision.transforms.functional import rotate
def compose_lidar2img(ego2global_translation_curr,
ego2global_rotation_curr,
lidar2ego_translation_curr,
lidar2ego_rotation_curr,
sensor2global_translation_past,
sensor2global_rotation_past,
cam_intrinsic_past):
R = sensor2global_rotation_past @ (inv(ego2global_rotation_curr).T @ inv(lidar2ego_rotation_curr).T)
T = sensor2global_translation_past @ (inv(ego2global_rotation_curr).T @ inv(lidar2ego_rotation_curr).T)
T -= ego2global_translation_curr @ (inv(ego2global_rotation_curr).T @ inv(lidar2ego_rotation_curr).T) + lidar2ego_translation_curr @ inv(lidar2ego_rotation_curr).T
lidar2cam_r = inv(R.T)
lidar2cam_t = T @ lidar2cam_r.T
lidar2cam_rt = np.eye(4)
lidar2cam_rt[:3, :3] = lidar2cam_r.T
lidar2cam_rt[3, :3] = -lidar2cam_t
viewpad = np.eye(4)
viewpad[:cam_intrinsic_past.shape[0], :cam_intrinsic_past.shape[1]] = cam_intrinsic_past
lidar2img = (viewpad @ lidar2cam_rt.T).astype(np.float32)
return lidar2img
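The `viewpad` trick above embeds the 3x3 camera intrinsics into a 4x4 matrix so it can be chained with homogeneous extrinsics. That embedding in isolation (the intrinsic values here are made up for illustration):

```python
import numpy as np

K = np.array([[1266.4, 0.0, 816.3],
              [0.0, 1266.4, 491.5],
              [0.0, 0.0, 1.0]])   # hypothetical nuScenes-like intrinsics
viewpad = np.eye(4)
viewpad[:K.shape[0], :K.shape[1]] = K
# the 4th row/column keep their identity values, so viewpad @ [x, y, z, 1]
# projects a point exactly as K @ [x, y, z] would
```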
@PIPELINES.register_module()
class LoadMultiViewImageFromMultiSweeps(object):
def __init__(self,
sweeps_num=5,
color_type='color',
test_mode=False):
self.sweeps_num = sweeps_num
self.color_type = color_type
self.test_mode = test_mode
self.train_interval = [4, 8]
self.test_interval = 6
try:
mmcv.use_backend('turbojpeg')
except ImportError:
mmcv.use_backend('cv2')
def load_offline(self, results):
cam_types = [
'CAM_FRONT', 'CAM_FRONT_RIGHT', 'CAM_FRONT_LEFT',
'CAM_BACK', 'CAM_BACK_LEFT', 'CAM_BACK_RIGHT'
]
if len(results['sweeps']['prev']) == 0:
for _ in range(self.sweeps_num):
for j in range(len(cam_types)):
results['img'].append(results['img'][j])
results['img_timestamp'].append(results['img_timestamp'][j])
results['filename'].append(results['filename'][j])
results['lidar2img'].append(np.copy(results['lidar2img'][j]))
if 'ego2lidar' in results:
results['ego2lidar'].append(results['ego2lidar'][0])
else:
if self.test_mode:
interval = self.test_interval
choices = [(k + 1) * interval - 1 for k in range(self.sweeps_num)]
elif len(results['sweeps']['prev']) <= self.sweeps_num:
pad_len = self.sweeps_num - len(results['sweeps']['prev'])
choices = list(range(len(results['sweeps']['prev']))) + [len(results['sweeps']['prev']) - 1] * pad_len
else:
max_interval = len(results['sweeps']['prev']) // self.sweeps_num
max_interval = min(max_interval, self.train_interval[1])
min_interval = min(max_interval, self.train_interval[0])
interval = np.random.randint(min_interval, max_interval + 1)
choices = [(k + 1) * interval - 1 for k in range(self.sweeps_num)]
for idx in sorted(list(choices)):
sweep_idx = min(idx, len(results['sweeps']['prev']) - 1)
sweep = results['sweeps']['prev'][sweep_idx]
if len(sweep.keys()) < len(cam_types):
sweep = results['sweeps']['prev'][sweep_idx - 1]
for sensor in cam_types:
results['img'].append(mmcv.imread(sweep[sensor]['data_path'], self.color_type))
results['img_timestamp'].append(sweep[sensor]['timestamp'] / 1e6)
results['filename'].append(os.path.relpath(sweep[sensor]['data_path']))
results['lidar2img'].append(compose_lidar2img(
results['ego2global_translation'],
results['ego2global_rotation'],
results['lidar2ego_translation'],
results['lidar2ego_rotation'],
sweep[sensor]['sensor2global_translation'],
sweep[sensor]['sensor2global_rotation'],
sweep[sensor]['cam_intrinsic'],
))
if 'ego2lidar' in results:
results['ego2lidar'].append(results['ego2lidar'][0])
return results
def load_online(self, results):
# only used when measuring FPS
assert self.test_mode
assert self.test_interval % 6 == 0
cam_types = [
'CAM_FRONT', 'CAM_FRONT_RIGHT', 'CAM_FRONT_LEFT',
'CAM_BACK', 'CAM_BACK_LEFT', 'CAM_BACK_RIGHT'
]
if len(results['sweeps']['prev']) == 0:
for _ in range(self.sweeps_num):
for j in range(len(cam_types)):
results['img_timestamp'].append(results['img_timestamp'][j])
results['filename'].append(results['filename'][j])
results['lidar2img'].append(np.copy(results['lidar2img'][j]))
if 'ego2lidar' in results:
results['ego2lidar'].append(results['ego2lidar'][0])
else:
interval = self.test_interval
choices = [(k + 1) * interval - 1 for k in range(self.sweeps_num)]
for idx in sorted(list(choices)):
sweep_idx = min(idx, len(results['sweeps']['prev']) - 1)
sweep = results['sweeps']['prev'][sweep_idx]
if len(sweep.keys()) < len(cam_types):
sweep = results['sweeps']['prev'][sweep_idx - 1]
for sensor in cam_types:
# skip loading history frames
results['img_timestamp'].append(sweep[sensor]['timestamp'] / 1e6)
results['filename'].append(os.path.relpath(sweep[sensor]['data_path']))
results['lidar2img'].append(compose_lidar2img(
results['ego2global_translation'],
results['ego2global_rotation'],
results['lidar2ego_translation'],
results['lidar2ego_rotation'],
sweep[sensor]['sensor2global_translation'],
sweep[sensor]['sensor2global_rotation'],
sweep[sensor]['cam_intrinsic'],
))
if 'ego2lidar' in results:
results['ego2lidar'].append(results['ego2lidar'][0])
return results
def __call__(self, results):
if self.sweeps_num == 0:
return results
world_size = get_dist_info()[1]
if world_size == 1 and self.test_mode:
return self.load_online(results)
else:
return self.load_offline(results)
@PIPELINES.register_module()
class LoadOccGTFromFile(object):
def __init__(self, num_classes=18, inst_class_ids=[]):
self.num_classes = num_classes
self.inst_class_ids = inst_class_ids
def __call__(self, results):
occ_labels = np.load(results['occ_path'])
semantics = occ_labels['semantics'] # [200, 200, 16]
# mask_lidar = occ_labels['mask_lidar'].astype(np.bool_) # [200, 200, 16]
# mask_camera = occ_labels['mask_camera'].astype(np.bool_) # [200, 200, 16]
# results['mask_lidar'] = mask_lidar
# results['mask_camera'] = mask_camera
# instance GT
if 'instances' in occ_labels.keys():
instances = occ_labels['instances']
            instance_class_ids = [self.num_classes - 1]  # instance id 0 is always the free class
for i in range(1, instances.max() + 1):
class_id = np.unique(semantics[instances == i])
assert class_id.shape[0] == 1, "each instance must belong to only one class"
instance_class_ids.append(class_id[0])
instance_class_ids = np.array(instance_class_ids)
else:
instances = None
instance_class_ids = None
instance_count = 0
final_instance_class_ids = []
final_instances = np.ones_like(semantics) * 255 # empty space has instance id "255"
for class_id in range(self.num_classes - 1):
if np.sum(semantics == class_id) == 0:
continue
if class_id in self.inst_class_ids:
assert instances is not None, 'instance annotation not found'
# treat as instances
for instance_id in range(len(instance_class_ids)):
if instance_class_ids[instance_id] != class_id:
continue
final_instances[instances == instance_id] = instance_count
instance_count += 1
final_instance_class_ids.append(class_id)
else:
# treat as semantics
final_instances[semantics == class_id] = instance_count
instance_count += 1
final_instance_class_ids.append(class_id)
results['voxel_semantics'] = semantics
results['voxel_instances'] = final_instances
results['instance_class_ids'] = DC(to_tensor(final_instance_class_ids))
if results.get('rotate_bda', False):
semantics = torch.from_numpy(semantics).permute(2, 0, 1) # [16, 200, 200]
semantics = rotate(semantics, results['rotate_bda'], fill=255).permute(1, 2, 0) # [200, 200, 16]
results['voxel_semantics'] = semantics.numpy()
final_instances = torch.from_numpy(final_instances).permute(2, 0, 1) # [16, 200, 200]
final_instances = rotate(final_instances, results['rotate_bda'], fill=255).permute(1, 2, 0) # [200, 200, 16]
results['voxel_instances'] = final_instances.numpy()
if results.get('flip_dx', False):
results['voxel_semantics'] = results['voxel_semantics'][::-1, ...].copy()
results['voxel_instances'] = results['voxel_instances'][::-1, ...].copy()
if results.get('flip_dy', False):
results['voxel_semantics'] = results['voxel_semantics'][:, ::-1, ...].copy()
results['voxel_instances'] = results['voxel_instances'][:, ::-1, ...].copy()
return results
# https://github.com/HuangJunJie2017/BEVDet/blob/58c2587a8f89a1927926f0bdb6cde2917c91a9a5/mmdet3d/datasets/pipelines/loading.py#L1177
@PIPELINES.register_module()
class BEVAug(object):
def __init__(self, bda_aug_conf, classes, is_train=True):
self.bda_aug_conf = bda_aug_conf
self.is_train = is_train
self.classes = classes
def sample_bda_augmentation(self):
"""Generate bda augmentation values based on bda_config."""
if self.is_train:
rotate_bda = np.random.uniform(*self.bda_aug_conf['rot_lim'])
scale_bda = np.random.uniform(*self.bda_aug_conf['scale_lim'])
flip_dx = np.random.uniform() < self.bda_aug_conf['flip_dx_ratio']
flip_dy = np.random.uniform() < self.bda_aug_conf['flip_dy_ratio']
else:
rotate_bda = 0
scale_bda = 1.0
flip_dx = False
flip_dy = False
return rotate_bda, scale_bda, flip_dx, flip_dy
def bev_transform(self, rotate_angle, scale_ratio, flip_dx, flip_dy):
"""
Returns:
rot_mat: (3, 3)
"""
rotate_angle = torch.tensor(rotate_angle / 180 * np.pi)
rot_sin = torch.sin(rotate_angle)
rot_cos = torch.cos(rotate_angle)
rot_mat = torch.Tensor([[rot_cos, -rot_sin, 0], [rot_sin, rot_cos, 0],
[0, 0, 1]])
scale_mat = torch.Tensor([[scale_ratio, 0, 0], [0, scale_ratio, 0],
[0, 0, scale_ratio]])
flip_mat = torch.Tensor([[1, 0, 0], [0, 1, 0], [0, 0, 1]])
if flip_dx:
flip_mat = flip_mat @ torch.Tensor([[-1, 0, 0], [0, 1, 0],
[0, 0, 1]])
if flip_dy:
flip_mat = flip_mat @ torch.Tensor([[1, 0, 0], [0, -1, 0],
[0, 0, 1]])
rot_mat = flip_mat @ (scale_mat @ rot_mat)
return rot_mat
def __call__(self, results):
rotate_bda, scale_bda, flip_dx, flip_dy = self.sample_bda_augmentation()
bda_mat = torch.zeros(4, 4)
bda_mat[3, 3] = 1
# bda_rot: (3, 3)
bda_rot = self.bev_transform(rotate_bda, scale_bda, flip_dx, flip_dy)
bda_mat[:3, :3] = bda_rot
results['bda_mat'] = bda_mat
results['flip_dx'] = flip_dx
results['flip_dy'] = flip_dy
results['rotate_bda'] = rotate_bda
results['scale_bda'] = scale_bda
for i in range(len(results['ego2lidar'])):
results['ego2lidar'][i] = results['ego2lidar'][i] @ torch.inverse(bda_mat).numpy() # [4, 4] @ [4, 4]
return results
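`BEVAug.bev_transform` above composes rotation, scaling, and flips into one 3x3 matrix in the order `flip @ scale @ rot`. A minimal numpy sketch of the same composition (`bda_matrix` is a hypothetical helper, not part of this pipeline):

```python
import numpy as np

def bda_matrix(rotate_deg, scale, flip_dx, flip_dy):
    """Compose flip @ scale @ rot, mirroring BEVAug.bev_transform."""
    a = np.deg2rad(rotate_deg)
    rot = np.array([[np.cos(a), -np.sin(a), 0.0],
                    [np.sin(a),  np.cos(a), 0.0],
                    [0.0, 0.0, 1.0]])
    scl = np.diag([scale, scale, scale])
    flp = np.diag([-1.0 if flip_dx else 1.0,
                   -1.0 if flip_dy else 1.0,
                   1.0])
    return flp @ scl @ rot

# a pure x-flip (no rotation, unit scale) is just diag(-1, 1, 1)
```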
================================================
FILE: loaders/pipelines/transforms.py
================================================
import mmcv
import torch
import numpy as np
from PIL import Image
from numpy import random
from mmdet.datasets.builder import PIPELINES
@PIPELINES.register_module()
class PadMultiViewImage(object):
"""Pad the multi-view image.
There are two padding modes: (1) pad to a fixed size and (2) pad to the
minimum size that is divisible by some number.
    Added keys are "pad_shape", "pad_fixed_size" and "pad_size_divisor".
Args:
size (tuple, optional): Fixed padding size.
size_divisor (int, optional): The divisor of padded size.
pad_val (float, optional): Padding value, 0 by default.
"""
def __init__(self, size=None, size_divisor=None, pad_val=0):
self.size = size
self.size_divisor = size_divisor
self.pad_val = pad_val
# only one of size and size_divisor should be valid
assert size is not None or size_divisor is not None
assert size is None or size_divisor is None
def _pad_img(self, img):
if self.size_divisor is not None:
pad_h = int(np.ceil(img.shape[0] / self.size_divisor)) * self.size_divisor
pad_w = int(np.ceil(img.shape[1] / self.size_divisor)) * self.size_divisor
else:
pad_h, pad_w = self.size
pad_width = ((0, pad_h - img.shape[0]), (0, pad_w - img.shape[1]), (0, 0))
img = np.pad(img, pad_width, constant_values=self.pad_val)
return img
def _pad_imgs(self, results):
padded_img = [self._pad_img(img) for img in results['img']]
results['ori_shape'] = [img.shape for img in results['img']]
results['img'] = padded_img
results['img_shape'] = [img.shape for img in padded_img]
results['pad_shape'] = [img.shape for img in padded_img]
results['pad_fixed_size'] = self.size
results['pad_size_divisor'] = self.size_divisor
def __call__(self, results):
"""Call function to pad images, masks, semantic segmentation maps.
Args:
results (dict): Result dict from loading pipeline.
Returns:
dict: Updated result dict.
"""
self._pad_imgs(results)
return results
def __repr__(self):
repr_str = self.__class__.__name__
repr_str += f'(size={self.size}, '
repr_str += f'size_divisor={self.size_divisor}, '
repr_str += f'pad_val={self.pad_val})'
return repr_str
@PIPELINES.register_module()
class NormalizeMultiviewImage(object):
"""Normalize the image.
Added key is "img_norm_cfg".
Args:
mean (sequence): Mean values of 3 channels.
std (sequence): Std values of 3 channels.
        to_rgb (bool): Whether to convert the image from BGR to RGB,
            default is True.
"""
def __init__(self, mean, std, to_rgb=True):
self.mean = np.array(mean, dtype=np.float32).reshape(-1)
        self.std = 1 / np.array(std, dtype=np.float32).reshape(-1)  # store the reciprocal so __call__ can multiply instead of divide
self.to_rgb = to_rgb
def __call__(self, results):
"""Call function to normalize images.
Args:
results (dict): Result dict from loading pipeline.
Returns:
dict: Normalized results, 'img_norm_cfg' key is added into
result dict.
"""
normalized_imgs = []
for img in results['img']:
img = img.astype(np.float32)
if self.to_rgb:
img = img[..., ::-1]
img = img - self.mean
img = img * self.std
normalized_imgs.append(img)
results['img'] = normalized_imgs
results['img_norm_cfg'] = dict(
mean=self.mean,
std=self.std,
to_rgb=self.to_rgb
)
return results
def __repr__(self):
repr_str = self.__class__.__name__
repr_str += f'(mean={self.mean}, std={self.std}, to_rgb={self.to_rgb})'
return repr_str
@PIPELINES.register_module()
class PhotoMetricDistortionMultiViewImage:
"""Apply photometric distortion to image sequentially, every transformation
is applied with a probability of 0.5. The position of random contrast is in
second or second to last.
1. random brightness
2. random contrast (mode 0)
3. convert color from BGR to HSV
4. random saturation
5. random hue
6. convert color from HSV to BGR
7. random contrast (mode 1)
8. randomly swap channels
Args:
brightness_delta (int): delta of brightness.
contrast_range (tuple): range of contrast.
saturation_range (tuple): range of saturation.
hue_delta (int): delta of hue.
"""
def __init__(self,
brightness_delta=32,
contrast_range=(0.5, 1.5),
saturation_range=(0.5, 1.5),
hue_delta=18):
self.brightness_delta = brightness_delta
self.contrast_lower, self.contrast_upper = contrast_range
self.saturation_lower, self.saturation_upper = saturation_range
self.hue_delta = hue_delta
def __call__(self, results):
"""Call function to perform photometric distortion on images.
Args:
results (dict): Result dict from loading pipeline.
Returns:
dict: Result dict with images distorted.
"""
imgs = results['img']
new_imgs = []
for img in imgs:
ori_dtype = img.dtype
img = img.astype(np.float32)
# random brightness
if random.randint(2):
delta = random.uniform(-self.brightness_delta,
self.brightness_delta)
img += delta
# mode == 0 --> do random contrast first
# mode == 1 --> do random contrast last
mode = random.randint(2)
if mode == 1:
if random.randint(2):
alpha = random.uniform(self.contrast_lower,
self.contrast_upper)
img *= alpha
# convert color from BGR to HSV
img = mmcv.bgr2hsv(img)
# random saturation
if random.randint(2):
img[..., 1] *= random.uniform(self.saturation_lower,
self.saturation_upper)
# random hue
if random.randint(2):
img[..., 0] += random.uniform(-self.hue_delta, self.hue_delta)
img[..., 0][img[..., 0] > 360] -= 360
img[..., 0][img[..., 0] < 0] += 360
# convert color from HSV to BGR
img = mmcv.hsv2bgr(img)
# random contrast
if mode == 0:
if random.randint(2):
alpha = random.uniform(self.contrast_lower,
self.contrast_upper)
img *= alpha
# randomly swap channels
if random.randint(2):
img = img[..., random.permutation(3)]
new_imgs.append(img.astype(ori_dtype))
results['img'] = new_imgs
return results
def __repr__(self):
repr_str = self.__class__.__name__
repr_str += f'(\nbrightness_delta={self.brightness_delta},\n'
repr_str += 'contrast_range='
repr_str += f'{(self.contrast_lower, self.contrast_upper)},\n'
repr_str += 'saturation_range='
repr_str += f'{(self.saturation_lower, self.saturation_upper)},\n'
repr_str += f'hue_delta={self.hue_delta})'
return repr_str
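The hue channel produced by `mmcv.bgr2hsv` is circular, so the hue shift in `__call__` wraps values back into [0, 360). The wrap in isolation, with a made-up delta:

```python
import numpy as np

hue = np.array([350.0, 10.0])
hue += 30.0                 # a hypothetical random hue delta
hue[hue > 360] -= 360       # wrap around the circle
hue[hue < 0] += 360
# hue is now [20., 40.]
```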
@PIPELINES.register_module()
class RandomTransformImage(object):
def __init__(self, ida_aug_conf=None, training=True):
self.ida_aug_conf = ida_aug_conf
self.training = training
def __call__(self, results):
resize, resize_dims, crop, flip, rotate = self.sample_augmentation()
if len(results['lidar2img']) == len(results['img']):
for i in range(len(results['img'])):
img = Image.fromarray(np.uint8(results['img'][i]))
# resize, resize_dims, crop, flip, rotate = self._sample_augmentation()
img, ida_mat = self.img_transform(
img,
resize=resize,
resize_dims=resize_dims,
crop=crop,
flip=flip,
rotate=rotate,
)
results['img'][i] = np.array(img).astype(np.uint8)
results['lidar2img'][i] = ida_mat @ results['lidar2img'][i]
elif len(results['img']) == 6:
for i in range(len(results['img'])):
img = Image.fromarray(np.uint8(results['img'][i]))
# resize, resize_dims, crop, flip, rotate = self._sample_augmentation()
img, ida_mat = self.img_transform(
img,
resize=resize,
resize_dims=resize_dims,
crop=crop,
flip=flip,
rotate=rotate,
)
results['img'][i] = np.array(img).astype(np.uint8)
for i in range(len(results['lidar2img'])):
results['lidar2img'][i] = ida_mat @ results['lidar2img'][i]
else:
raise ValueError()
results['ori_shape'] = [img.shape for img in results['img']]
results['img_shape'] = [img.shape for img in results['img']]
results['pad_shape'] = [img.shape for img in results['img']]
return results
def img_transform(self, img, resize, resize_dims, crop, flip, rotate):
"""
https://github.com/Megvii-BaseDetection/BEVStereo/blob/master/dataset/nusc_mv_det_dataset.py#L48
"""
def get_rot(h):
return torch.Tensor([
[np.cos(h), np.sin(h)],
[-np.sin(h), np.cos(h)],
])
ida_rot = torch.eye(2)
ida_tran = torch.zeros(2)
# adjust image
img = img.resize(resize_dims)
img = img.crop(crop)
if flip:
img = img.transpose(method=Image.FLIP_LEFT_RIGHT)
img = img.rotate(rotate)
# post-homography transformation
ida_rot *= resize
ida_tran -= torch.Tensor(crop[:2])
if flip:
A = torch.Tensor([[-1, 0], [0, 1]])
b = torch.Tensor([crop[2] - crop[0], 0])
ida_rot = A.matmul(ida_rot)
ida_tran = A.matmul(ida_tran) + b
A = get_rot(rotate / 180 * np.pi)
b = torch.Tensor([crop[2] - crop[0], crop[3] - crop[1]]) / 2
b = A.matmul(-b) + b
ida_rot = A.matmul(ida_rot)
ida_tran = A.matmul(ida_tran) + b
ida_mat = torch.eye(4)
ida_mat[:2, :2] = ida_rot
ida_mat[:2, 2] = ida_tran
return img, ida_mat.numpy()
def sample_augmentation(self):
"""
https://github.com/Megvii-BaseDetection/BEVStereo/blob/master/dataset/nusc_mv_det_dataset.py#L247
"""
H, W = self.ida_aug_conf['H'], self.ida_aug_conf['W']
fH, fW = self.ida_aug_conf['final_dim']
if self.training:
resize = np.random.uniform(*self.ida_aug_conf['resize_lim'])
resize_dims = (int(W * resize), int(H * resize))
newW, newH = resize_dims
crop_h = int((1 - np.random.uniform(*self.ida_aug_conf['bot_pct_lim'])) * newH) - fH
crop_w = int(np.random.uniform(0, max(0, newW - fW)))
crop = (crop_w, crop_h, crop_w + fW, crop_h + fH)
flip = False
if self.ida_aug_conf['rand_flip'] and np.random.choice([0, 1]):
flip = True
rotate = np.random.uniform(*self.ida_aug_conf['rot_lim'])
else:
resize = max(fH / H, fW / W)
resize_dims = (int(W * resize), int(H * resize))
newW, newH = resize_dims
crop_h = int((1 - np.mean(self.ida_aug_conf['bot_pct_lim'])) * newH) - fH
crop_w = int(max(0, newW - fW) / 2)
crop = (crop_w, crop_h, crop_w + fW, crop_h + fH)
flip = False
rotate = 0
return resize, resize_dims, crop, flip, rotate
@PIPELINES.register_module()
class GlobalRotScaleTransImage(object):
def __init__(self,
rot_range=[-0.3925, 0.3925],
scale_ratio_range=[0.95, 1.05],
translation_std=[0, 0, 0]):
self.rot_range = rot_range
self.scale_ratio_range = scale_ratio_range
self.translation_std = translation_std
def __call__(self, results):
# random rotate
rot_angle = np.random.uniform(*self.rot_range)
self.rotate_z(results, rot_angle)
results["gt_bboxes_3d"].rotate(np.array(rot_angle))
# random scale
scale_ratio = np.random.uniform(*self.scale_ratio_range)
self.scale_xyz(results, scale_ratio)
results["gt_bboxes_3d"].scale(scale_ratio)
# TODO: support translation
return results
def rotate_z(self, results, rot_angle):
rot_cos = torch.cos(torch.tensor(rot_angle))
rot_sin = torch.sin(torch.tensor(rot_angle))
rot_mat = torch.tensor([
[rot_cos, -rot_sin, 0, 0],
[rot_sin, rot_cos, 0, 0],
[0, 0, 1, 0],
[0, 0, 0, 1],
])
rot_mat_inv = torch.inverse(rot_mat)
for view in range(len(results['lidar2img'])):
results['lidar2img'][view] = (torch.tensor(results['lidar2img'][view]).float() @ rot_mat_inv).numpy()
def scale_xyz(self, results, scale_ratio):
scale_mat = torch.tensor([
[scale_ratio, 0, 0, 0],
[0, scale_ratio, 0, 0],
[0, 0, scale_ratio, 0],
[0, 0, 0, 1],
])
scale_mat_inv = torch.inverse(scale_mat)
for view in range(len(results['lidar2img'])):
results['lidar2img'][view] = (torch.tensor(results['lidar2img'][view]).float() @ scale_mat_inv).numpy()
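`img_transform` above folds resize/crop/flip/rotate into one 4x4 "ida" matrix that left-multiplies `lidar2img`. A sketch of just the resize+crop part under the same conventions (the helper name and values are illustrative, and flip/rotation are omitted):

```python
import numpy as np

def ida_resize_crop(resize, crop):
    """4x4 image-augmentation matrix for a uniform resize followed by a crop."""
    ida = np.eye(4)
    ida[:2, :2] *= resize                            # scale pixel coordinates
    ida[:2, 2] = -np.asarray(crop[:2], dtype=float)  # the crop shifts the origin
    return ida

# a pixel at (100, 200) after a 0.5x resize and a crop starting at (10, 20):
p = ida_resize_crop(0.5, (10, 20, 362, 148)) @ np.array([100.0, 200.0, 1.0, 1.0])
# p[:2] is [40., 80.]
```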
================================================
FILE: loaders/ray_metrics.py
================================================
# Acknowledgments: https://github.com/tarashakhurana/4d-occ-forecasting
# Modified by Haisong Liu
import math
import copy
import numpy as np
import torch
from torch.utils.cpp_extension import load
from tqdm import tqdm
from prettytable import PrettyTable
from .ray_pq import Metric_RayPQ
dvr = load("dvr", sources=["lib/dvr/dvr.cpp", "lib/dvr/dvr.cu"], verbose=True, extra_cuda_cflags=['-allow-unsupported-compiler'])
_pc_range = [-40, -40, -1.0, 40, 40, 5.4]
_voxel_size = 0.4
# https://github.com/tarashakhurana/4d-occ-forecasting/blob/ff986082cd6ea10e67ab7839bf0e654736b3f4e2/test_fgbg.py#L29C1-L46C16
def get_rendered_pcds(origin, points, tindex, pred_dist):
pcds = []
for t in range(len(origin)):
mask = (tindex == t)
# skip the ones with no data
if not mask.any():
continue
_pts = points[mask, :3]
# use ground truth lidar points for the raycasting direction
v = _pts - origin[t][None, :]
d = v / np.sqrt((v ** 2).sum(axis=1, keepdims=True))
pred_pts = origin[t][None, :] + d * pred_dist[mask][:, None]
pcds.append(torch.from_numpy(pred_pts))
return pcds
def meshgrid3d(occ_size, pc_range):
W, H, D = occ_size
xs = torch.linspace(0.5, W - 0.5, W).view(W, 1, 1).expand(W, H, D) / W
ys = torch.linspace(0.5, H - 0.5, H).view(1, H, 1).expand(W, H, D) / H
zs = torch.linspace(0.5, D - 0.5, D).view(1, 1, D).expand(W, H, D) / D
xs = xs * (pc_range[3] - pc_range[0]) + pc_range[0]
ys = ys * (pc_range[4] - pc_range[1]) + pc_range[1]
zs = zs * (pc_range[5] - pc_range[2]) + pc_range[2]
xyz = torch.stack((xs, ys, zs), -1)
return xyz
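`meshgrid3d` returns the metric center of every voxel: normalized grid coordinates scaled into `pc_range`. Along one axis this reduces to the line below; with the default 200-cell, [-40, 40] m x-axis, the first and last centers sit half a voxel (0.2 m) inside the range:

```python
import numpy as np

W, pc_min, pc_max = 200, -40.0, 40.0
xs = np.linspace(0.5, W - 0.5, W) / W * (pc_max - pc_min) + pc_min
# xs[0] == -39.8 and xs[-1] == 39.8
```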
def generate_lidar_rays():
# prepare lidar ray angles
pitch_angles = []
for k in range(10):
angle = math.pi / 2 - math.atan(k + 1)
pitch_angles.append(-angle)
# nuscenes lidar fov: [0.2107773983152201, -0.5439104895672159] (rad)
while pitch_angles[-1] < 0.21:
delta = pitch_angles[-1] - pitch_angles[-2]
pitch_angles.append(pitch_angles[-1] + delta)
lidar_rays = []
for pitch_angle in pitch_angles:
for azimuth_angle in np.arange(0, 360, 1):
azimuth_angle = np.deg2rad(azimuth_angle)
x = np.cos(pitch_angle) * np.cos(azimuth_angle)
y = np.cos(pitch_angle) * np.sin(azimuth_angle)
z = np.sin(pitch_angle)
lidar_rays.append((x, y, z))
return np.array(lidar_rays, dtype=np.float32)
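Each generated ray is a unit direction vector, since the pitch/azimuth expressions above are plain spherical-to-Cartesian conversion. A quick standalone check of that property for an arbitrary angle pair:

```python
import math
import numpy as np

pitch, azimuth = -0.3, math.radians(45)   # arbitrary sample angles
ray = np.array([
    math.cos(pitch) * math.cos(azimuth),
    math.cos(pitch) * math.sin(azimuth),
    math.sin(pitch),
])
# |ray| == 1 up to floating-point error
```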
def process_one_sample(sem_pred, lidar_rays, output_origin, instance_pred=None, occ_class_names=None):
# lidar origin in ego coordinate
# lidar_origin = torch.tensor([[[0.9858, 0.0000, 1.8402]]])
T = output_origin.shape[1]
pred_pcds_t = []
free_id = len(occ_class_names) - 1
occ_pred = copy.deepcopy(sem_pred)
occ_pred[sem_pred < free_id] = 1
occ_pred[sem_pred == free_id] = 0
occ_pred = occ_pred.permute(2, 1, 0)
occ_pred = occ_pred[None, None, :].contiguous().float()
offset = torch.Tensor(_pc_range[:3])[None, None, :]
scaler = torch.Tensor([_voxel_size] * 3)[None, None, :]
lidar_tindex = torch.zeros([1, lidar_rays.shape[0]])
for t in range(T):
lidar_origin = output_origin[:, t:t+1, :] # [1, 1, 3]
lidar_endpts = lidar_rays[None] + lidar_origin # [1, 15840, 3]
output_origin_render = ((lidar_origin - offset) / scaler).float() # [1, 1, 3]
output_points_render = ((lidar_endpts - offset) / scaler).float() # [1, N, 3]
output_tindex_render = lidar_tindex # [1, N], all zeros
with torch.no_grad():
pred_dist, _, coord_index = dvr.render_forward(
occ_pred.cuda(),
output_origin_render.cuda(),
output_points_render.cuda(),
output_tindex_render.cuda(),
[1, 16, 200, 200],
"test"
)
pred_dist *= _voxel_size
pred_pcds = get_rendered_pcds(
lidar_origin[0].cpu().numpy(),
lidar_endpts[0].cpu().numpy(),
lidar_tindex[0].cpu().numpy(),
pred_dist[0].cpu().numpy()
)
coord_index = coord_index[0, :, :].int().cpu() # [N, 3]
pred_label = sem_pred[coord_index[:, 0], coord_index[:, 1], coord_index[:, 2]][:, None] # [N, 1]
pred_dist = pred_dist[0, :, None].cpu()
if instance_pred is not None:
pred_instance = instance_pred[coord_index[:, 0], coord_index[:, 1], coord_index[:, 2]][:, None] # [N, 1]
pred_pcds = torch.cat([pred_label.float(), pred_instance.float(), pred_dist], dim=-1)
else:
pred_pcds = torch.cat([pred_label.float(), pred_dist], dim=-1)
pred_pcds_t.append(pred_pcds)
pred_pcds_t = torch.cat(pred_pcds_t, dim=0)
return pred_pcds_t.numpy()
def calc_rayiou(pcd_pred_list, pcd_gt_list, occ_class_names):
thresholds = [1, 2, 4]
gt_cnt = np.zeros([len(occ_class_names)])
pred_cnt = np.zeros([len(occ_class_names)])
tp_cnt = np.zeros([len(thresholds), len(occ_class_names)])
for pcd_pred, pcd_gt in zip(pcd_pred_list, pcd_gt_list):
for j, threshold in enumerate(thresholds):
# L1
depth_pred = pcd_pred[:, 1]
depth_gt = pcd_gt[:, 1]
l1_error = np.abs(depth_pred - depth_gt)
tp_dist_mask = (l1_error < threshold)
            for i, cls in enumerate(occ_class_names):
                cls_id = i  # a class id is just its index in occ_class_names
cls_mask_pred = (pcd_pred[:, 0] == cls_id)
cls_mask_gt = (pcd_gt[:, 0] == cls_id)
gt_cnt_i = cls_mask_gt.sum()
pred_cnt_i = cls_mask_pred.sum()
if j == 0:
gt_cnt[i] += gt_cnt_i
pred_cnt[i] += pred_cnt_i
tp_cls = cls_mask_gt & cls_mask_pred # [N]
tp_mask = np.logical_and(tp_cls, tp_dist_mask)
tp_cnt[j][i] += tp_mask.sum()
iou_list = []
for j, threshold in enumerate(thresholds):
iou_list.append((tp_cnt[j] / (gt_cnt + pred_cnt - tp_cnt[j]))[:-1])
return iou_list
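The bookkeeping in calc_rayiou can be checked on toy numbers: for each class, a ray counts as a true positive only when the semantic label matches and the depth L1 error is under the threshold, and IoU is TP / (GT + pred - TP). A minimal torch-free sketch with made-up values (illustrative only, not from the dataset):

```python
# toy check of the per-class RayIoU bookkeeping: 4 rays, class 0 of interest
gt_cls   = [0, 0, 0, 1]      # class id per gt ray (1 = free)
pred_cls = [0, 0, 1, 1]
l1_err   = [0.5, 3.0, 0.1, 0.2]
threshold = 1.0

gt_cnt   = sum(c == 0 for c in gt_cls)                   # rays labeled 0 in gt
pred_cnt = sum(c == 0 for c in pred_cls)                 # rays labeled 0 in pred
tp = sum(g == 0 and p == 0 and e < threshold
         for g, p, e in zip(gt_cls, pred_cls, l1_err))   # label match AND close depth
iou = tp / (gt_cnt + pred_cnt - tp)
```

Only the first ray qualifies (matching label, error 0.5 < 1.0), giving IoU 1 / (3 + 2 - 1) = 0.25.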
def main_rayiou(sem_pred_list, sem_gt_list, lidar_origin_list, occ_class_names):
torch.cuda.empty_cache()
# generate lidar rays
lidar_rays = generate_lidar_rays()
lidar_rays = torch.from_numpy(lidar_rays)
pcd_pred_list, pcd_gt_list = [], []
for sem_pred, sem_gt, lidar_origins in tqdm(zip(sem_pred_list, sem_gt_list, lidar_origin_list), ncols=50):
sem_pred = torch.from_numpy(np.reshape(sem_pred, [200, 200, 16]))
sem_gt = torch.from_numpy(np.reshape(sem_gt, [200, 200, 16]))
pcd_pred = process_one_sample(sem_pred, lidar_rays, lidar_origins, occ_class_names=occ_class_names)
pcd_gt = process_one_sample(sem_gt, lidar_rays, lidar_origins, occ_class_names=occ_class_names)
# evaluate on non-free rays
valid_mask = (pcd_gt[:, 0].astype(np.int32) != len(occ_class_names) - 1)
pcd_pred = pcd_pred[valid_mask]
pcd_gt = pcd_gt[valid_mask]
assert pcd_pred.shape == pcd_gt.shape
pcd_pred_list.append(pcd_pred)
pcd_gt_list.append(pcd_gt)
iou_list = calc_rayiou(pcd_pred_list, pcd_gt_list, occ_class_names)
rayiou = np.nanmean(iou_list)
rayiou_0 = np.nanmean(iou_list[0])
rayiou_1 = np.nanmean(iou_list[1])
rayiou_2 = np.nanmean(iou_list[2])
table = PrettyTable([
'Class Names',
'RayIoU@1', 'RayIoU@2', 'RayIoU@4'
])
table.float_format = '.3'
for i in range(len(occ_class_names) - 1):
table.add_row([
occ_class_names[i],
iou_list[0][i], iou_list[1][i], iou_list[2][i]
], divider=(i == len(occ_class_names) - 2))
table.add_row(['MEAN', rayiou_0, rayiou_1, rayiou_2])
print(table)
torch.cuda.empty_cache()
return {
'RayIoU': rayiou,
'RayIoU@1': rayiou_0,
'RayIoU@2': rayiou_1,
'RayIoU@4': rayiou_2,
}
def main_raypq(sem_pred_list, sem_gt_list, inst_pred_list, inst_gt_list, lidar_origin_list, occ_class_names):
torch.cuda.empty_cache()
eval_metrics_pq = Metric_RayPQ(
occ_class_names=occ_class_names,
num_classes=len(occ_class_names),
thresholds=[1, 2, 4]
)
# generate lidar rays
lidar_rays = generate_lidar_rays()
lidar_rays = torch.from_numpy(lidar_rays)
for sem_pred, sem_gt, inst_pred, inst_gt, lidar_origins in \
tqdm(zip(sem_pred_list, sem_gt_list, inst_pred_list, inst_gt_list, lidar_origin_list), ncols=50):
sem_pred = torch.from_numpy(np.reshape(sem_pred, [200, 200, 16]))
sem_gt = torch.from_numpy(np.reshape(sem_gt, [200, 200, 16]))
inst_pred = torch.from_numpy(np.reshape(inst_pred, [200, 200, 16]))
inst_gt = torch.from_numpy(np.reshape(inst_gt, [200, 200, 16]))
pcd_pred = process_one_sample(sem_pred, lidar_rays, lidar_origins, instance_pred=inst_pred, occ_class_names=occ_class_names)
pcd_gt = process_one_sample(sem_gt, lidar_rays, lidar_origins, instance_pred=inst_gt, occ_class_names=occ_class_names)
# evaluate on non-free rays
valid_mask = (pcd_gt[:, 0].astype(np.int32) != len(occ_class_names) - 1)
pcd_pred = pcd_pred[valid_mask]
pcd_gt = pcd_gt[valid_mask]
assert pcd_pred.shape == pcd_gt.shape
sem_gt = pcd_gt[:, 0].astype(np.int32)
sem_pred = pcd_pred[:, 0].astype(np.int32)
instances_gt = pcd_gt[:, 1].astype(np.int32)
instances_pred = pcd_pred[:, 1].astype(np.int32)
# L1
depth_gt = pcd_gt[:, 2]
depth_pred = pcd_pred[:, 2]
l1_error = np.abs(depth_pred - depth_gt)
eval_metrics_pq.add_batch(sem_pred, sem_gt, instances_pred, instances_gt, l1_error)
torch.cuda.empty_cache()
return eval_metrics_pq.count_pq()
================================================
FILE: loaders/ray_pq.py
================================================
import numpy as np
from prettytable import PrettyTable
class Metric_RayPQ:
def __init__(self,
occ_class_names,
num_classes=18,
thresholds=[1, 2, 4]):
"""
Args:
occ_class_names (list[str]): Names of the occupancy classes.
num_classes (int): Number of classes, including the free class.
thresholds (list[int]): Depth error thresholds (in meters) for TP counting.
"""
if num_classes == 18 or num_classes == 17:
self.class_names = occ_class_names
else:
raise ValueError
self.num_classes = num_classes
self.id_offset = 2 ** 16
self.eps = 1e-5
self.thresholds = thresholds
self.min_num_points = 10
self.include = np.array(
[n for n in range(self.num_classes - 1)],
dtype=int)
self.cnt = 0
# panoptic stuff
self.pan_tp = np.zeros([len(self.thresholds), num_classes], dtype=int)
self.pan_iou = np.zeros([len(self.thresholds), num_classes], dtype=np.double)
self.pan_fp = np.zeros([len(self.thresholds), num_classes], dtype=int)
self.pan_fn = np.zeros([len(self.thresholds), num_classes], dtype=int)
def add_batch(self,semantics_pred,semantics_gt,instances_pred,instances_gt, l1_error):
self.cnt += 1
self.add_panoptic_sample(semantics_pred, semantics_gt, instances_pred, instances_gt, l1_error)
def add_panoptic_sample(self, semantics_pred, semantics_gt, instances_pred, instances_gt, l1_error):
"""Add one sample of panoptic predictions and ground truths for
evaluation.
Args:
semantics_pred (np.ndarray): Semantic predictions.
semantics_gt (np.ndarray): Semantic ground truths.
instances_pred (np.ndarray): Instance predictions.
instances_gt (np.ndarray): Instance ground truths.
l1_error (np.ndarray): Per-ray depth L1 errors, used to gate true positives by threshold.
"""
# get instance_class_id from instance_gt
instance_class_ids = [self.num_classes - 1]
for i in range(1, instances_gt.max() + 1):
class_id = np.unique(semantics_gt[instances_gt == i])
# assert class_id.shape[0] == 1, "each instance must belong to only one class"
if class_id.shape[0] == 1:
instance_class_ids.append(class_id[0])
else:
instance_class_ids.append(self.num_classes - 1)
instance_class_ids = np.array(instance_class_ids)
instance_count = 1
final_instance_class_ids = []
final_instances = np.zeros_like(instances_gt) # empty space has instance id "0"
for class_id in range(self.num_classes - 1):
if np.sum(semantics_gt == class_id) == 0:
continue
if self.class_names[class_id] in ['car', 'truck', 'construction_vehicle', 'bus', 'trailer', 'motorcycle', 'bicycle', 'pedestrian']:
# treat as instances
for instance_id in range(len(instance_class_ids)):
if instance_class_ids[instance_id] != class_id:
continue
final_instances[instances_gt == instance_id] = instance_count
instance_count += 1
final_instance_class_ids.append(class_id)
else:
# treat as semantics
final_instances[semantics_gt == class_id] = instance_count
instance_count += 1
final_instance_class_ids.append(class_id)
instances_gt = final_instances
# avoid zero (ignored label)
instances_pred = instances_pred + 1
instances_gt = instances_gt + 1
for j, threshold in enumerate(self.thresholds):
tp_dist_mask = l1_error < threshold
# for each class (except the ignored ones)
for cl in self.include:
# get a class mask
pred_inst_in_cl_mask = semantics_pred == cl
gt_inst_in_cl_mask = semantics_gt == cl
# get instance points in class (makes outside stuff 0)
pred_inst_in_cl = instances_pred * pred_inst_in_cl_mask.astype(int)
gt_inst_in_cl = instances_gt * gt_inst_in_cl_mask.astype(int)
# generate the areas for each unique instance prediction
unique_pred, counts_pred = np.unique(
pred_inst_in_cl[pred_inst_in_cl > 0], return_counts=True)
id2idx_pred = {id: idx for idx, id in enumerate(unique_pred)}
matched_pred = np.array([False] * unique_pred.shape[0])
# generate the areas for each unique instance gt_np
unique_gt, counts_gt = np.unique(
gt_inst_in_cl[gt_inst_in_cl > 0], return_counts=True)
id2idx_gt = {id: idx for idx, id in enumerate(unique_gt)}
matched_gt = np.array([False] * unique_gt.shape[0])
# generate intersection using offset
valid_combos = np.logical_and(pred_inst_in_cl > 0,
gt_inst_in_cl > 0)
# add dist_mask
valid_combos = np.logical_and(valid_combos, tp_dist_mask)
id_offset_combo = pred_inst_in_cl[
valid_combos] + self.id_offset * gt_inst_in_cl[valid_combos]
unique_combo, counts_combo = np.unique(
id_offset_combo, return_counts=True)
# generate an intersection map
# count the intersections with over 0.5 IoU as TP
gt_labels = unique_combo // self.id_offset
pred_labels = unique_combo % self.id_offset
gt_areas = np.array([counts_gt[id2idx_gt[id]] for id in gt_labels])
pred_areas = np.array(
[counts_pred[id2idx_pred[id]] for id in pred_labels])
intersections = counts_combo
unions = gt_areas + pred_areas - intersections
ious = intersections.astype(float) / unions.astype(float)
tp_indexes = ious > 0.5
self.pan_tp[j][cl] += np.sum(tp_indexes)
self.pan_iou[j][cl] += np.sum(ious[tp_indexes])
matched_gt[[id2idx_gt[id] for id in gt_labels[tp_indexes]]] = True
matched_pred[[id2idx_pred[id]
for id in pred_labels[tp_indexes]]] = True
# count the FN
if len(counts_gt) > 0:
self.pan_fn[j][cl] += np.sum(
np.logical_and(counts_gt >= self.min_num_points,
~matched_gt))
# count the FP
if len(matched_pred) > 0:
self.pan_fp[j][cl] += np.sum(
np.logical_and(counts_pred >= self.min_num_points,
~matched_pred))
def count_pq(self):
sq_all = self.pan_iou.astype(np.double) / np.maximum(
self.pan_tp.astype(np.double), self.eps)
rq_all = self.pan_tp.astype(np.double) / np.maximum(
self.pan_tp.astype(np.double) + 0.5 * self.pan_fp.astype(np.double)
+ 0.5 * self.pan_fn.astype(np.double), self.eps)
pq_all = sq_all * rq_all
# mask classes not occurring in dataset
mask = (self.pan_tp + self.pan_fp + self.pan_fn) > 0
pq_all[~mask] = float('nan')
table = PrettyTable([
'Class Names',
'RayPQ@%d' % self.thresholds[0],
'RayPQ@%d' % self.thresholds[1],
'RayPQ@%d' % self.thresholds[2]
])
table.float_format = '.3'
for i in range(len(self.class_names) - 1):
table.add_row([
self.class_names[i],
pq_all[0][i], pq_all[1][i], pq_all[2][i],
], divider=(i == len(self.class_names) - 2))
table.add_row([
'MEAN',
np.nanmean(pq_all[0]), np.nanmean(pq_all[1]), np.nanmean(pq_all[2])
])
print(table)
return {
'RayPQ': np.nanmean(pq_all),
'RayPQ@1': np.nanmean(pq_all[0]),
'RayPQ@2': np.nanmean(pq_all[1]),
'RayPQ@4': np.nanmean(pq_all[2]),
}
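Two pieces of the metric above are easy to sanity-check in isolation: the panoptic-quality formula PQ = SQ × RQ used in count_pq, and the id_offset trick that packs (pred_id, gt_id) pairs into single integers so np.unique can count per-pair intersections. A small sketch with made-up counts:

```python
# toy check of the PQ formula: PQ = SQ * RQ
tp, fp, fn = 2, 1, 1
sum_iou = 1.5                           # summed IoU over the 2 matched segments
sq = sum_iou / tp                       # segmentation quality
rq = tp / (tp + 0.5 * fp + 0.5 * fn)    # recognition quality
pq = sq * rq

# the id_offset trick: pack (pred_id, gt_id) pairs into one integer,
# then recover them with % and // as add_panoptic_sample does
OFFSET = 2 ** 16
pred_ids = [3, 3, 7]
gt_ids = [5, 5, 9]
combos = [p + OFFSET * g for p, g in zip(pred_ids, gt_ids)]
pairs = {(c % OFFSET, c // OFFSET) for c in combos}   # unique (pred, gt) overlaps
```

Here SQ = 0.75, RQ = 2/3, so PQ = 0.5, and the two distinct (pred, gt) overlaps are recovered as (3, 5) and (7, 9).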
================================================
FILE: models/__init__.py
================================================
from .backbones import __all__
from .bbox import __all__
from .sparseocc import SparseOcc
from .sparseocc_head import SparseOccHead
from .sparseocc_transformer import SparseOccTransformer
from .loss_utils import *
__all__ = []
================================================
FILE: models/backbones/__init__.py
================================================
from .vovnet import VoVNet
__all__ = ['VoVNet']
================================================
FILE: models/backbones/vovnet.py
================================================
import torch
import torch.nn as nn
import torch.nn.functional as F
import warnings
import torch.utils.checkpoint as cp
from collections import OrderedDict
from mmcv.runner import BaseModule
from mmdet.models.builder import BACKBONES
from torch.nn.modules.batchnorm import _BatchNorm
VoVNet19_slim_dw_eSE = {
'stem': [64, 64, 64],
'stage_conv_ch': [64, 80, 96, 112],
'stage_out_ch': [112, 256, 384, 512],
"layer_per_block": 3,
"block_per_stage": [1, 1, 1, 1],
"eSE": True,
"dw": True
}
VoVNet19_dw_eSE = {
'stem': [64, 64, 64],
"stage_conv_ch": [128, 160, 192, 224],
"stage_out_ch": [256, 512, 768, 1024],
"layer_per_block": 3,
"block_per_stage": [1, 1, 1, 1],
"eSE": True,
"dw": True
}
VoVNet19_slim_eSE = {
'stem': [64, 64, 128],
'stage_conv_ch': [64, 80, 96, 112],
'stage_out_ch': [112, 256, 384, 512],
'layer_per_block': 3,
'block_per_stage': [1, 1, 1, 1],
'eSE': True,
"dw": False
}
VoVNet19_eSE = {
'stem': [64, 64, 128],
"stage_conv_ch": [128, 160, 192, 224],
"stage_out_ch": [256, 512, 768, 1024],
"layer_per_block": 3,
"block_per_stage": [1, 1, 1, 1],
"eSE": True,
"dw": False
}
VoVNet39_eSE = {
'stem': [64, 64, 128],
"stage_conv_ch": [128, 160, 192, 224],
"stage_out_ch": [256, 512, 768, 1024],
"layer_per_block": 5,
"block_per_stage": [1, 1, 2, 2],
"eSE": True,
"dw": False
}
VoVNet57_eSE = {
'stem': [64, 64, 128],
"stage_conv_ch": [128, 160, 192, 224],
"stage_out_ch": [256, 512, 768, 1024],
"layer_per_block": 5,
"block_per_stage": [1, 1, 4, 3],
"eSE": True,
"dw": False
}
VoVNet99_eSE = {
'stem': [64, 64, 128],
"stage_conv_ch": [128, 160, 192, 224],
"stage_out_ch": [256, 512, 768, 1024],
"layer_per_block": 5,
"block_per_stage": [1, 3, 9, 3],
"eSE": True,
"dw": False
}
_STAGE_SPECS = {
"V-19-slim-dw-eSE": VoVNet19_slim_dw_eSE,
"V-19-dw-eSE": VoVNet19_dw_eSE,
"V-19-slim-eSE": VoVNet19_slim_eSE,
"V-19-eSE": VoVNet19_eSE,
"V-39-eSE": VoVNet39_eSE,
"V-57-eSE": VoVNet57_eSE,
"V-99-eSE": VoVNet99_eSE,
}
def dw_conv3x3(in_channels, out_channels, module_name, postfix, stride=1, kernel_size=3, padding=1):
"""3x3 convolution with padding"""
return [
(
'{}_{}/dw_conv3x3'.format(module_name, postfix),
nn.Conv2d(
in_channels,
out_channels,
kernel_size=kernel_size,
stride=stride,
padding=padding,
groups=out_channels,
bias=False
)
),
(
'{}_{}/pw_conv1x1'.format(module_name, postfix),
nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=1, padding=0, groups=1, bias=False)
),
('{}_{}/pw_norm'.format(module_name, postfix), nn.BatchNorm2d(out_channels)),
('{}_{}/pw_relu'.format(module_name, postfix), nn.ReLU(inplace=True)),
]
def conv3x3(in_channels, out_channels, module_name, postfix, stride=1, groups=1, kernel_size=3, padding=1):
"""3x3 convolution with padding"""
return [
(
f"{module_name}_{postfix}/conv",
nn.Conv2d(
in_channels,
out_channels,
kernel_size=kernel_size,
stride=stride,
padding=padding,
groups=groups,
bias=False,
),
),
(f"{module_name}_{postfix}/norm", nn.BatchNorm2d(out_channels)),
(f"{module_name}_{postfix}/relu", nn.ReLU(inplace=True)),
]
def conv1x1(in_channels, out_channels, module_name, postfix, stride=1, groups=1, kernel_size=1, padding=0):
"""1x1 convolution with padding"""
return [
(
f"{module_name}_{postfix}/conv",
nn.Conv2d(
in_channels,
out_channels,
kernel_size=kernel_size,
stride=stride,
padding=padding,
groups=groups,
bias=False,
),
),
(f"{module_name}_{postfix}/norm", nn.BatchNorm2d(out_channels)),
(f"{module_name}_{postfix}/relu", nn.ReLU(inplace=True)),
]
class Hsigmoid(nn.Module):
def __init__(self, inplace=True):
super(Hsigmoid, self).__init__()
self.inplace = inplace
def forward(self, x):
return F.relu6(x + 3.0, inplace=self.inplace) / 6.0
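Hsigmoid implements the hard sigmoid relu6(x + 3) / 6 used by the eSE module below. A torch-free sketch of the same curve on scalars (illustration only):

```python
def hard_sigmoid(x):
    # relu6(x + 3) / 6: clamp x + 3 to [0, 6], then rescale to [0, 1]
    return min(max(x + 3.0, 0.0), 6.0) / 6.0

vals = [hard_sigmoid(x) for x in (-4.0, 0.0, 4.0)]
# saturates at 0 and 1, linear in between
```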
class eSEModule(nn.Module):
def __init__(self, channel, reduction=4):
super(eSEModule, self).__init__()
self.avg_pool = nn.AdaptiveAvgPool2d(1)
self.fc = nn.Conv2d(channel, channel, kernel_size=1, padding=0)
self.hsigmoid = Hsigmoid()
def forward(self, x):
inputs = x
x = self.avg_pool(x)
x = self.fc(x)
x = self.hsigmoid(x)
return inputs * x
class _OSA_module(nn.Module):
def __init__(self, in_ch, stage_ch, concat_ch, layer_per_block, module_name, SE=False, identity=False, depthwise=False, with_cp=False):
super(_OSA_module, self).__init__()
self.with_cp = with_cp
self.identity = identity
self.depthwise = depthwise
self.isReduced = False
self.layers = nn.ModuleList()
in_channel = in_ch
if self.depthwise and in_channel != stage_ch:
self.isReduced = True
self.conv_reduction = nn.Sequential(
OrderedDict(conv1x1(in_channel, stage_ch, "{}_reduction".format(module_name), "0"))
)
for i in range(layer_per_block):
if self.depthwise:
self.layers.append(nn.Sequential(OrderedDict(dw_conv3x3(stage_ch, stage_ch, module_name, i))))
else:
self.layers.append(nn.Sequential(OrderedDict(conv3x3(in_channel, stage_ch, module_name, i))))
in_channel = stage_ch
# feature aggregation
in_channel = in_ch + layer_per_block * stage_ch
self.concat = nn.Sequential(OrderedDict(conv1x1(in_channel, concat_ch, module_name, "concat")))
self.ese = eSEModule(concat_ch)
def _forward(self, x):
identity_feat = x
output = []
output.append(x)
if self.depthwise and self.isReduced:
x = self.conv_reduction(x)
for layer in self.layers:
x = layer(x)
output.append(x)
x = torch.cat(output, dim=1)
xt = self.concat(x)
xt = self.ese(xt)
if self.identity:
xt = xt + identity_feat
return xt
def forward(self, x):
if self.with_cp and self.training and x.requires_grad:
return cp.checkpoint(self._forward, x)
else:
return self._forward(x)
class _OSA_stage(nn.Sequential):
def __init__(self, in_ch, stage_ch, concat_ch, block_per_stage, layer_per_block, stage_num, SE=False, depthwise=False, with_cp=False):
super(_OSA_stage, self).__init__()
if stage_num != 2:  # downsample before every stage except stage 2
self.add_module("Pooling", nn.MaxPool2d(kernel_size=3, stride=2, ceil_mode=True))
if block_per_stage != 1:
SE = False
module_name = f"OSA{stage_num}_1"
self.add_module(
module_name, _OSA_module(in_ch, stage_ch, concat_ch, layer_per_block, module_name, SE, depthwise=depthwise, with_cp=with_cp)
)
for i in range(block_per_stage - 1):
if i != block_per_stage - 2: # keep eSE only for the last block of the stage
SE = False
module_name = f"OSA{stage_num}_{i + 2}"
self.add_module(
module_name,
_OSA_module(
concat_ch,
stage_ch,
concat_ch,
layer_per_block,
module_name,
SE,
identity=True,
depthwise=depthwise,
with_cp=with_cp
),
)
@BACKBONES.register_module()
class VoVNet(BaseModule):
def __init__(self, spec_name, input_ch=3, out_features=None, frozen_stages=-1, norm_eval=True, with_cp=False, pretrained=None, init_cfg=None):
"""
Args:
input_ch (int): Number of input channels.
out_features (list[str]): name of the layers whose outputs should
be returned in forward. Can be anything in "stem", "stage2" ...
"""
super(VoVNet, self).__init__(init_cfg)
self.frozen_stages = frozen_stages
self.norm_eval = norm_eval
if isinstance(pretrained, str):
warnings.warn('DeprecationWarning: pretrained is deprecated, '
'please use "init_cfg" instead')
self.init_cfg = dict(type='Pretrained', checkpoint=pretrained)
stage_specs = _STAGE_SPECS[spec_name]
stem_ch = stage_specs["stem"]
config_stage_ch = stage_specs["stage_conv_ch"]
config_concat_ch = stage_specs["stage_out_ch"]
block_per_stage = stage_specs["block_per_stage"]
layer_per_block = stage_specs["layer_per_block"]
SE = stage_specs["eSE"]
depthwise = stage_specs["dw"]
self._out_features = out_features
# Stem module
conv_type = dw_conv3x3 if depthwise else conv3x3
stem = conv3x3(input_ch, stem_ch[0], "stem", "1", 2)
stem += conv_type(stem_ch[0], stem_ch[1], "stem", "2", 1)
stem += conv_type(stem_ch[1], stem_ch[2], "stem", "3", 2)
self.add_module("stem", nn.Sequential((OrderedDict(stem))))
current_stirde = 4
self._out_feature_strides = {"stem": current_stirde, "stage2": current_stirde}
self._out_feature_channels = {"stem": stem_ch[2]}
stem_out_ch = [stem_ch[2]]
in_ch_list = stem_out_ch + config_concat_ch[:-1]
# OSA stages
self.stage_names = []
for i in range(4): # num_stages
name = "stage%d" % (i + 2) # stage 2 ... stage 5
self.stage_names.append(name)
self.add_module(
name,
_OSA_stage(
in_ch_list[i],
config_stage_ch[i],
config_concat_ch[i],
block_per_stage[i],
layer_per_block,
i + 2,
SE,
depthwise,
with_cp=with_cp
),
)
self._out_feature_channels[name] = config_concat_ch[i]
if i != 0:
self._out_feature_strides[name] = current_stirde = int(current_stirde * 2)
# initialize weights
# self._initialize_weights()
def _initialize_weights(self):
for m in self.modules():
if isinstance(m, nn.Conv2d):
nn.init.kaiming_normal_(m.weight)
def forward(self, x):
outputs = {}
x = self.stem(x)
if "stem" in self._out_features:
outputs["stem"] = x
for name in self.stage_names:
x = getattr(self, name)(x)
if name in self._out_features:
outputs[name] = x
return outputs
def _freeze_stages(self):
if self.frozen_stages >= 0:
m = getattr(self, 'stem')
m.eval()
for param in m.parameters():
param.requires_grad = False
for i in range(1, self.frozen_stages + 1):
m = getattr(self, f'stage{i+1}')
m.eval()
for param in m.parameters():
param.requires_grad = False
def train(self, mode=True):
"""Convert the model into training mode while keep normalization layer
freezed."""
super(VoVNet, self).train(mode)
self._freeze_stages()
if mode and self.norm_eval:
for m in self.modules():
# trick: eval() only has an effect on BatchNorm layers here
if isinstance(m, _BatchNorm):
m.eval()
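In _OSA_module, the concat 1x1 conv sees the block input plus every intermediate 3x3 output, so its input width is in_ch + layer_per_block * stage_ch. Checking that arithmetic against the V-99-eSE spec above (stem output 128, values copied from _STAGE_SPECS):

```python
spec = {  # V-99-eSE, as defined in _STAGE_SPECS
    "stage_conv_ch": [128, 160, 192, 224],
    "stage_out_ch": [256, 512, 768, 1024],
    "layer_per_block": 5,
}
stem_out = 128
# input channels to each stage's first OSA block
in_ch = [stem_out] + spec["stage_out_ch"][:-1]
# channels entering each stage's concat 1x1 conv
concat_in = [i + spec["layer_per_block"] * s
             for i, s in zip(in_ch, spec["stage_conv_ch"])]
# each is then projected down to the corresponding stage_out_ch
```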
================================================
FILE: models/bbox/__init__.py
================================================
from .assigners import __all__
from .coders import __all__
from .match_costs import __all__
================================================
FILE: models/bbox/assigners/__init__.py
================================================
from .hungarian_assigner_3d import HungarianAssigner3D
__all__ = ['HungarianAssigner3D']
================================================
FILE: models/bbox/assigners/hungarian_assigner_3d.py
================================================
import torch
from mmdet.core.bbox.builder import BBOX_ASSIGNERS
from mmdet.core.bbox.assigners import AssignResult
from mmdet.core.bbox.assigners import BaseAssigner
from mmdet.core.bbox.match_costs import build_match_cost
from ..utils import normalize_bbox
try:
from scipy.optimize import linear_sum_assignment
except ImportError:
linear_sum_assignment = None
@BBOX_ASSIGNERS.register_module()
class HungarianAssigner3D(BaseAssigner):
def __init__(self,
cls_cost=dict(type='ClassificationCost', weight=1.),
reg_cost=dict(type='BBoxL1Cost', weight=1.0),
iou_cost=dict(type='IoUCost', weight=0.0),
pc_range=None):
self.cls_cost = build_match_cost(cls_cost)
self.reg_cost = build_match_cost(reg_cost)
self.iou_cost = build_match_cost(iou_cost)
self.pc_range = pc_range
def assign(self,
bbox_pred,
cls_pred,
gt_bboxes,
gt_labels,
gt_bboxes_ignore=None,
code_weights=None,
with_velo=False):
assert gt_bboxes_ignore is None, \
'Only case when gt_bboxes_ignore is None is supported.'
num_gts, num_bboxes = gt_bboxes.size(0), bbox_pred.size(0)
# 1. assign -1 by default
assigned_gt_inds = bbox_pred.new_full((num_bboxes, ),
-1,
dtype=torch.long)
assigned_labels = bbox_pred.new_full((num_bboxes, ),
-1,
dtype=torch.long)
if num_gts == 0 or num_bboxes == 0:
# No ground truth or boxes, return empty assignment
if num_gts == 0:
# No ground truth, assign all to background
assigned_gt_inds[:] = 0
return AssignResult(
num_gts, assigned_gt_inds, None, labels=assigned_labels)
# 2. compute the weighted costs
# classification and bboxcost.
cls_cost = self.cls_cost(cls_pred, gt_labels)
# regression L1 cost
normalized_gt_bboxes = normalize_bbox(gt_bboxes)
if code_weights is not None:
bbox_pred = bbox_pred * code_weights
normalized_gt_bboxes = normalized_gt_bboxes * code_weights
if with_velo:
reg_cost = self.reg_cost(bbox_pred, normalized_gt_bboxes)
else:
reg_cost = self.reg_cost(bbox_pred[:, :8], normalized_gt_bboxes[:, :8])
# weighted sum of above two costs
cost = cls_cost + reg_cost
# 3. do Hungarian matching on CPU using linear_sum_assignment
cost = cost.detach().cpu()
cost = torch.nan_to_num(cost, nan=100.0, posinf=100.0, neginf=-100.0)
if linear_sum_assignment is None:
raise ImportError('Please run "pip install scipy" '
'to install scipy first.')
matched_row_inds, matched_col_inds = linear_sum_assignment(cost)
matched_row_inds = torch.from_numpy(matched_row_inds).to(
bbox_pred.device)
matched_col_inds = torch.from_numpy(matched_col_inds).to(
bbox_pred.device)
# 4. assign backgrounds and foregrounds
# assign all indices to backgrounds first
assigned_gt_inds[:] = 0
# assign foregrounds based on matching results
assigned_gt_inds[matched_row_inds] = matched_col_inds + 1
assigned_labels[matched_row_inds] = gt_labels[matched_col_inds]
return AssignResult(
num_gts, assigned_gt_inds, None, labels=assigned_labels)
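linear_sum_assignment finds the prediction-to-ground-truth pairing that minimizes the summed cost. To illustrate what it computes, here is a brute-force version over permutations (fine for tiny matrices; the assigner itself relies on SciPy's efficient implementation):

```python
from itertools import permutations

def brute_force_assign(cost):
    """Minimize total cost over all one-to-one pred -> gt assignments."""
    n_pred, n_gt = len(cost), len(cost[0])
    best_total, best_rows = None, None
    # each permutation picks a distinct prediction row for every gt column
    for rows in permutations(range(n_pred), n_gt):
        total = sum(cost[r][col] for col, r in enumerate(rows))
        if best_total is None or total < best_total:
            best_total, best_rows = total, rows
    return best_total, best_rows

# 3 predictions x 2 ground truths
cost = [[4.0, 1.0],
        [2.0, 8.0],
        [3.0, 5.0]]
total, rows = brute_force_assign(cost)
# gt 0 -> pred 1 (cost 2.0), gt 1 -> pred 0 (cost 1.0)
```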
================================================
FILE: models/bbox/coders/__init__.py
================================================
from .nms_free_coder import NMSFreeCoder
__all__ = ['NMSFreeCoder']
================================================
FILE: models/bbox/coders/nms_free_coder.py
================================================
import torch
from mmdet.core.bbox import BaseBBoxCoder
from mmdet.core.bbox.builder import BBOX_CODERS
from ..utils import denormalize_bbox
@BBOX_CODERS.register_module()
class NMSFreeCoder(BaseBBoxCoder):
"""Bbox coder for NMS-free detector.
Args:
pc_range (list[float]): Range of point cloud.
post_center_range (list[float]): Limit of the center.
Default: None.
max_num (int): Max number to be kept. Default: 100.
score_threshold (float): Threshold to filter boxes based on score.
Default: None.
num_classes (int): Number of classes. Default: 10.
"""
def __init__(self,
pc_range,
voxel_size=None,
post_center_range=None,
max_num=100,
score_threshold=None,
num_classes=10):
self.pc_range = pc_range
self.voxel_size = voxel_size
self.post_center_range = post_center_range
self.max_num = max_num
self.score_threshold = score_threshold
self.num_classes = num_classes
def encode(self):
pass
def decode_single(self, cls_scores, bbox_preds):
"""Decode bboxes.
Args:
cls_scores (Tensor): Outputs from the classification head, \
shape [num_query, cls_out_channels]. Note \
cls_out_channels should include the background class.
bbox_preds (Tensor): Outputs from the regression \
head with normalized coordinate format (cx, cy, w, l, cz, h, rot_sine, rot_cosine, vx, vy). \
Shape [num_query, 9].
Returns:
list[dict]: Decoded boxes.
"""
max_num = self.max_num
cls_scores = cls_scores.sigmoid()
scores, indexs = cls_scores.view(-1).topk(max_num)
labels = indexs % self.num_classes
bbox_index = torch.div(indexs, self.num_classes, rounding_mode='trunc')
bbox_preds = bbox_preds[bbox_index]
final_box_preds = denormalize_bbox(bbox_preds)
final_scores = scores
final_preds = labels
# use score threshold
if self.score_threshold is not None:
thresh_mask = final_scores > self.score_threshold
if self.post_center_range is not None:
limit = torch.tensor(self.post_center_range, device=scores.device)
mask = (final_box_preds[..., :3] >= limit[:3]).all(1)
mask &= (final_box_preds[..., :3] <= limit[3:]).all(1)
if self.score_threshold is not None:
mask &= thresh_mask
boxes3d = final_box_preds[mask]
scores = final_scores[mask]
labels = final_preds[mask]
predictions_dict = {
'bboxes': boxes3d,
'scores': scores,
'labels': labels
}
else:
raise NotImplementedError(
'Need to reorganize output as a batch, only '
'support post_center_range is not None for now!'
)
return predictions_dict
def decode(self, preds_dicts):
"""Decode bboxes.
Args:
all_cls_scores (Tensor): Outputs from the classification head, \
shape [nb_dec, bs, num_query, cls_out_channels]. Note \
cls_out_channels should include the background class.
all_bbox_preds (Tensor): Sigmoid outputs from the regression \
head with normalized coordinate format (cx, cy, w, l, cz, h, rot_sine, rot_cosine, vx, vy). \
Shape [nb_dec, bs, num_query, 9].
Returns:
list[dict]: Decoded boxes.
"""
all_cls_scores = preds_dicts['all_cls_scores'][-1]
all_bbox_preds = preds_dicts['all_bbox_preds'][-1]
batch_size = all_cls_scores.size()[0]
predictions_list = []
for i in range(batch_size):
predictions_list.append(self.decode_single(all_cls_scores[i], all_bbox_preds[i]))
return predictions_list
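decode_single takes top-k over the flattened [num_query, num_classes] score matrix, then recovers the class label with % and the query index with integer division. A plain-Python sketch of that index arithmetic (toy scores):

```python
# scores flattened from [num_query=2, num_classes=3]
num_classes = 3
flat_scores = [0.1, 0.9, 0.2,   # query 0
               0.8, 0.3, 0.05]  # query 1
k = 2
# top-k flat indices by score, mirroring cls_scores.view(-1).topk(max_num)
top = sorted(range(len(flat_scores)), key=lambda i: -flat_scores[i])[:k]
labels = [i % num_classes for i in top]       # class of each hit
bbox_index = [i // num_classes for i in top]  # which query it came from
```

The two highest scores sit at flat indices 1 and 3, decoding to (query 0, class 1) and (query 1, class 0).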
================================================
FILE: models/bbox/match_costs/__init__.py
================================================
from .match_cost import BBox3DL1Cost
__all__ = ['BBox3DL1Cost']
================================================
FILE: models/bbox/match_costs/match_cost.py
================================================
import torch
from mmdet.core.bbox.match_costs.builder import MATCH_COST
@MATCH_COST.register_module()
class BBox3DL1Cost(object):
"""BBox3DL1Cost.
Args:
weight (int | float, optional): loss_weight
"""
def __init__(self, weight=1.0):
self.weight = weight
def __call__(self, bbox_pred, gt_bboxes):
"""
Args:
bbox_pred (Tensor): Predicted boxes with normalized
coordinates. Shape [num_query, code_size].
gt_bboxes (Tensor): Ground truth boxes with normalized
coordinates. Shape [num_gt, code_size].
Returns:
torch.Tensor: bbox_cost value with weight
"""
bbox_cost = torch.cdist(bbox_pred, gt_bboxes, p=1)
return bbox_cost * self.weight
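torch.cdist with p=1 builds the pairwise L1 cost matrix between predictions and ground truths. The same computation in plain Python, for illustration:

```python
def l1_cost_matrix(preds, gts):
    # entry [i][j] = sum of |pred_i - gt_j| over box dimensions,
    # matching torch.cdist(bbox_pred, gt_bboxes, p=1)
    return [[sum(abs(p - g) for p, g in zip(pred, gt)) for gt in gts]
            for pred in preds]

cost = l1_cost_matrix([[0.0, 0.0], [1.0, 1.0]], [[0.5, 0.5]])
```

Both 2-d predictions are at L1 distance 1.0 from the single ground truth.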
@MATCH_COST.register_module()
class BBoxBEVL1Cost(object):
def __init__(self, weight, pc_range):
self.weight = weight
self.pc_range = pc_range
def __call__(self, bboxes, gt_bboxes):
pc_start = bboxes.new(self.pc_range[0:2])
pc_range = bboxes.new(self.pc_range[3:5]) - bboxes.new(self.pc_range[0:2])
# normalize the box center to [0, 1]
normalized_bboxes_xy = (bboxes[:, :2] - pc_start) / pc_range
normalized_gt_bboxes_xy = (gt_bboxes[:, :2] - pc_start) / pc_range
reg_cost = torch.cdist(normalized_bboxes_xy, normalized_gt_bboxes_xy, p=1)
return reg_cost * self.weight
@MATCH_COST.register_module()
class IoU3DCost(object):
def __init__(self, weight):
self.weight = weight
def __call__(self, iou):
iou_cost = - iou
return iou_cost * self.weight
================================================
FILE: models/bbox/utils.py
================================================
import torch
def normalize_bbox(bboxes):
cx = bboxes[..., 0:1]
cy = bboxes[..., 1:2]
cz = bboxes[..., 2:3]
w = bboxes[..., 3:4].log()
l = bboxes[..., 4:5].log()
h = bboxes[..., 5:6].log()
rot = bboxes[..., 6:7]
if bboxes.size(-1) > 7:
vx = bboxes[..., 7:8]
vy = bboxes[..., 8:9]
out = torch.cat([cx, cy, w, l, cz, h, rot.sin(), rot.cos(), vx, vy], dim=-1)
else:
out = torch.cat([cx, cy, w, l, cz, h, rot.sin(), rot.cos()], dim=-1)
return out
def denormalize_bbox(normalized_bboxes):
rot_sin = normalized_bboxes[..., 6:7]
rot_cos = normalized_bboxes[..., 7:8]
rot = torch.atan2(rot_sin, rot_cos)
cx = normalized_bboxes[..., 0:1]
cy = normalized_bboxes[..., 1:2]
cz = normalized_bboxes[..., 4:5]
w = normalized_bboxes[..., 2:3].exp()
l = normalized_bboxes[..., 3:4].exp()
h = normalized_bboxes[..., 5:6].exp()
if normalized_bboxes.size(-1) > 8:
vx = normalized_bboxes[..., 8:9]
vy = normalized_bboxes[..., 9:10]
out = torch.cat([cx, cy, cz, w, l, h, rot, vx, vy], dim=-1)
else:
out = torch.cat([cx, cy, cz, w, l, h, rot], dim=-1)
return out
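normalize_bbox stores sizes as logs and yaw as a (sin, cos) pair; denormalize_bbox inverts both with exp and atan2. A scalar round-trip check with toy width/yaw values:

```python
import math

w, rot = 1.8, 0.6  # box width (m) and yaw (rad), arbitrary toy values
# normalize: size stored as log, yaw as (sin, cos)
w_n, rs, rc = math.log(w), math.sin(rot), math.cos(rot)
# denormalize: exp and atan2 invert the encoding exactly
w_back = math.exp(w_n)
rot_back = math.atan2(rs, rc)
```

The (sin, cos) representation keeps yaw continuous across the ±π wrap-around, which is why the L1 regression cost is computed in this encoded space.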
def encode_bbox(bboxes, pc_range=None):
xyz = bboxes[..., 0:3].clone()
wlh = bboxes[..., 3:6].log()
rot = bboxes[..., 6:7]
if pc_range is not None:
xyz[..., 0] = (xyz[..., 0] - pc_range[0]) / (pc_range[3] - pc_range[0])
xyz[..., 1] = (xyz[..., 1] - pc_range[1]) / (pc_range[4] - pc_range[1])
xyz[..., 2] = (xyz[..., 2] - pc_range[2]) / (pc_range[5] - pc_range[2])
if bboxes.shape[-1] > 7:
vel = bboxes[..., 7:9].clone()
return torch.cat([xyz, wlh, rot.sin(), rot.cos(), vel], dim=-1)
else:
return torch.cat([xyz, wlh, rot.sin(), rot.cos()], dim=-1)
def decode_bbox(bboxes, pc_range=None):
xyz = bboxes[..., 0:3].clone()
wlh = bboxes[..., 3:6].exp()
rot = torch.atan2(bboxes[..., 6:7], bboxes[..., 7:8])
if pc_range is not None:
xyz[..., 0] = xyz[..., 0] * (pc_range[3] - pc_range[0]) + pc_range[0]
xyz[..., 1] = xyz[..., 1] * (pc_range[4] - pc_range[1]) + pc_range[1]
xyz[..., 2] = xyz[..., 2] * (pc_range[5] - pc_range[2]) + pc_range[2]
if bboxes.shape[-1] > 8:
vel = bboxes[..., 8:10].clone()
return torch.cat([xyz, wlh, rot, vel], dim=-1)
else:
return torch.cat([xyz, wlh, rot], dim=-1)
def bbox2occrange(bboxes, occ_size, query_cube_size=None):
"""
xyz in [0, 1]
wlh in [0, 1]
"""
xyz = bboxes[..., 0:3].clone()
if query_cube_size is not None:
wlh = torch.zeros_like(xyz)
wlh[..., 0] = query_cube_size[0]
wlh[..., 1] = query_cube_size[1]
wlh[..., 2] = query_cube_size[2]
else:
wlh = bboxes[..., 3:6]
wlh[..., 0] = wlh[..., 0] * occ_size[0]
wlh[..., 1] = wlh[..., 1] * occ_size[1]
wlh[..., 2] = wlh[..., 2] * occ_size[2]
xyz[..., 0] = xyz[..., 0] * occ_size[0]
xyz[..., 1] = xyz[..., 1] * occ_size[1]
xyz[..., 2] = xyz[..., 2] * occ_size[2]
xyz = torch.round(xyz)
low_bound = torch.round(xyz - wlh/2)
high_bound = torch.round(xyz + wlh/2)
return torch.cat((low_bound, high_bound), dim=-1).long()
def occrange2bbox(occ_range, occ_size, pc_range):
"""
Return: xyz in [0, 1], wlh in [0, pc_range_size)
"""
xyz = (occ_range[..., :3] + occ_range[..., 3:]).to(torch.float32) / 2
xyz[..., 0] /= occ_size[0]
xyz[..., 1] /= occ_size[1]
xyz[..., 2] /= occ_size[2]
wlh = (occ_range[..., 3:] - occ_range[..., :3]).to(torch.float32)
wlh[..., 0] *= (pc_range[3] - pc_range[0]) / occ_size[0]
wlh[..., 1] *= (pc_range[4] - pc_range[1]) / occ_size[1]
wlh[..., 2] *= (pc_range[5] - pc_range[2]) / occ_size[2]
return torch.cat((xyz, wlh), dim=-1)
================================================
FILE: models/checkpoint.py
================================================
# This page is completely copied from https://pytorch.org/docs/stable/_modules/torch/utils/checkpoint.html#checkpoint
# If you are using torch 2.0 or higher, you can safely delete this page and import the related functions from official PyTorch
import torch
import warnings
import weakref
from typing import Any, Iterable, List, Tuple
__all__ = [
"checkpoint", "checkpoint_sequential", "CheckpointFunction",
"check_backward_validity", "detach_variable", "get_device_states",
"set_device_states",
]
def detach_variable(inputs: Tuple[Any, ...]) -> Tuple[torch.Tensor, ...]:
if isinstance(inputs, tuple):
out = []
for inp in inputs:
if not isinstance(inp, torch.Tensor):
out.append(inp)
continue
x = inp.detach()
x.requires_grad = inp.requires_grad
out.append(x)
return tuple(out)
else:
raise RuntimeError(
"Only tuple of tensors is supported. Got Unsupported input type: ", type(inputs).__name__)
def check_backward_validity(inputs: Iterable[Any]) -> None:
if not any(inp.requires_grad for inp in inputs if isinstance(inp, torch.Tensor)):
warnings.warn("None of the inputs have requires_grad=True. Gradients will be None")
# We can't know if the run_fn will internally move some args to different devices,
# which would require logic to preserve rng states for those devices as well.
# We could paranoically stash and restore ALL the rng states for all visible devices,
# but that seems very wasteful for most cases. Compromise: Stash the RNG state for
# the device of all Tensor args.
#
# To consider: maybe get_device_states and set_device_states should reside in torch/random.py?
def get_device_states(*args) -> Tuple[List[int], List[torch.Tensor]]:
# This will not error out if "arg" is a CPU tensor or a non-tensor type because
# the conditionals short-circuit.
fwd_gpu_devices = list({arg.get_device() for arg in args
if isinstance(arg, torch.Tensor) and arg.is_cuda})
fwd_gpu_states = []
for device in fwd_gpu_devices:
with torch.cuda.device(device):
fwd_gpu_states.append(torch.cuda.get_rng_state())
return fwd_gpu_devices, fwd_gpu_states
def set_device_states(devices, states) -> None:
for device, state in zip(devices, states):
with torch.cuda.device(device):
torch.cuda.set_rng_state(state)
def _get_autocast_kwargs():
gpu_autocast_kwargs = {"enabled": torch.is_autocast_enabled(),
"dtype": torch.get_autocast_gpu_dtype(),
"cache_enabled": torch.is_autocast_cache_enabled()}
cpu_autocast_kwargs = {"enabled": torch.is_autocast_cpu_enabled(),
"dtype": torch.get_autocast_cpu_dtype(),
"cache_enabled": torch.is_autocast_cache_enabled()}
return gpu_autocast_kwargs, cpu_autocast_kwargs
class CheckpointFunction(torch.autograd.Function):
@staticmethod
def forward(ctx, run_function, preserve_rng_state, *args):
check_backward_validity(args)
ctx.run_function = run_function
ctx.preserve_rng_state = preserve_rng_state
# Accommodates the (remote) possibility that autocast is enabled for cpu AND gpu.
ctx.gpu_autocast_kwargs, ctx.cpu_autocast_kwargs = _get_autocast_kwargs()
if preserve_rng_state:
ctx.fwd_cpu_state = torch.get_rng_state()
# Don't eagerly initialize the cuda context by accident.
# (If the user intends that the context is initialized later, within their
# run_function, we SHOULD actually stash the cuda state here. Unfortunately,
# we have no way to anticipate this will happen before we run the function.)
ctx.had_cuda_in_fwd = False
if torch.cuda._initialized:
ctx.had_cuda_in_fwd = True
ctx.fwd_gpu_devices, ctx.fwd_gpu_states = get_device_states(*args)
# Save non-tensor inputs in ctx, keep a placeholder None for tensors
# to be filled out during the backward.
ctx.inputs = []
ctx.tensor_indices = []
tensor_inputs = []
for i, arg in enumerate(args):
if torch.is_tensor(arg):
tensor_inputs.append(arg)
ctx.tensor_indices.append(i)
ctx.inputs.append(None)
else:
ctx.inputs.append(arg)
ctx.save_for_backward(*tensor_inputs)
with torch.no_grad():
outputs = run_function(*args)
return outputs
@staticmethod
def backward(ctx, *args):
if not torch.autograd._is_checkpoint_valid():
raise RuntimeError(
"Checkpointing is not compatible with .grad() or when an `inputs` parameter"
" is passed to .backward(). Please use .backward() and do not pass its `inputs`"
" argument.")
# Copy the list to avoid modifying original list.
inputs = list(ctx.inputs)
tensor_indices = ctx.tensor_indices
tensors = ctx.saved_tensors
# Fill in inputs with appropriate saved tensors.
for i, idx in enumerate(tensor_indices):
inputs[idx] = tensors[i]
# Stash the surrounding rng state, and mimic the state that was
# present at this time during forward. Restore the surrounding state
# when we're done.
rng_devices = []
if ctx.preserve_rng_state and ctx.had_cuda_in_fwd:
rng_devices = ctx.fwd_gpu_devices
with torch.random.fork_rng(devices=rng_dev
SYMBOL INDEX (353 symbols across 38 files)
FILE: gen_instance_info.py
function convert_to_nusc_box (line 39) | def convert_to_nusc_box(bboxes, lift_center=False, wlh_margin=0.0):
function meshgrid3d (line 63) | def meshgrid3d(occ_size, pc_range): # points in ego coord
function process_add_instance_info (line 77) | def process_add_instance_info(sample):
function add_instance_info (line 200) | def add_instance_info(sample_infos):
FILE: gen_sweep_info.py
function get_cam_info (line 18) | def get_cam_info(nusc, sample_data):
function add_sweep_info (line 40) | def add_sweep_info(nusc, sample_infos):
FILE: lib/dvr/dvr.cpp
function render_forward (line 39) | std::vector<torch::Tensor>
function render (line 51) | std::vector<torch::Tensor> render(torch::Tensor sigma, torch::Tensor ori...
function init (line 61) | torch::Tensor init(torch::Tensor points, torch::Tensor tindex,
function PYBIND11_MODULE (line 68) | PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
FILE: loaders/builder.py
function build_dataloader (line 9) | def build_dataloader(dataset,
FILE: loaders/ego_pose_dataset.py
function trans_matrix (line 8) | def trans_matrix(T, R):
class EgoPoseDataset (line 16) | class EgoPoseDataset(Dataset):
method __init__ (line 17) | def __init__(self, data_infos):
method __len__ (line 29) | def __len__(self):
method get_ego_from_lidar (line 32) | def get_ego_from_lidar(self, info):
method get_global_pose (line 38) | def get_global_pose(self, info, inverse=False):
method __getitem__ (line 50) | def __getitem__(self, idx):
FILE: loaders/nuscenes_dataset.py
class CustomNuScenesDataset (line 9) | class CustomNuScenesDataset(NuScenesDataset):
method collect_sweeps (line 11) | def collect_sweeps(self, index, into_past=60, into_future=0):
method get_data_info (line 34) | def get_data_info(self, index):
FILE: loaders/nuscenes_occ_dataset.py
class NuSceneOcc (line 19) | class NuSceneOcc(NuScenesDataset):
method __init__ (line 20) | def __init__(self, occ_gt_root, *args, **kwargs):
method collect_sweeps (line 35) | def collect_sweeps(self, index, into_past=150, into_future=0):
method get_data_info (line 58) | def get_data_info(self, index):
method evaluate (line 118) | def evaluate(self, occ_results, runner=None, show_dir=None, **eval_kwa...
method format_results (line 177) | def format_results(self, occ_results, submission_prefix, **kwargs):
FILE: loaders/old_metrics.py
function pcolor (line 12) | def pcolor(string, color, on_color=None, attrs=None):
function getCellCoordinates (line 35) | def getCellCoordinates(points, voxelSize):
function getNumUniqueCells (line 39) | def getNumUniqueCells(cells):
class Metric_mIoU (line 44) | class Metric_mIoU():
method __init__ (line 45) | def __init__(self,
method hist_info (line 76) | def hist_info(self, n_cl, pred, gt):
method per_class_iu (line 104) | def per_class_iu(self, hist):
method compute_mIoU (line 110) | def compute_mIoU(self, pred, label, n_classes):
method add_batch (line 120) | def add_batch(self,semantics_pred,semantics_gt,mask_lidar,mask_camera):
method count_miou (line 143) | def count_miou(self):
class Metric_FScore (line 156) | class Metric_FScore():
method __init__ (line 157) | def __init__(self,
method voxel2points (line 181) | def voxel2points(self, voxel):
method add_batch (line 193) | def add_batch(self,semantics_pred,semantics_gt,mask_lidar,mask_camera ):
method count_fscore (line 237) | def count_fscore(self,):
class Metric_mRecall (line 242) | class Metric_mRecall():
method __init__ (line 243) | def __init__(self,
method hist_info (line 276) | def hist_info(self, n_cl, p_cl, pred, gt):
method per_class_recall (line 304) | def per_class_recall(self, hist):
method compute_mRecall (line 307) | def compute_mRecall(self, pred, label, n_classes, p_classes):
method add_batch (line 317) | def add_batch(self,semantics_pred,semantics_gt,mask_lidar,mask_camera):
method count_mrecall (line 338) | def count_mrecall(self):
class Metric_Panoptic (line 351) | class Metric_Panoptic():
method __init__ (line 352) | def __init__(self,
method add_batch (line 393) | def add_batch(self,semantics_pred,semantics_gt,instances_pred,instance...
method add_panoptic_sample (line 412) | def add_panoptic_sample(self, semantics_pred, semantics_gt, instances_...
method count_pq (line 531) | def count_pq(self, ):
FILE: loaders/pipelines/loading.py
function compose_lidar2img (line 13) | def compose_lidar2img(ego2global_translation_curr,
class LoadMultiViewImageFromMultiSweeps (line 40) | class LoadMultiViewImageFromMultiSweeps(object):
method __init__ (line 41) | def __init__(self,
method load_offline (line 57) | def load_offline(self, results):
method load_online (line 111) | def load_online(self, results):
method __call__ (line 158) | def __call__(self, results):
class LoadOccGTFromFile (line 170) | class LoadOccGTFromFile(object):
method __init__ (line 171) | def __init__(self, num_classes=18, inst_class_ids=[]):
method __call__ (line 175) | def __call__(self, results):
class BEVAug (line 246) | class BEVAug(object):
method __init__ (line 247) | def __init__(self, bda_aug_conf, classes, is_train=True):
method sample_bda_augmentation (line 252) | def sample_bda_augmentation(self):
method bev_transform (line 266) | def bev_transform(self, rotate_angle, scale_ratio, flip_dx, flip_dy):
method __call__ (line 290) | def __call__(self, results):
FILE: loaders/pipelines/transforms.py
class PadMultiViewImage (line 10) | class PadMultiViewImage(object):
method __init__ (line 21) | def __init__(self, size=None, size_divisor=None, pad_val=0):
method _pad_img (line 29) | def _pad_img(self, img):
method _pad_imgs (line 40) | def _pad_imgs(self, results):
method __call__ (line 50) | def __call__(self, results):
method __repr__ (line 60) | def __repr__(self):
class NormalizeMultiviewImage (line 69) | class NormalizeMultiviewImage(object):
method __init__ (line 79) | def __init__(self, mean, std, to_rgb=True):
method __call__ (line 84) | def __call__(self, results):
method __repr__ (line 110) | def __repr__(self):
class PhotoMetricDistortionMultiViewImage (line 117) | class PhotoMetricDistortionMultiViewImage:
method __init__ (line 136) | def __init__(self,
method __call__ (line 146) | def __call__(self, results):
method __repr__ (line 207) | def __repr__(self):
class RandomTransformImage (line 219) | class RandomTransformImage(object):
method __init__ (line 220) | def __init__(self, ida_aug_conf=None, training=True):
method __call__ (line 224) | def __call__(self, results):
method img_transform (line 270) | def img_transform(self, img, resize, resize_dims, crop, flip, rotate):
method sample_augmentation (line 313) | def sample_augmentation(self):
class GlobalRotScaleTransImage (line 345) | class GlobalRotScaleTransImage(object):
method __init__ (line 346) | def __init__(self,
method __call__ (line 354) | def __call__(self, results):
method rotate_z (line 369) | def rotate_z(self, results, rot_angle):
method scale_xyz (line 384) | def scale_xyz(self, results, scale_ratio):
FILE: loaders/ray_metrics.py
function get_rendered_pcds (line 19) | def get_rendered_pcds(origin, points, tindex, pred_dist):
function meshgrid3d (line 37) | def meshgrid3d(occ_size, pc_range):
function generate_lidar_rays (line 51) | def generate_lidar_rays():
function process_one_sample (line 77) | def process_one_sample(sem_pred, lidar_rays, output_origin, instance_pre...
function calc_rayiou (line 138) | def calc_rayiou(pcd_pred_list, pcd_gt_list, occ_class_names):
function main_rayiou (line 175) | def main_rayiou(sem_pred_list, sem_gt_list, lidar_origin_list, occ_class...
function main_raypq (line 231) | def main_raypq(sem_pred_list, sem_gt_list, inst_pred_list, inst_gt_list,...
FILE: loaders/ray_pq.py
class Metric_RayPQ (line 5) | class Metric_RayPQ:
method __init__ (line 6) | def __init__(self,
method add_batch (line 36) | def add_batch(self,semantics_pred,semantics_gt,instances_pred,instance...
method add_panoptic_sample (line 40) | def add_panoptic_sample(self, semantics_pred, semantics_gt, instances_...
method count_pq (line 155) | def count_pq(self):
FILE: models/backbones/vovnet.py
function dw_conv3x3 (line 93) | def dw_conv3x3(in_channels, out_channels, module_name, postfix, stride=1...
function conv3x3 (line 117) | def conv3x3(in_channels, out_channels, module_name, postfix, stride=1, g...
function conv1x1 (line 137) | def conv1x1(in_channels, out_channels, module_name, postfix, stride=1, g...
class Hsigmoid (line 157) | class Hsigmoid(nn.Module):
method __init__ (line 158) | def __init__(self, inplace=True):
method forward (line 162) | def forward(self, x):
class eSEModule (line 166) | class eSEModule(nn.Module):
method __init__ (line 167) | def __init__(self, channel, reduction=4):
method forward (line 173) | def forward(self, x):
class _OSA_module (line 181) | class _OSA_module(nn.Module):
method __init__ (line 182) | def __init__(self, in_ch, stage_ch, concat_ch, layer_per_block, module...
method _forward (line 211) | def _forward(self, x):
method forward (line 234) | def forward(self, x):
class _OSA_stage (line 241) | class _OSA_stage(nn.Sequential):
method __init__ (line 242) | def __init__(self, in_ch, stage_ch, concat_ch, block_per_stage, layer_...
class VoVNet (line 276) | class VoVNet(BaseModule):
method __init__ (line 277) | def __init__(self, spec_name, input_ch=3, out_features=None, frozen_st...
method _initialize_weights (line 344) | def _initialize_weights(self):
method forward (line 349) | def forward(self, x):
method _freeze_stages (line 361) | def _freeze_stages(self):
method train (line 374) | def train(self, mode=True):
FILE: models/bbox/assigners/hungarian_assigner_3d.py
class HungarianAssigner3D (line 16) | class HungarianAssigner3D(BaseAssigner):
method __init__ (line 17) | def __init__(self,
method assign (line 27) | def assign(self,
FILE: models/bbox/coders/nms_free_coder.py
class NMSFreeCoder (line 9) | class NMSFreeCoder(BaseBBoxCoder):
method __init__ (line 20) | def __init__(self,
method encode (line 34) | def encode(self):
method decode_single (line 37) | def decode_single(self, cls_scores, bbox_preds):
method decode (line 90) | def decode(self, preds_dicts):
FILE: models/bbox/match_costs/match_cost.py
class BBox3DL1Cost (line 6) | class BBox3DL1Cost(object):
method __init__ (line 12) | def __init__(self, weight=1.0):
method __call__ (line 15) | def __call__(self, bbox_pred, gt_bboxes):
class BBoxBEVL1Cost (line 31) | class BBoxBEVL1Cost(object):
method __init__ (line 32) | def __init__(self, weight, pc_range):
method __call__ (line 36) | def __call__(self, bboxes, gt_bboxes):
class IoU3DCost (line 47) | class IoU3DCost(object):
method __init__ (line 48) | def __init__(self, weight):
method __call__ (line 51) | def __call__(self, iou):
FILE: models/bbox/utils.py
function normalize_bbox (line 4) | def normalize_bbox(bboxes):
function denormalize_bbox (line 23) | def denormalize_bbox(normalized_bboxes):
function encode_bbox (line 46) | def encode_bbox(bboxes, pc_range=None):
function decode_bbox (line 63) | def decode_bbox(bboxes, pc_range=None):
function bbox2occrange (line 79) | def bbox2occrange(bboxes, occ_size, query_cube_size=None):
function occrange2bbox (line 107) | def occrange2bbox(occ_range, occ_size, pc_range):
FILE: models/checkpoint.py
function detach_variable (line 15) | def detach_variable(inputs: Tuple[Any, ...]) -> Tuple[torch.Tensor, ...]:
function check_backward_validity (line 32) | def check_backward_validity(inputs: Iterable[Any]) -> None:
function get_device_states (line 44) | def get_device_states(*args) -> Tuple[List[int], List[torch.Tensor]]:
function set_device_states (line 58) | def set_device_states(devices, states) -> None:
function _get_autocast_kwargs (line 63) | def _get_autocast_kwargs():
class CheckpointFunction (line 74) | class CheckpointFunction(torch.autograd.Function):
method forward (line 77) | def forward(ctx, run_function, preserve_rng_state, *args):
method backward (line 114) | def backward(ctx, *args):
function checkpoint (line 167) | def checkpoint(function, *args, use_reentrant: bool = True, **kwargs):
function checkpoint_sequential (line 262) | def checkpoint_sequential(functions, segments, input, use_reentrant=True...
function _checkpoint_without_reentrant (line 342) | def _checkpoint_without_reentrant(function, preserve_rng_state=True, *ar...
FILE: models/csrc/msmv_sampling/msmv_sampling.cpp
function ms_deform_attn_cuda_c2345_forward (line 98) | at::Tensor ms_deform_attn_cuda_c2345_forward(
function ms_deform_attn_cuda_c23456_forward (line 152) | at::Tensor ms_deform_attn_cuda_c23456_forward(
function ms_deform_attn_cuda_c2345_backward (line 212) | std::vector<at::Tensor> ms_deform_attn_cuda_c2345_backward(
function ms_deform_attn_cuda_c23456_backward (line 283) | std::vector<at::Tensor> ms_deform_attn_cuda_c23456_backward(
function PYBIND11_MODULE (line 363) | PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
FILE: models/csrc/setup.py
function get_ext_modules (line 5) | def get_ext_modules():
FILE: models/csrc/wrapper.py
function msmv_sampling_pytorch (line 7) | def msmv_sampling_pytorch(mlvl_feats, sampling_locations, scale_weights):
class MSMVSamplingC2345 (line 35) | class MSMVSamplingC2345(torch.autograd.Function):
method forward (line 37) | def forward(ctx, feat_c2, feat_c3, feat_c4, feat_c5, sampling_location...
method backward (line 46) | def backward(ctx, grad_output):
class MSMVSamplingC23456 (line 58) | class MSMVSamplingC23456(torch.autograd.Function):
method forward (line 60) | def forward(ctx, feat_c2, feat_c3, feat_c4, feat_c5, feat_c6, sampling...
method backward (line 69) | def backward(ctx, grad_output):
function msmv_sampling (line 81) | def msmv_sampling(mlvl_feats, sampling_locations, scale_weights):
FILE: models/loss_utils.py
function get_voxel_decoder_loss_input (line 11) | def get_voxel_decoder_loss_input(voxel_semantics, occ_loc_i, seg_pred_i,...
function compute_scal_loss (line 34) | def compute_scal_loss(pred, gt, class_id, reverse=False, ignore_index=255):
class GeoScalLoss (line 85) | class GeoScalLoss(nn.Module):
method __init__ (line 86) | def __init__(self,
method forward (line 93) | def forward(self, pred, gt):
class SemScalLoss (line 102) | class SemScalLoss(nn.Module):
method __init__ (line 103) | def __init__(self,
method forward (line 116) | def forward(self, pred, gt):
function dice_loss (line 130) | def dice_loss(
function sigmoid_ce_loss (line 164) | def sigmoid_ce_loss(
function CE_ssc_loss (line 196) | def CE_ssc_loss(pred, target, class_weights=None, ignore_index=255):
function lovasz_grad (line 211) | def lovasz_grad(gt_sorted):
function lovasz_softmax (line 227) | def lovasz_softmax(probas, labels, classes='present', per_image=False, i...
function lovasz_softmax_flat (line 247) | def lovasz_softmax_flat(probas, labels, classes='present'):
function flatten_probas (line 279) | def flatten_probas(probas, labels, ignore=None):
class Mask2FormerLoss (line 311) | class Mask2FormerLoss(nn.Module):
method __init__ (line 312) | def __init__(self,
method forward (line 334) | def forward(self, mask_pred, class_pred, mask_gt, class_gt, indices, m...
method loss_masks (line 376) | def loss_masks(self, src_mask, tgt_mask, avg_factor=None, mask_camera=...
method loss_labels (line 394) | def loss_labels(self, src_class, tgt_class, empty_weight=None, avg_fac...
function mean (line 403) | def mean(l, empty=0):
FILE: models/matcher.py
function batch_dice_loss (line 12) | def batch_dice_loss(inputs: torch.Tensor, targets: torch.Tensor, mask_ca...
function batch_sigmoid_ce_loss (line 39) | def batch_sigmoid_ce_loss(inputs: torch.Tensor, targets: torch.Tensor, m...
class HungarianMatcher (line 84) | class HungarianMatcher(BaseModule):
method __init__ (line 92) | def __init__(self, cost_class: float = 1, cost_mask: float = 1, cost_d...
method forward (line 110) | def forward(self, mask_pred, class_pred, mask_gt, class_gt, mask_camera):
FILE: models/sparse_voxel_decoder.py
function index2point (line 11) | def index2point(coords, pc_range, voxel_size):
function point2bbox (line 22) | def point2bbox(coords, box_size):
function upsample (line 32) | def upsample(pre_feat, pre_coords, interval):
class SparseVoxelDecoder (line 55) | class SparseVoxelDecoder(BaseModule):
method __init__ (line 56) | def __init__(self,
method init_weights (line 106) | def init_weights(self):
method forward (line 110) | def forward(self, mlvl_feats, img_metas):
class SparseVoxelDecoderLayer (line 165) | class SparseVoxelDecoderLayer(BaseModule):
method __init__ (line 166) | def __init__(self,
method init_weights (line 211) | def init_weights(self):
method forward (line 218) | def forward(self, query_feat, query_bbox, mlvl_feats, img_metas):
FILE: models/sparsebev_head.py
class SparseBEVHead (line 14) | class SparseBEVHead(DETRHead):
method __init__ (line 15) | def __init__(self,
method _init_layers (line 48) | def _init_layers(self):
method init_weights (line 65) | def init_weights(self):
method forward (line 68) | def forward(self, mlvl_feats, img_metas):
method prepare_for_dn_input (line 117) | def prepare_for_dn_input(self, batch_size, init_query_bbox, label_enc,...
method prepare_for_dn_loss (line 218) | def prepare_for_dn_loss(self, mask_dict):
method dn_loss_single (line 233) | def dn_loss_single(self,
method calc_dn_loss (line 272) | def calc_dn_loss(self, loss_dict, preds_dicts, num_dec_layers):
method _get_target_single (line 295) | def _get_target_single(self,
method get_targets (line 323) | def get_targets(self,
method loss_single (line 343) | def loss_single(self,
method loss (line 399) | def loss(self,
method get_bboxes (line 457) | def get_bboxes(self, preds_dicts, img_metas, rescale=False):
FILE: models/sparsebev_sampling.py
function make_sample_points_from_bbox (line 7) | def make_sample_points_from_bbox(query_bbox, offset, pc_range):
function make_sample_points_from_mask (line 32) | def make_sample_points_from_mask(valid_map, pc_range, occ_size, num_poin...
function sampling_4d (line 83) | def sampling_4d(sample_points, mlvl_feats, scale_weights, lidar2img, ima...
FILE: models/sparsebev_transformer.py
class SparseBEVTransformer (line 16) | class SparseBEVTransformer(BaseModule):
method __init__ (line 17) | def __init__(self, embed_dims, num_frames=8, num_points=4, num_layers=...
method init_weights (line 28) | def init_weights(self):
method forward (line 31) | def forward(self, query_bbox, query_feat, mlvl_feats, attn_mask, img_m...
class SparseBEVTransformerDecoder (line 40) | class SparseBEVTransformerDecoder(BaseModule):
method __init__ (line 41) | def __init__(self, embed_dims, num_frames=8, num_points=4, num_layers=...
method init_weights (line 51) | def init_weights(self):
method forward (line 54) | def forward(self, query_bbox, query_feat, mlvl_feats, attn_mask, img_m...
class SparseBEVTransformerDecoderLayer (line 93) | class SparseBEVTransformerDecoderLayer(BaseModule):
method __init__ (line 94) | def __init__(self, embed_dims, num_frames=8, num_points=4, num_levels=...
method init_weights (line 136) | def init_weights(self):
method refine_bbox (line 144) | def refine_bbox(self, bbox_proposal, bbox_delta):
method forward (line 151) | def forward(self, query_bbox, query_feat, mlvl_feats, attn_mask, img_m...
class SparseBEVSelfAttention (line 184) | class SparseBEVSelfAttention(BaseModule):
method __init__ (line 185) | def __init__(self, embed_dims=256, num_heads=8, dropout=0.1, pc_range=...
method init_weights (line 197) | def init_weights(self):
method inner_forward (line 202) | def inner_forward(self, query_bbox, query_feat, pre_attn_mask=None):
method forward (line 224) | def forward(self, query_bbox, query_feat, pre_attn_mask=None):
method calc_bbox_dists (line 231) | def calc_bbox_dists(self, bboxes):
class SparseBEVSampling (line 245) | class SparseBEVSampling(BaseModule):
method __init__ (line 246) | def __init__(self, embed_dims=256, num_frames=4, num_groups=4, num_poi...
method init_weights (line 258) | def init_weights(self):
method inner_forward (line 263) | def inner_forward(self, query_bbox, query_feat, mlvl_feats, img_metas):
method forward (line 307) | def forward(self, query_bbox, query_feat, mlvl_feats, img_metas):
class AdaptiveMixing (line 314) | class AdaptiveMixing(nn.Module):
method __init__ (line 315) | def __init__(self, in_dim, in_points, n_groups=1, query_dim=None, out_...
method init_weights (line 341) | def init_weights(self):
method inner_forward (line 344) | def inner_forward(self, x, query):
method forward (line 376) | def forward(self, x, query):
class AdaptiveMixingPointOnly (line 383) | class AdaptiveMixingPointOnly(nn.Module):
method __init__ (line 384) | def __init__(self, in_dim, in_points, n_groups=1, query_dim=None, out_...
method init_weights (line 409) | def init_weights(self):
method inner_forward (line 412) | def inner_forward(self, x, query):
method forward (line 437) | def forward(self, x, query):
class DeformAggregation (line 444) | class DeformAggregation(nn.Module):
method __init__ (line 445) | def __init__(self, in_dim, in_points, n_groups=1, query_dim=None, out_...
method init_weights (line 467) | def init_weights(self):
method inner_forward (line 470) | def inner_forward(self, x, query):
method forward (line 488) | def forward(self, x, query):
FILE: models/sparseocc.py
class SparseOcc (line 13) | class SparseOcc(MVXTwoStageDetector):
method __init__ (line 14) | def __init__(self,
method extract_img_feat (line 48) | def extract_img_feat(self, img):
method extract_feat (line 60) | def extract_feat(self, img, img_metas=None):
method forward_pts_train (line 109) | def forward_pts_train(self, mlvl_feats, voxel_semantics, voxel_instanc...
method forward (line 119) | def forward(self, return_loss=True, **kwargs):
method forward_train (line 126) | def forward_train(self, img_metas=None, img=None, voxel_semantics=None...
method forward_test (line 130) | def forward_test(self, img_metas, img=None, **kwargs):
method simple_test_pts (line 155) | def simple_test_pts(self, x, img_metas, rescale=False):
method simple_test (line 160) | def simple_test(self, img_metas, img=None, rescale=False):
method simple_test_offline (line 167) | def simple_test_offline(self, img_metas, img=None, rescale=False):
method simple_test_online (line 171) | def simple_test_online(self, img_metas, img=None, rescale=False):
FILE: models/sparseocc_head.py
class SparseOccHead (line 19) | class SparseOccHead(nn.Module):
method __init__ (line 20) | def __init__(self,
method init_weights (line 45) | def init_weights(self):
method forward (line 49) | def forward(self, mlvl_feats, img_metas):
method loss (line 59) | def loss(self, voxel_semantics, voxel_instances, instance_class_ids, p...
method loss_single (line 62) | def loss_single(self, voxel_semantics, voxel_instances, instance_class...
method merge_occ_pred (line 131) | def merge_occ_pred(self, outs):
method merge_semseg (line 148) | def merge_semseg(self, mask_cls, mask_pred):
method merge_panoseg (line 160) | def merge_panoseg(self, mask_cls, mask_pred):
method merge_panoseg_single (line 176) | def merge_panoseg_single(self, mask_cls, mask_pred):
FILE: models/sparseocc_transformer.py
class SparseOccTransformer (line 16) | class SparseOccTransformer(BaseModule):
method __init__ (line 17) | def __init__(self,
method init_weights (line 60) | def init_weights(self):
method forward (line 64) | def forward(self, mlvl_feats, img_metas):
class MaskFormerOccDecoder (line 87) | class MaskFormerOccDecoder(BaseModule):
method __init__ (line 88) | def __init__(self,
method init_weights (line 121) | def init_weights(self):
method forward (line 124) | def forward(self, occ_preds, mlvl_feats, img_metas):
class MaskFormerOccDecoderLayer (line 146) | class MaskFormerOccDecoderLayer(BaseModule):
method __init__ (line 147) | def __init__(self,
method init_weights (line 175) | def init_weights(self):
method forward (line 181) | def forward(self, query_feat, valid_map, mask_pred, occ_preds, mlvl_fe...
method pred_segmentation (line 202) | def pred_segmentation(self, query_feat, mask_feat):
method inner_pred_segmentation (line 208) | def inner_pred_segmentation(self, query_feat, mask_feat):
class MaskFormerSelfAttention (line 217) | class MaskFormerSelfAttention(BaseModule):
method __init__ (line 218) | def __init__(self, embed_dims, num_heads, dropout=0.0):
method init_weights (line 224) | def init_weights(self):
method with_pos_embed (line 229) | def with_pos_embed(self, tensor, pos=None):
method inner_forward (line 232) | def inner_forward(self, query, mask = None, key_padding_mask = None,qu...
method forward (line 238) | def forward(self, query, mask = None, key_padding_mask = None,query_po...
class MaskFormerSampling (line 245) | class MaskFormerSampling(BaseModule):
method __init__ (line 246) | def __init__(self, embed_dims=256, num_frames=4, num_groups=4, num_poi...
method init_weights (line 259) | def init_weights(self, ):
method inner_forward (line 263) | def inner_forward(self, query_feat, valid_map, occ_loc, mlvl_feats, im...
method forward (line 293) | def forward(self, query_feat, valid_map, occ_loc, mlvl_feats, img_met...
FILE: models/utils.py
function conv3d_gn_relu (line 8) | def conv3d_gn_relu(in_channels, out_channels, kernel_size=1, stride=1):
function deconv3d_gn_relu (line 16) | def deconv3d_gn_relu(in_channels, out_channels, kernel_size=2, stride=2):
function sparse2dense (line 24) | def sparse2dense(indices, value, dense_shape, empty_value=0):
function generate_grid (line 38) | def generate_grid(n_vox, interval):
function batch_indexing (line 46) | def batch_indexing(batched_data: torch.Tensor, batched_indices: torch.Te...
function rotation_3d_in_axis (line 92) | def rotation_3d_in_axis(points, angles):
function inverse_sigmoid (line 123) | def inverse_sigmoid(x, eps=1e-5):
function pad_multiple (line 141) | def pad_multiple(inputs, img_metas, size_divisor=32):
function rgb_to_hsv (line 160) | def rgb_to_hsv(image: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
function hsv_to_rgb (line 215) | def hsv_to_rgb(image: torch.Tensor) -> torch.Tensor:
class GpuPhotoMetricDistortion (line 256) | class GpuPhotoMetricDistortion:
method __init__ (line 275) | def __init__(self,
method __call__ (line 285) | def __call__(self, imgs):
class DumpConfig (line 345) | class DumpConfig:
method __init__ (line 346) | def __init__(self):
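The index above lists `sparse2dense(indices, value, dense_shape, empty_value=0)` in models/utils.py. A minimal NumPy sketch of what a function with that signature typically does — scatter per-voxel values at sparse (x, y, z) indices into a dense grid — is shown below; the body is an assumption from the signature alone (the repo's version operates on torch tensors), only the parameter names come from the index:

```python
import numpy as np

def sparse2dense(indices, value, dense_shape, empty_value=0):
    """Scatter sparse voxel values into a dense grid (illustrative sketch).

    indices:     (N, 3) integer array of (x, y, z) voxel coordinates
    value:       (N,) or (N, C) array of per-voxel values/features
    dense_shape: (X, Y, Z) target grid size
    Returns the dense grid plus a boolean occupancy mask.
    """
    # Trailing feature dims of `value` (empty tuple for scalar values)
    dense = np.full((*dense_shape, *value.shape[1:]), empty_value,
                    dtype=value.dtype)
    mask = np.zeros(dense_shape, dtype=bool)
    x, y, z = indices[:, 0], indices[:, 1], indices[:, 2]
    dense[x, y, z] = value   # write only the occupied voxels
    mask[x, y, z] = True
    return dense, mask

# Three occupied voxels in a 4x4x4 grid
idx = np.array([[0, 0, 0], [1, 2, 3], [3, 3, 3]])
val = np.array([1.0, 2.0, 3.0])
dense, mask = sparse2dense(idx, val, (4, 4, 4))
```

This sparse-to-dense scatter is the usual bridge between a sparse voxel decoder's output and operations that need a full grid (e.g. visualization or dense losses).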
FILE: old_metrics.py
function main (line 10) | def main(args):
FILE: ray_metrics.py
function main (line 14) | def main(args):
FILE: timing.py
function main (line 17) | def main():
FILE: train.py
function main (line 20) | def main():
FILE: utils.py
function init_logging (line 16) | def init_logging(filename=None, debug=False):
function backup_code (line 30) | def backup_code(work_dir, verbose=False):
class MyTextLoggerHook (line 45) | class MyTextLoggerHook(TextLoggerHook):
method _log_info (line 46) | def _log_info(self, log_dict, runner):
method log (line 84) | def log(self, runner):
method after_train_epoch (line 121) | def after_train_epoch(self, runner):
class MyTensorboardLoggerHook (line 132) | class MyTensorboardLoggerHook(LoggerHook):
method __init__ (line 133) | def __init__(self, log_dir=None, interval=10, ignore_last=True, reset_...
method before_run (line 139) | def before_run(self, runner):
method log (line 146) | def log(self, runner):
method after_run (line 179) | def after_run(self, runner):
class MyWandbLoggerHook (line 185) | class MyWandbLoggerHook(LoggerHook):
method __init__ (line 225) | def __init__(self, log_dir=None, project_name=None, team_name=None, ex...
method import_wandb (line 244) | def import_wandb(self) -> None:
method before_run (line 253) | def before_run(self, runner) -> None:
method log (line 276) | def log(self, runner) -> None:
method after_run (line 296) | def after_run(self, runner) -> None:
FILE: val.py
function evaluate (line 18) | def evaluate(dataset, results):
function main (line 28) | def main():
FILE: viz_prediction.py
function occ2img (line 42) | def occ2img(semantics):
function main (line 59) | def main():
Condensed preview of 58 files, each entry showing path, character count, and a content snippet; the full structured content (434K chars) is available as a .json file.
[
{
"path": ".gitignore",
"chars": 513,
"preview": "# OS generated files\n.DS_Store\n.DS_Store?\n._*\n.Spotlight-V100\n.Trashes\nehthumbs.db\nThumbs.db\n\n# Compiled source\nbuild\nde"
},
{
"path": "LICENSE",
"chars": 11357,
"preview": " Apache License\n Version 2.0, January 2004\n "
},
{
"path": "README.md",
"chars": 10893,
"preview": "# SparseOcc\n\nThis is the official PyTorch implementation for our paper:\n\n> [**Fully Sparse 3D Panoptic Occupancy Predict"
},
{
"path": "configs/r50_nuimg_704x256_8f.py",
"chars": 6939,
"preview": "dataset_type = 'NuSceneOcc'\ndataset_root = 'data/nuscenes/'\nocc_gt_root = 'data/nuscenes/occ3d'\n\n# If point cloud range "
},
{
"path": "configs/r50_nuimg_704x256_8f_60e.py",
"chars": 273,
"preview": "_base_ = ['./r50_nuimg_704x256_8f.py']\n\nlr_config = dict(\n policy='step',\n warmup='linear',\n warmup_iters=500,\n"
},
{
"path": "configs/r50_nuimg_704x256_8f_openocc.py",
"chars": 2574,
"preview": "_base_ = ['./r50_nuimg_704x256_8f.py']\n\nocc_gt_root = 'data/nuscenes/openocc_v2'\n\ndet_class_names = [\n 'car', 'truck'"
},
{
"path": "configs/r50_nuimg_704x256_8f_pano.py",
"chars": 2802,
"preview": "_base_ = ['./r50_nuimg_704x256_8f.py']\n\nocc_gt_root = 'data/nuscenes/occ3d_panoptic'\n\n# For nuScenes we usually do 10-cl"
},
{
"path": "gen_instance_info.py",
"chars": 8738,
"preview": "import os\nimport tqdm\nimport glob\nimport pickle\nimport argparse\nimport numpy as np\nimport torch\nimport multiprocessing\nf"
},
{
"path": "gen_sweep_info.py",
"chars": 4163,
"preview": "# Generate info files manually\nimport os\nimport mmcv\nimport tqdm\nimport pickle\nimport argparse\nimport numpy as np\nfrom n"
},
{
"path": "lib/dvr/dvr.cpp",
"chars": 2568,
"preview": "// Acknowledgments: https://github.com/tarashakhurana/4d-occ-forecasting\n// Modified by Haisong Liu\n\n#include <string>\n#"
},
{
"path": "lib/dvr/dvr.cu",
"chars": 26778,
"preview": "// Acknowledgments: https://github.com/tarashakhurana/4d-occ-forecasting\n// Modified by Haisong Liu\n\n#include <torch/ext"
},
{
"path": "loaders/__init__.py",
"chars": 185,
"preview": "from .pipelines import __all__\nfrom .nuscenes_dataset import CustomNuScenesDataset\nfrom .nuscenes_occ_dataset import NuS"
},
{
"path": "loaders/builder.py",
"chars": 1708,
"preview": "from functools import partial\nfrom mmcv.parallel import collate\nfrom mmcv.runner import get_dist_info\nfrom torch.utils.d"
},
{
"path": "loaders/ego_pose_dataset.py",
"chars": 3222,
"preview": "import torch\nimport numpy as np\nfrom pyquaternion import Quaternion\nfrom torch.utils.data import Dataset\nnp.set_printopt"
},
{
"path": "loaders/nuscenes_dataset.py",
"chars": 3405,
"preview": "import os\nimport numpy as np\nfrom mmdet.datasets import DATASETS\nfrom mmdet3d.datasets import NuScenesDataset\nfrom pyqua"
},
{
"path": "loaders/nuscenes_occ_dataset.py",
"chars": 8093,
"preview": "import os\nimport mmcv\nimport glob\nimport torch\nimport numpy as np\nfrom tqdm import tqdm\nfrom mmdet.datasets import DATAS"
},
{
"path": "loaders/old_metrics.py",
"chars": 23417,
"preview": "import os\nimport numpy as np\nfrom sklearn.neighbors import KDTree\nfrom termcolor import colored\nfrom functools import re"
},
{
"path": "loaders/pipelines/__init__.py",
"chars": 345,
"preview": "from .loading import LoadMultiViewImageFromMultiSweeps, LoadOccGTFromFile\nfrom .transforms import PadMultiViewImage, Nor"
},
{
"path": "loaders/pipelines/loading.py",
"chars": 13433,
"preview": "import os\nimport mmcv\nimport torch\nimport numpy as np\nfrom mmdet.datasets.builder import PIPELINES\nfrom numpy.linalg imp"
},
{
"path": "loaders/pipelines/transforms.py",
"chars": 14088,
"preview": "import mmcv\nimport torch\nimport numpy as np\nfrom PIL import Image\nfrom numpy import random\nfrom mmdet.datasets.builder i"
},
{
"path": "loaders/ray_metrics.py",
"chars": 9975,
"preview": "# Acknowledgments: https://github.com/tarashakhurana/4d-occ-forecasting\n# Modified by Haisong Liu\nimport math\nimport cop"
},
{
"path": "loaders/ray_pq.py",
"chars": 8286,
"preview": "import numpy as np\nfrom prettytable import PrettyTable\n\n\nclass Metric_RayPQ:\n def __init__(self,\n occ"
},
{
"path": "models/__init__.py",
"chars": 236,
"preview": "from .backbones import __all__\r\nfrom .bbox import __all__\r\nfrom .sparseocc import SparseOcc\r\nfrom .sparseocc_head import"
},
{
"path": "models/backbones/__init__.py",
"chars": 49,
"preview": "from .vovnet import VoVNet\n\n__all__ = ['VoVNet']\n"
},
{
"path": "models/backbones/vovnet.py",
"chars": 12070,
"preview": "import torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nimport warnings\nimport torch.utils.checkpoint as cp\nf"
},
{
"path": "models/bbox/__init__.py",
"chars": 91,
"preview": "from .assigners import __all__\nfrom .coders import __all__\nfrom .match_costs import __all__"
},
{
"path": "models/bbox/assigners/__init__.py",
"chars": 90,
"preview": "from .hungarian_assigner_3d import HungarianAssigner3D\n\n__all__ = ['HungarianAssigner3D']\n"
},
{
"path": "models/bbox/assigners/hungarian_assigner_3d.py",
"chars": 3757,
"preview": "import torch\n\nfrom mmdet.core.bbox.builder import BBOX_ASSIGNERS\nfrom mmdet.core.bbox.assigners import AssignResult\nfrom"
},
{
"path": "models/bbox/coders/__init__.py",
"chars": 69,
"preview": "from .nms_free_coder import NMSFreeCoder\n\n__all__ = ['NMSFreeCoder']\n"
},
{
"path": "models/bbox/coders/nms_free_coder.py",
"chars": 4047,
"preview": "import torch\n\nfrom mmdet.core.bbox import BaseBBoxCoder\nfrom mmdet.core.bbox.builder import BBOX_CODERS\nfrom ..utils imp"
},
{
"path": "models/bbox/match_costs/__init__.py",
"chars": 64,
"preview": "from .match_cost import BBox3DL1Cost\n\n__all__ = ['BBox3DL1Cost']"
},
{
"path": "models/bbox/match_costs/match_cost.py",
"chars": 1717,
"preview": "import torch\nfrom mmdet.core.bbox.match_costs.builder import MATCH_COST\n\n\n@MATCH_COST.register_module()\nclass BBox3DL1Co"
},
{
"path": "models/bbox/utils.py",
"chars": 3878,
"preview": "import torch \n\n\ndef normalize_bbox(bboxes):\n cx = bboxes[..., 0:1]\n cy = bboxes[..., 1:2]\n cz = bboxes[..., 2:3"
},
{
"path": "models/checkpoint.py",
"chars": 20520,
"preview": "# This page is completely copied from https://pytorch.org/docs/stable/_modules/torch/utils/checkpoint.html#checkpoint\n# "
},
{
"path": "models/csrc/__init__.py",
"chars": 0,
"preview": ""
},
{
"path": "models/csrc/msmv_sampling/msmv_sampling.cpp",
"chars": 14979,
"preview": "#include \"msmv_sampling.h\"\n\n#define MAX_POINT 32\n\nvoid ms_deformable_im2col_cuda_c2345(\n const float* feat_c2,\n co"
},
{
"path": "models/csrc/msmv_sampling/msmv_sampling.h",
"chars": 1676,
"preview": "#pragma once\n\n#include <torch/extension.h>\n\nat::Tensor ms_deform_attn_cuda_c2345_forward(\n const at::Tensor& feat_c2,"
},
{
"path": "models/csrc/msmv_sampling/msmv_sampling_backward.cu",
"chars": 17944,
"preview": "/*!\n * Modified from Deformable DETR\n */\n\n#include <cstdio>\n#include <iostream>\n#include <algorithm>\n#include <cstring>\n"
},
{
"path": "models/csrc/msmv_sampling/msmv_sampling_forward.cu",
"chars": 13094,
"preview": "/*!\n* Modified from Deformable DETR\n*/\n\n#include <cstdio>\n#include <algorithm>\n#include <cstring>\n#include <cuda_runtime"
},
{
"path": "models/csrc/setup.py",
"chars": 568,
"preview": "from setuptools import setup\nfrom torch.utils.cpp_extension import BuildExtension, CUDAExtension\n\n\ndef get_ext_modules()"
},
{
"path": "models/csrc/wrapper.py",
"chars": 3932,
"preview": "import torch\nimport torch.nn.functional as F\nfrom ._msmv_sampling_cuda import _ms_deform_attn_cuda_c2345_forward, _ms_de"
},
{
"path": "models/loss_utils.py",
"chars": 16416,
"preview": "import torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nfrom mmdet.models.builder import LOSSES, build_loss\nf"
},
{
"path": "models/matcher.py",
"chars": 6873,
"preview": "\"\"\"\nModified from https://github.com/facebookresearch/Mask2Former/blob/main/mask2former/modeling/matcher.py\n\"\"\"\nimport t"
},
{
"path": "models/sparse_voxel_decoder.py",
"chars": 8270,
"preview": "import torch\nimport torch.nn as nn\nfrom mmcv.runner import BaseModule\nfrom mmcv.cnn.bricks.transformer import FFN\nfrom ."
},
{
"path": "models/sparsebev_head.py",
"chars": 21328,
"preview": "import math\nimport torch\nimport torch.nn as nn\nfrom mmcv.runner import force_fp32\nfrom mmdet.core import multi_apply, re"
},
{
"path": "models/sparsebev_sampling.py",
"chars": 7107,
"preview": "import torch\nfrom .bbox.utils import decode_bbox\nfrom .utils import rotation_3d_in_axis, DUMP\nfrom .csrc.wrapper import "
},
{
"path": "models/sparsebev_transformer.py",
"chars": 19753,
"preview": "import torch\nimport torch.nn as nn\nimport numpy as np\nimport torch.nn.functional as F\nfrom mmcv.runner import BaseModule"
},
{
"path": "models/sparseocc.py",
"chars": 9351,
"preview": "import torch\nimport queue\nimport numpy as np\nfrom mmcv.runner import get_dist_info\nfrom mmcv.runner.fp16_utils import ca"
},
{
"path": "models/sparseocc_head.py",
"chars": 10215,
"preview": "import numpy as np\nimport torch\nimport torch.nn as nn\nfrom mmdet.models import HEADS\nfrom mmcv.runner import force_fp32,"
},
{
"path": "models/sparseocc_transformer.py",
"chars": 11856,
"preview": "import copy\nimport numpy as np\nimport torch\nimport torch.nn as nn\nfrom mmcv.runner import BaseModule\nfrom mmdet.models.u"
},
{
"path": "models/utils.py",
"chars": 12214,
"preview": "import torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nfrom numpy import random\nfrom mmcv.cnn.bricks import "
},
{
"path": "old_metrics.py",
"chars": 1467,
"preview": "import os\nimport glob\nimport torch\nimport argparse\nimport numpy as np\nfrom tqdm import tqdm\nfrom loaders.old_metrics imp"
},
{
"path": "ray_metrics.py",
"chars": 2421,
"preview": "import os\nimport glob\nimport mmcv\nimport argparse\nimport numpy as np\nimport torch\nfrom torch.utils.data import DataLoade"
},
{
"path": "timing.py",
"chars": 3256,
"preview": "import time\nimport utils\nimport logging\nimport argparse\nimport importlib\nimport torch\nimport torch.distributed\nimport to"
},
{
"path": "train.py",
"chars": 6354,
"preview": "import os\nimport utils\nimport shutil\nimport logging\nimport argparse\nimport importlib\nimport torch\nimport torch.distribut"
},
{
"path": "utils.py",
"chars": 10875,
"preview": "import os\nimport sys\nimport glob\nimport torch\nimport shutil\nimport logging\nimport datetime\nimport socket\nimport wandb\nfr"
},
{
"path": "val.py",
"chars": 3595,
"preview": "import os\nimport utils\nimport logging\nimport argparse\nimport importlib\nimport torch\nimport torch.distributed\nimport torc"
},
{
"path": "viz_prediction.py",
"chars": 4909,
"preview": "import os\nimport cv2\nimport utils\nimport logging\nimport argparse\nimport importlib\nimport torch\nimport numpy as np\nfrom t"
}
]
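Each entry in the array above is a JSON object with `path`, `chars`, and a truncated `preview` string. A short sketch of working with that structure, using a hand-copied slice of the listing (the full download has all 58 entries; the filename here is hypothetical):

```python
import json

# A minimal slice of the condensed preview above; paths and sizes are
# copied verbatim from the listing, previews shortened for brevity.
preview_json = '''[
  {"path": "lib/dvr/dvr.cu", "chars": 26778, "preview": "// Acknowledgments..."},
  {"path": "models/sparsebev_head.py", "chars": 21328, "preview": "import math..."},
  {"path": "models/backbones/__init__.py", "chars": 49, "preview": "from .vovnet import VoVNet"}
]'''

entries = json.loads(preview_json)

# Largest file by character count in this slice
largest = max(entries, key=lambda e: e["chars"])
print(f'{largest["path"]}: {largest["chars"]} chars')
```

With the full .json file, the same pattern (`json.load` on the downloaded file instead of `json.loads` on a string) gives a quick size-ranked map of the repository.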
About this extraction
This page contains the full source code of the MCG-NJU/SparseOcc GitHub repository, extracted and formatted as plain text: 58 files (409.0 KB, approximately 109.1k tokens) plus a symbol index of 353 extracted functions, classes, methods, constants, and types. The full output can be downloaded as a .txt file.
Extracted by GitExtract, built by Nikandr Surkov.