Repository: HorizonRobotics/BIP3D
Branch: main
Commit: b1ed6a84faaf
Files: 87
Total size: 582.4 KB
Directory structure:
gitextract_nz633hq0/
├── .gitignore
├── LICENSE
├── README.md
├── bip3d/
│ ├── __init__.py
│ ├── converter/
│ │ ├── extract_occupancy_ann.py
│ │ ├── generate_image_3rscan.py
│ │ ├── generate_image_matterport3d.py
│ │ └── generate_image_scannet.py
│ ├── datasets/
│ │ ├── __init__.py
│ │ ├── embodiedscan_det_grounding_dataset.py
│ │ ├── transforms/
│ │ │ ├── __init__.py
│ │ │ ├── augmentation.py
│ │ │ ├── formatting.py
│ │ │ ├── loading.py
│ │ │ ├── multiview.py
│ │ │ └── transform.py
│ │ └── utils.py
│ ├── eval/
│ │ ├── __init__.py
│ │ ├── indoor_eval.py
│ │ └── metrics/
│ │ ├── __init__.py
│ │ ├── det_metric.py
│ │ └── grounding_metric.py
│ ├── grid_mask.py
│ ├── models/
│ │ ├── __init__.py
│ │ ├── base_target.py
│ │ ├── bbox3d_decoder.py
│ │ ├── bert.py
│ │ ├── data_preprocessors/
│ │ │ ├── __init__.py
│ │ │ ├── custom_data_preprocessor.py
│ │ │ └── utils.py
│ │ ├── deformable_aggregation.py
│ │ ├── feature_enhancer.py
│ │ ├── instance_bank.py
│ │ ├── spatial_enhancer.py
│ │ ├── structure.py
│ │ ├── target.py
│ │ └── utils.py
│ ├── ops/
│ │ ├── __init__.py
│ │ ├── deformable_aggregation.py
│ │ ├── gcc.sh
│ │ ├── setup.py
│ │ └── src/
│ │ ├── deformable_aggregation.cpp
│ │ ├── deformable_aggregation_cuda.cu
│ │ ├── deformable_aggregation_with_depth.cpp
│ │ └── deformable_aggregation_with_depth_cuda.cu
│ ├── registry.py
│ ├── structures/
│ │ ├── __init__.py
│ │ ├── bbox_3d/
│ │ │ ├── __init__.py
│ │ │ ├── base_box3d.py
│ │ │ ├── box_3d_mode.py
│ │ │ ├── coord_3d_mode.py
│ │ │ ├── euler_box3d.py
│ │ │ ├── euler_depth_box3d.py
│ │ │ └── utils.py
│ │ ├── ops/
│ │ │ ├── __init__.py
│ │ │ ├── box_np_ops.py
│ │ │ ├── iou3d_calculator.py
│ │ │ └── transforms.py
│ │ └── points/
│ │ ├── __init__.py
│ │ ├── base_points.py
│ │ ├── cam_points.py
│ │ ├── depth_points.py
│ │ └── lidar_points.py
│ └── utils/
│ ├── __init__.py
│ ├── array_converter.py
│ ├── default_color_map.py
│ ├── dist_utils.py
│ ├── line_mesh.py
│ └── typing_config.py
├── configs/
│ ├── __init__.py
│ ├── bip3d_det.py
│ ├── bip3d_det_grounding.py
│ ├── bip3d_det_rgb.py
│ ├── bip3d_grounding.py
│ ├── bip3d_grounding_rgb.py
│ └── default_runtime.py
├── docs/
│ └── quick_start.md
├── engine.sh
├── requirements.txt
├── test.sh
└── tools/
├── anchor_bbox3d_kmeans.py
├── benchmark.py
├── ckpt_rename.py
├── dist_test.sh
├── dist_train.sh
├── test.py
└── train.py
================================================
FILE CONTENTS
================================================
================================================
FILE: .gitignore
================================================
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST
*.so
# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
.hypothesis/
.pytest_cache/
# Translations
*.mo
*.pot
# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal
# Flask stuff:
instance/
.webassets-cache
# Scrapy stuff:
.scrapy
# Sphinx documentation
docs/_build/
# Jupyter Notebook
.ipynb_checkpoints
# IPython
profile_default/
ipython_config.py
# pyenv
.python-version
# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don’t work, or not
# install at all, so you might want to exclude Pipfile.lock from version control.
Pipfile.lock
# PEP 582; used by e.g. github.com/David-OConnor/pyflow
__pypackages__/
# Celery stuff
celerybeat-schedule
celerybeat.pid
# SageMath parsed files
*.sage.py
# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/
# VS Code
.vscode/
================================================
FILE: LICENSE
================================================
MIT License
Copyright (c) 2024 Horizon Robotics
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
================================================
FILE: README.md
================================================
# BIP3D: Bridging 2D Images and 3D Perception for Embodied Intelligence
<div align="center" class="authors">
<a href="https://scholar.google.com/citations?user=pfXQwcQAAAAJ&hl=en" target="_blank">Xuewu Lin</a>,
<a href="https://wzmsltw.github.io/" target="_blank">Tianwei Lin</a>,
<a href="https://scholar.google.com/citations?user=F2e_jZMAAAAJ&hl=en" target="_blank">Lichao Huang</a>,
<a href="https://openreview.net/profile?id=~HONGYU_XIE2" target="_blank">Hongyu Xie</a>,
<a href="https://scholar.google.com/citations?user=HQfc8TEAAAAJ&hl=en" target="_blank">Zhizhong Su</a>
</div>
<div align="center" style="line-height: 3;">
<a href="https://github.com/HorizonRobotics/BIP3D" target="_blank" style="margin: 2px;">
<img alt="Code" src="https://img.shields.io/badge/Code-Github-bule" style="display: inline-block; vertical-align: middle;"/>
</a>
<a href="https://linxuewu.github.io/BIP3D-page/" target="_blank" style="margin: 2px;">
<img alt="Homepage" src="https://img.shields.io/badge/Homepage-BIP3D-green" style="display: inline-block; vertical-align: middle;"/>
</a>
<a href="https://huggingface.co/HorizonRobotics/BIP3D" target="_blank" style="margin: 2px;">
<img alt="Hugging Face" src="https://img.shields.io/badge/Models-Hugging%20Face-yellow" style="display: inline-block; vertical-align: middle;"/>
</a>
<a href="https://arxiv.org/abs/2411.14869" target="_blank" style="margin: 2px;">
<img alt="Paper" src="https://img.shields.io/badge/Paper-Arxiv-red" style="display: inline-block; vertical-align: middle;"/>
</a>
</div>
## :rocket: News
**01/Jun/2025**: We have refactored and integrated the BIP3D code into [robo_orchard_lab](https://github.com/HorizonRobotics/robo_orchard_lab/tree/master/projects/bip3d_grounding), removing the dependency on the MM series. The environment is now easier to set up, and the performance is improved. Feel free to try it out!
**14/Mar/2025**: Our code has been released.
**27/Feb/2025**: Our paper has been accepted by CVPR 2025.
**22/Nov/2024**: We release our paper to [Arxiv](https://arxiv.org/abs/2411.14869).
## :open_book: Quick Start
[Quick Start](docs/quick_start.md)
## :link: Framework
<div align="center">
<img src="https://github.com/HorizonRobotics/BIP3D/raw/main/resources/bip3d_structure.png" width="90%" alt="BIP3D" />
<p style="font-size:0.8em; color:#555;">The Architecture Diagram of BIP3D, where the red stars indicate the parts that have been modified or added compared to the base model, GroundingDINO, and dashed lines indicate optional elements.</p>
</div>
## :trophy: Results on EmbodiedScan Benchmark
We made several improvements based on the original paper, achieving better 3D perception results. The main improvements include the following two points:
1. **New Fusion Operation**: We enhanced the decoder by replacing deformable aggregation (DAG) with a 3D deformable attention mechanism (DAT). Specifically, we improved the feature sampling process by moving from bilinear to trilinear interpolation, which leverages the depth distribution for more accurate feature extraction (see the sketch after this list).
2. **Mixed Data Training**: To optimize the grounding model's performance, we adopted a mixed-data training strategy by integrating detection data with grounding data during the grounding finetuning process.
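The following is a minimal, self-contained sketch of the first improvement (it is not the repository's CUDA implementation, which lives in `bip3d/ops/`; all function names and tensor shapes here are illustrative assumptions). DAG samples 2D features bilinearly, while DAT additionally weights the sampled feature by the predicted depth distribution at that pixel, approximating trilinear interpolation over (u, v, depth):
```python
import torch
import torch.nn.functional as F

def bilinear_sample(feat, uv):
    """DAG-style 2D sampling. feat: (C, H, W); uv: (N, 2) in [-1, 1]."""
    grid = uv.view(1, -1, 1, 2)                       # (1, N, 1, 2)
    out = F.grid_sample(feat[None], grid, align_corners=False)
    return out[0, :, :, 0].t()                        # (N, C)

def depth_weighted_sample(feat, depth_prob, uv, d_idx):
    """DAT-style sampling: scale the bilinear feature by the depth
    distribution at the sampled pixel (nearest depth bin for brevity;
    a full trilinear version interpolates between adjacent bins).
    feat: (C, H, W); depth_prob: (D, H, W), softmax over depth bins;
    uv: (N, 2) normalized coords; d_idx: (N,) depth-bin index."""
    f = bilinear_sample(feat, uv)                     # (N, C)
    p = bilinear_sample(depth_prob, uv)               # (N, D)
    return f * p.gather(1, d_idx[:, None])            # (N, C)

C, D, H, W, N = 8, 16, 32, 32, 5
feat = torch.randn(C, H, W)
depth_prob = torch.softmax(torch.randn(D, H, W), dim=0)
uv = torch.rand(N, 2) * 2 - 1
d_idx = torch.randint(0, D, (N,))
print(depth_weighted_sample(feat, depth_prob, uv, d_idx).shape)  # torch.Size([5, 8])
```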
### 1. Results on Multi-view 3D Detection Validation Dataset
In the `Op` column, DAG denotes deformable aggregation and DAT denotes 3D deformable attention. Set [`with_depth=True`](configs/bip3d_det.py#L175) to activate DAT.
The metric in the table is `AP@0.25`. For more metrics, please refer to the logs.
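As a rough sketch of that switch in the mm-style Python config (`with_depth` is the real flag; the surrounding keys are illustrative, and the exact nesting inside `configs/bip3d_det.py` may differ):
```python
model = dict(
    decoder=dict(
        # True -> 3D deformable attention (DAT);
        # False -> deformable aggregation (DAG)
        with_depth=True,
    ),
)
```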
|Model | Inputs | Op | Overall | Head | Common | Tail | Small | Medium | Large | ScanNet | 3RScan | MP3D | ckpt | log |
| :---- | :---: | :---: | :---: |:---: | :---: | :---: | :---:| :---:|:---:|:---: | :---: | :----: | :----: | :---: |
|[BIP3D](configs/bip3d_det_rgb.py) | RGB | DAG | 16.57|23.29|13.84|12.29|2.67|17.85|12.89|19.71|26.76|8.50 | [link](https://huggingface.co/HorizonRobotics/BIP3D/blob/main/det_rgb_dag/model_checkpoint.pth) | [link](https://huggingface.co/HorizonRobotics/BIP3D/blob/main/det_rgb_dag/job.log) |
|[BIP3D](configs/bip3d_det_rgb.py) | RGB | DAT | 16.67|22.41|14.19|13.18|3.32|17.25|14.89|20.80|24.18|9.91 | [link](https://huggingface.co/HorizonRobotics/BIP3D/blob/main/det_rgb_dat/model_checkpoint.pth) | [link](https://huggingface.co/HorizonRobotics/BIP3D/blob/main/det_rgb_dat/job.log) |
|[BIP3D](configs/bip3d_det.py) |RGB-D | DAG | 22.53|28.89|20.51|17.83|6.95|24.21|15.46|24.77|35.29|10.34 | [link](https://huggingface.co/HorizonRobotics/BIP3D/blob/main/det_rgbd_dag/model_checkpoint.pth) | [link](https://huggingface.co/HorizonRobotics/BIP3D/blob/main/det_rgbd_dag/job.log) |
|[BIP3D](configs/bip3d_det.py) |RGB-D | DAT | 23.24|31.51|20.20|17.62|7.31|24.09|15.82|26.35|36.29|11.44 | [link](https://huggingface.co/HorizonRobotics/BIP3D/blob/main/det_rgbd_dat/model_checkpoint.pth) | [link](https://huggingface.co/HorizonRobotics/BIP3D/blob/main/det_rgbd_dat/job.log) |
### 2. Results on Multi-view 3D Grounding Mini Dataset
To train and validate with the mini dataset, set [`data_version="v1-mini"`](configs/bip3d_grounding.py#L333).
|Model | Inputs | Op | Overall | Easy | Hard | View-dep | View-indep | ScanNet | 3RScan | MP3D | ckpt | log |
| :---- | :---: | :---: | :---: | :---: | :---:| :---:|:---:|:---: | :---: | :----: |:---: | :----: |
|[BIP3D](configs/bip3d_grounding_rgb.py) | RGB | DAG | 44.00|44.39|39.56|46.05|42.92|48.62|42.47|36.40 | [link](https://huggingface.co/HorizonRobotics/BIP3D/blob/main/grounding_mini_rgb_dag/model_checkpoint.pth) | [link](https://huggingface.co/HorizonRobotics/BIP3D/blob/main/grounding_mini_rgb_dag/job.log) |
|[BIP3D](configs/bip3d_grounding_rgb.py) | RGB | DAT | 44.43|44.74|41.02|45.17|44.04|49.70|41.81|37.28 | [link](https://huggingface.co/HorizonRobotics/BIP3D/blob/main/grounding_mini_rgb_dat/model_checkpoint.pth) | [link](https://huggingface.co/HorizonRobotics/BIP3D/blob/main/grounding_mini_rgb_dat/job.log) |
|[BIP3D](configs/bip3d_grounding.py) | RGB-D | DAG | 45.79|46.22|40.91|45.93|45.71|48.94|46.61|37.36 | [link](https://huggingface.co/HorizonRobotics/BIP3D/blob/main/grounding_mini_rgbd_dag/model_checkpoint.pth) | [link](https://huggingface.co/HorizonRobotics/BIP3D/blob/main/grounding_mini_rgbd_dag/job.log) |
|[BIP3D](configs/bip3d_grounding.py) | RGB-D | DAT | 58.47|59.02|52.23|60.20|57.56|66.63|54.79|46.72 | [link](https://huggingface.co/HorizonRobotics/BIP3D/blob/main/grounding_mini_rgbd_dat/model_checkpoint.pth) | [link](https://huggingface.co/HorizonRobotics/BIP3D/blob/main/grounding_mini_rgbd_dat/job.log) |
### 3. Results on Multi-view 3D Grounding Validation Dataset
|Model | Inputs | Op | Mixed Data | Overall | Easy | Hard | View-dep | View-indep | ScanNet | 3RScan | MP3D | ckpt | log |
| :---- | :---: | :---: | :---: |:---: | :---: | :---:| :---:|:---:|:---: | :---: | :----: |:---: | :----: |
|[BIP3D](configs/bip3d_grounding_rgb.py) | RGB | DAG |No| 45.81|46.21|41.34|47.07|45.09|50.40|47.53|32.97 | [link](https://huggingface.co/HorizonRobotics/BIP3D/blob/main/grounding_rgb_dag/model_checkpoint.pth) | [link](https://huggingface.co/HorizonRobotics/BIP3D/blob/main/grounding_rgb_dag/job.log) |
|[BIP3D](configs/bip3d_grounding_rgb.py) | RGB | DAT |No| 47.29|47.82|41.42|48.58|46.56|52.74|47.85|34.60 | [link](https://huggingface.co/HorizonRobotics/BIP3D/blob/main/grounding_rgb_dat/model_checkpoint.pth) | [link](https://huggingface.co/HorizonRobotics/BIP3D/blob/main/grounding_rgb_dat/job.log) |
|[BIP3D](configs/bip3d_grounding.py) | RGB-D | DAG |No| 53.75|53.87|52.43|55.21|52.93|60.05|54.92|38.20 | [link](https://huggingface.co/HorizonRobotics/BIP3D/blob/main/grounding_rgbd_dag/model_checkpoint.pth) | [link](https://huggingface.co/HorizonRobotics/BIP3D/blob/main/grounding_rgbd_dag/job.log) |
|[BIP3D](configs/bip3d_grounding.py) | RGB-D | DAT |No|61.36|61.88|55.58|62.43|60.76|66.96|62.75|46.92 | [link](https://huggingface.co/HorizonRobotics/BIP3D/blob/main/grounding_rgbd_dat/model_checkpoint.pth) | [link](https://huggingface.co/HorizonRobotics/BIP3D/blob/main/grounding_rgbd_dat/job.log) |
|[BIP3D](configs/bip3d_det_grounding.py) | RGB-D | DAT |Yes|66.58|66.99|62.07|67.95|65.81|72.43|68.26|51.14 | [link](https://huggingface.co/HorizonRobotics/BIP3D/blob/main/grounding_rgbd_dat_mixdata/model_checkpoint.pth) | [link](https://huggingface.co/HorizonRobotics/BIP3D/blob/main/grounding_rgbd_dat_mixdata/job.log) |
### 4. [Results on Multi-view 3D Grounding Test Dataset](https://huggingface.co/spaces/AGC2024/visual-grounding-2024)
|Model | Overall | Easy | Hard | View-dep | View-indep | ckpt | log |
| :---- | :---: | :---: | :---: | :---: | :---:| :---:|:---:|
|[EmbodiedScan](https://github.com/OpenRobotLab/EmbodiedScan) | 39.67 | 40.52 | 30.24 | 39.05 | 39.94 | - | - |
|[SAG3D*](https://opendrivelab.github.io/Challenge%202024/multiview_Mi-Robot.pdf) | 46.92 | 47.72 | 38.03 | 46.31 | 47.18 | - | - |
|[DenseG*](https://opendrivelab.github.io/Challenge%202024/multiview_THU-LenovoAI.pdf) | 59.59 | 60.39 | 50.81 | 60.50 | 59.20 | - | - |
|[BIP3D](configs/bip3d_det_grounding.py) | 67.38 | 68.12 | 59.08 | 67.88 | 67.16 | - | - |
|BIP3D-B | 70.53 | 71.22 | 62.91 | 70.69 | 70.47 | - | - |
`*` denotes model ensemble; note that our BIP3D does not use the ensemble trick. These results differ from those reported in the paper and show significant improvements.
Our best model, BIP3D-B, is based on GroundingDINO-base and is additionally trained with the ARKitScenes dataset.
## :page_facing_up: Citation
```
@article{lin2024bip3d,
title={BIP3D: Bridging 2D Images and 3D Perception for Embodied Intelligence},
author={Lin, Xuewu and Lin, Tianwei and Huang, Lichao and Xie, Hongyu and Su, Zhizhong},
journal={arXiv preprint arXiv:2411.14869},
year={2024}
}
```
## :handshake: Acknowledgement
[EmbodiedScan](https://github.com/OpenRobotLab/EmbodiedScan)
[Sparse4D](https://github.com/HorizonRobotics/Sparse4D)
[3D-deformable-attention](https://github.com/IDEA-Research/3D-deformable-attention)
[mmdet-GroundingDINO](https://github.com/open-mmlab/mmdetection/tree/main/configs/grounding_dino)
================================================
FILE: bip3d/__init__.py
================================================
from .registry import *
================================================
FILE: bip3d/converter/extract_occupancy_ann.py
================================================
import os
import shutil
from argparse import ArgumentParser
from tqdm import tqdm
def extract_occupancy(dataset, src, dst):
"""Extract occupancy annotations of a single dataset to dataset root."""
print('Processing dataset', dataset)
scenes = os.listdir(os.path.join(src, dataset))
dst_dataset = os.path.join(dst, dataset)
if not os.path.exists(dst_dataset):
print('Missing dataset:', dataset)
return
for scene in tqdm(scenes):
if dataset == 'scannet':
dst_scene = os.path.join(dst_dataset, 'scans', scene)
else:
dst_scene = os.path.join(dst_dataset, scene)
if not os.path.exists(dst_scene):
print(f'Missing scene {scene} in dataset {dataset}')
continue
dst_occ = os.path.join(dst_scene, 'occupancy')
if not os.path.exists(dst_occ):
shutil.copytree(os.path.join(src, dataset, scene), dst_occ)
else:
files = os.listdir(os.path.join(src, dataset, scene))
for file in files:
if not os.path.exists(os.path.join(dst_occ, file)):
shutil.copyfile(os.path.join(src, dataset, scene, file),
os.path.join(dst_occ, file))
if __name__ == '__main__':
parser = ArgumentParser()
parser.add_argument('--src',
required=True,
help='folder of the occupancy annotations')
parser.add_argument('--dst',
required=True,
help='folder of the raw datasets')
args = parser.parse_args()
datasets = os.listdir(args.src)
for dataset in datasets:
extract_occupancy(dataset, args.src, args.dst)
================================================
FILE: bip3d/converter/generate_image_3rscan.py
================================================
import os
import zipfile
from argparse import ArgumentParser
from functools import partial
import mmengine
def process_scene(path, scene_name):
"""Process single 3Rscan scene."""
with zipfile.ZipFile(os.path.join(path, scene_name, 'sequence.zip'),
'r') as zip_ref:
zip_ref.extractall(os.path.join(path, scene_name, 'sequence'))
if __name__ == '__main__':
parser = ArgumentParser()
parser.add_argument('--dataset_folder',
required=True,
help='folder of the dataset.')
parser.add_argument('--nproc', type=int, default=8)
args = parser.parse_args()
mmengine.track_parallel_progress(func=partial(process_scene,
args.dataset_folder),
tasks=os.listdir(args.dataset_folder),
nproc=args.nproc)
================================================
FILE: bip3d/converter/generate_image_matterport3d.py
================================================
import os
import zipfile
from argparse import ArgumentParser
from functools import partial
import mmengine
def process_scene(path, output_folder, scene_name):
"""Process single 3Rscan scene."""
files = list(os.listdir(os.path.join(path, scene_name)))
for file in files:
if not file.endswith(".zip"):
continue
if file != "sens.zip":
continue
with zipfile.ZipFile(os.path.join(path, scene_name, file),
'r') as zip_ref:
if file == "sens.zip":
zip_ref.extractall(os.path.join(output_folder, scene_name, file[:-4]))
else:
zip_ref.extractall(output_folder)
if __name__ == '__main__':
parser = ArgumentParser()
parser.add_argument('--dataset_folder',
required=True,
help='folder of the dataset.')
parser.add_argument('--output_folder',
required=True,
help='output folder of the dataset.')
parser.add_argument('--nproc', type=int, default=8)
args = parser.parse_args()
if not os.path.exists(args.output_folder):
os.makedirs(args.output_folder, exist_ok=True)
mmengine.track_parallel_progress(func=partial(process_scene,
args.dataset_folder, args.output_folder),
tasks=os.listdir(args.dataset_folder),
nproc=args.nproc)
================================================
FILE: bip3d/converter/generate_image_scannet.py
================================================
# Modified from https://github.com/ScanNet/ScanNet/blob/master/SensReader/python/SensorData.py # noqa
import os
import struct
import zlib
from argparse import ArgumentParser
from functools import partial
import imageio
import mmengine
import numpy as np
COMPRESSION_TYPE_COLOR = {-1: 'unknown', 0: 'raw', 1: 'png', 2: 'jpeg'}
COMPRESSION_TYPE_DEPTH = {
-1: 'unknown',
0: 'raw_ushort',
1: 'zlib_ushort',
2: 'occi_ushort'
}
class RGBDFrame:
"""Class for single ScanNet RGB-D image processing."""
def load(self, file_handle):
"""Load basic information of a given RGBD frame."""
self.camera_to_world = np.asarray(struct.unpack(
'f' * 16, file_handle.read(16 * 4)),
dtype=np.float32).reshape(4, 4)
self.timestamp_color = struct.unpack('Q', file_handle.read(8))[0]
self.timestamp_depth = struct.unpack('Q', file_handle.read(8))[0]
self.color_size_bytes = struct.unpack('Q', file_handle.read(8))[0]
self.depth_size_bytes = struct.unpack('Q', file_handle.read(8))[0]
self.color_data = b''.join(
struct.unpack('c' * self.color_size_bytes,
file_handle.read(self.color_size_bytes)))
self.depth_data = b''.join(
struct.unpack('c' * self.depth_size_bytes,
file_handle.read(self.depth_size_bytes)))
def decompress_depth(self, compression_type):
"""Decompress the depth data."""
assert compression_type == 'zlib_ushort'
return zlib.decompress(self.depth_data)
def decompress_color(self, compression_type):
"""Decompress the RGB image data."""
assert compression_type == 'jpeg'
return imageio.imread(self.color_data)
class SensorData:
"""Class for single ScanNet scene processing.
Single scene file contains multiple RGB-D images.
"""
def __init__(self, filename, fast=False):
self.version = 4
self.load(filename, fast)
def load(self, filename, fast):
"""Load a single scene data with multiple RGBD frames."""
with open(filename, 'rb') as f:
version = struct.unpack('I', f.read(4))[0]
assert self.version == version
strlen = struct.unpack('Q', f.read(8))[0]
self.sensor_name = b''.join(
struct.unpack('c' * strlen, f.read(strlen)))
self.intrinsic_color = np.asarray(struct.unpack(
'f' * 16, f.read(16 * 4)),
dtype=np.float32).reshape(4, 4)
self.extrinsic_color = np.asarray(struct.unpack(
'f' * 16, f.read(16 * 4)),
dtype=np.float32).reshape(4, 4)
self.intrinsic_depth = np.asarray(struct.unpack(
'f' * 16, f.read(16 * 4)),
dtype=np.float32).reshape(4, 4)
self.extrinsic_depth = np.asarray(struct.unpack(
'f' * 16, f.read(16 * 4)),
dtype=np.float32).reshape(4, 4)
self.color_compression_type = COMPRESSION_TYPE_COLOR[struct.unpack(
'i', f.read(4))[0]]
self.depth_compression_type = COMPRESSION_TYPE_DEPTH[struct.unpack(
'i', f.read(4))[0]]
self.color_width = struct.unpack('I', f.read(4))[0]
self.color_height = struct.unpack('I', f.read(4))[0]
self.depth_width = struct.unpack('I', f.read(4))[0]
self.depth_height = struct.unpack('I', f.read(4))[0]
self.depth_shift = struct.unpack('f', f.read(4))[0]
num_frames = struct.unpack('Q', f.read(8))[0]
self.num_frames = num_frames
self.frames = []
if fast:
index = list(range(num_frames))[::10]
else:
index = list(range(num_frames))
self.index = index
for i in range(num_frames):
frame = RGBDFrame()
frame.load(f)
if i in index:
self.frames.append(frame)
def export_depth_images(self, output_path):
"""Export depth images to the output path."""
if not os.path.exists(output_path):
os.makedirs(output_path)
for f in range(len(self.frames)):
depth_data = self.frames[f].decompress_depth(
self.depth_compression_type)
            depth = np.frombuffer(depth_data, dtype=np.uint16).reshape(
                self.depth_height, self.depth_width)
imageio.imwrite(
os.path.join(output_path,
self.index_to_str(self.index[f]) + '.png'), depth)
def export_color_images(self, output_path):
"""Export RGB images to the output path."""
if not os.path.exists(output_path):
os.makedirs(output_path)
for f in range(len(self.frames)):
color = self.frames[f].decompress_color(
self.color_compression_type)
imageio.imwrite(
os.path.join(output_path,
self.index_to_str(self.index[f]) + '.jpg'), color)
@staticmethod
def index_to_str(index):
"""Convert the sample index to string."""
return str(index).zfill(5)
@staticmethod
def save_mat_to_file(matrix, filename):
"""Save a matrix to file."""
with open(filename, 'w') as f:
for line in matrix:
np.savetxt(f, line[np.newaxis], fmt='%f')
def export_poses(self, output_path):
"""Export camera poses to the output path."""
if not os.path.exists(output_path):
os.makedirs(output_path)
for f in range(len(self.frames)):
self.save_mat_to_file(
self.frames[f].camera_to_world,
os.path.join(output_path,
self.index_to_str(self.index[f]) + '.txt'))
def export_intrinsics(self, output_path):
"""Export the intrinsic matrix to the output path."""
if not os.path.exists(output_path):
os.makedirs(output_path)
self.save_mat_to_file(self.intrinsic_color,
os.path.join(output_path, 'intrinsic.txt'))
def export_depth_intrinsics(self, output_path):
"""Export the depth intrinsic matrix to the output path."""
if not os.path.exists(output_path):
os.makedirs(output_path)
self.save_mat_to_file(self.intrinsic_depth,
os.path.join(output_path, 'depth_intrinsic.txt'))
def process_scene(path, fast, idx):
"""Process single ScanNet scene.
Extract RGB images, poses and camera intrinsics.
"""
try:
print(path, fast, idx)
data = SensorData(os.path.join(path, idx, f'{idx}.sens'), fast)
        output_path = os.path.join('posed_images', idx)
data.export_color_images(output_path)
data.export_intrinsics(output_path)
data.export_poses(output_path)
data.export_depth_images(output_path)
data.export_depth_intrinsics(output_path)
except Exception as e:
print("bug info: ", path, fast, idx)
raise e
def process_directory(path, fast, nproc):
"""Process the files in a directory with parallel support."""
    mmengine.track_parallel_progress(func=partial(process_scene, path, fast),
                                     tasks=os.listdir(path),
                                     nproc=nproc)
if __name__ == '__main__':
parser = ArgumentParser()
parser.add_argument('--dataset_folder',
default=None,
help='folder of the dataset.')
parser.add_argument('--nproc', type=int, default=8)
parser.add_argument('--fast', action='store_true')
args = parser.parse_args()
if args.dataset_folder is not None:
os.chdir(args.dataset_folder)
# process train and val scenes
if os.path.exists('scans'):
process_directory('scans', args.fast, args.nproc)
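# Usage sketch (paths are illustrative):
#   python bip3d/converter/generate_image_scannet.py \
#       --dataset_folder /path/to/scannet --nproc 8 [--fast]
# The script chdirs into --dataset_folder and walks the `scans` directory;
# `--fast` keeps every 10th frame of each scene instead of all frames.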
================================================
FILE: bip3d/datasets/__init__.py
================================================
from .embodiedscan_det_grounding_dataset import EmbodiedScanDetGroundingDataset
from .transforms import * # noqa: F401,F403
__all__ = [
'EmbodiedScanDetGroundingDataset'
]
================================================
FILE: bip3d/datasets/embodiedscan_det_grounding_dataset.py
================================================
import math
import pickle
import copy
import tqdm
import os
import warnings
from typing import Callable, List, Optional, Union
import mmengine
import numpy as np
from mmengine.dataset import BaseDataset, force_full_init
from mmengine.fileio import load
from mmengine.logging import print_log
from bip3d.registry import DATASETS
from bip3d.structures import get_box_type
from .utils import sample
class_names = (
'adhesive tape', 'air conditioner', 'alarm', 'album', 'arch', 'backpack',
'bag', 'balcony', 'ball', 'banister', 'bar', 'barricade', 'baseboard',
'basin', 'basket', 'bathtub', 'beam', 'beanbag', 'bed', 'bench', 'bicycle',
'bidet', 'bin', 'blackboard', 'blanket', 'blinds', 'board', 'body loofah',
'book', 'boots', 'bottle', 'bowl', 'box', 'bread', 'broom', 'brush',
'bucket', 'cabinet', 'calendar', 'camera', 'can', 'candle', 'candlestick',
'cap', 'car', 'carpet', 'cart', 'case', 'chair', 'chandelier', 'cleanser',
'clock', 'clothes', 'clothes dryer', 'coat hanger', 'coffee maker', 'coil',
'column', 'commode', 'computer', 'conducting wire', 'container', 'control',
'copier', 'cosmetics', 'couch', 'counter', 'countertop', 'crate', 'crib',
'cube', 'cup', 'curtain', 'cushion', 'decoration', 'desk', 'detergent',
'device', 'dish rack', 'dishwasher', 'dispenser', 'divider', 'door',
'door knob', 'doorframe', 'doorway', 'drawer', 'dress', 'dresser', 'drum',
'duct', 'dumbbell', 'dustpan', 'dvd', 'eraser', 'excercise equipment',
'fan', 'faucet', 'fence', 'file', 'fire extinguisher', 'fireplace',
'flowerpot', 'flush', 'folder', 'food', 'footstool', 'frame', 'fruit',
'furniture', 'garage door', 'garbage', 'glass', 'globe', 'glove',
'grab bar', 'grass', 'guitar', 'hair dryer', 'hamper', 'handle', 'hanger',
'hat', 'headboard', 'headphones', 'heater', 'helmets', 'holder', 'hook',
'humidifier', 'ironware', 'jacket', 'jalousie', 'jar', 'kettle',
'keyboard', 'kitchen island', 'kitchenware', 'knife', 'label', 'ladder',
'lamp', 'laptop', 'ledge', 'letter', 'light', 'luggage', 'machine',
'magazine', 'mailbox', 'map', 'mask', 'mat', 'mattress', 'menu',
'microwave', 'mirror', 'molding', 'monitor', 'mop', 'mouse', 'napkins',
'notebook', 'ottoman', 'oven', 'pack', 'package', 'pad', 'pan', 'panel',
'paper', 'paper cutter', 'partition', 'pedestal', 'pen', 'person', 'piano',
'picture', 'pillar', 'pillow', 'pipe', 'pitcher', 'plant', 'plate',
'player', 'plug', 'plunger', 'pool', 'pool table', 'poster', 'pot',
'price tag', 'printer', 'projector', 'purse', 'rack', 'radiator', 'radio',
'rail', 'range hood', 'refrigerator', 'remote control', 'ridge', 'rod',
'roll', 'roof', 'rope', 'sack', 'salt', 'scale', 'scissors', 'screen',
'seasoning', 'shampoo', 'sheet', 'shelf', 'shirt', 'shoe', 'shovel',
'shower', 'sign', 'sink', 'soap', 'soap dish', 'soap dispenser', 'socket',
'speaker', 'sponge', 'spoon', 'stairs', 'stall', 'stand', 'stapler',
'statue', 'steps', 'stick', 'stool', 'stopcock', 'stove', 'structure',
'sunglasses', 'support', 'switch', 'table', 'tablet', 'teapot',
'telephone', 'thermostat', 'tissue', 'tissue box', 'toaster', 'toilet',
'toilet paper', 'toiletry', 'tool', 'toothbrush', 'toothpaste', 'towel',
'toy', 'tray', 'treadmill', 'trophy', 'tube', 'tv', 'umbrella', 'urn',
'utensil', 'vacuum cleaner', 'vanity', 'vase', 'vent', 'ventilation',
'wardrobe', 'washbasin', 'washing machine', 'water cooler', 'water heater',
'window', 'window frame', 'windowsill', 'wine', 'wire', 'wood', 'wrap')
head_labels = [
48, 177, 82, 179, 37, 243, 28, 277, 32, 84, 215, 145, 182, 170, 22, 72, 30,
141, 65, 257, 221, 225, 52, 75, 231, 158, 236, 156, 47, 74, 6, 18, 71, 242,
217, 251, 66, 263, 5, 45, 14, 73, 278, 198, 24, 23, 196, 252, 19, 135, 26,
229, 183, 200, 107, 272, 246, 269, 125, 59, 279, 15, 163, 258, 57, 195, 51,
88, 97, 58, 102, 36, 137, 31, 80, 160, 155, 61, 238, 96, 190, 25, 219, 152,
142, 201, 274, 249, 178, 192
]
common_labels = [
189, 164, 101, 205, 273, 233, 131, 180, 86, 220, 67, 268, 224, 270, 53,
203, 237, 226, 10, 133, 248, 41, 55, 16, 199, 134, 99, 185, 2, 20, 234,
194, 253, 35, 174, 8, 223, 13, 91, 262, 230, 121, 49, 63, 119, 162, 79,
168, 245, 267, 122, 104, 100, 1, 176, 280, 140, 209, 259, 143, 165, 147,
117, 85, 105, 95, 109, 207, 68, 175, 106, 60, 4, 46, 171, 204, 111, 211,
108, 120, 157, 222, 17, 264, 151, 98, 38, 261, 123, 78, 118, 127, 240, 124
]
tail_labels = [
76, 149, 173, 250, 275, 255, 34, 77, 266, 283, 112, 115, 186, 136, 256, 40,
254, 172, 9, 212, 213, 181, 154, 94, 191, 193, 3, 130, 146, 70, 128, 167,
126, 81, 7, 11, 148, 228, 239, 247, 21, 42, 89, 153, 161, 244, 110, 0, 29,
114, 132, 159, 218, 232, 260, 56, 92, 116, 282, 33, 113, 138, 12, 188, 44,
150, 197, 271, 169, 206, 90, 235, 103, 281, 184, 208, 216, 202, 214, 241,
129, 210, 276, 64, 27, 87, 139, 227, 187, 62, 43, 50, 69, 93, 144, 166,
265, 54, 83, 39
]
@DATASETS.register_module()
class EmbodiedScanDetGroundingDataset(BaseDataset):
def __init__(
self,
data_root: str,
ann_file: str,
vg_file=None,
metainfo: Optional[dict] = None,
pipeline: List[Union[dict, Callable]] = [],
test_mode: bool = False,
load_eval_anns: bool = True,
filter_empty_gt: bool = True,
remove_dontcare: bool = False,
box_type_3d: str = "Euler-Depth",
dataset_length=None,
mode="detection",
max_n_images=50,
n_images_per_sample=1,
drop_last_per_scene=False,
part=None,
temporal=False,
num_text=1,
tokens_positive_rebuild=True,
sep_token="[SEP]",
**kwargs,
):
self.box_type_3d, self.box_mode_3d = get_box_type(box_type_3d)
self.filter_empty_gt = filter_empty_gt
self.remove_dontcare = remove_dontcare
self.load_eval_anns = load_eval_anns
self.dataset_length = dataset_length
self.part = part
self.mode = mode
assert self.mode in ["detection", "continuous", "grounding"]
super().__init__(
ann_file=ann_file,
metainfo=metainfo,
data_root=data_root,
pipeline=pipeline,
test_mode=test_mode,
serialize_data=self.mode == "detection",
**kwargs,
)
if self.mode == "continuous":
self.max_n_images = max_n_images
self.n_images_per_sample = n_images_per_sample
self.drop_last_per_scene = drop_last_per_scene
self.convert_to_continuous()
elif self.mode == "grounding":
self.vg_file = vg_file
self.num_text = num_text
self.tokens_positive_rebuild = tokens_positive_rebuild
self.sep_token = sep_token
self.load_language_data()
self.data_bytes, self.data_address = self._serialize_data()
self.serialize_data = True
print_log(f"dataset length : {self.__len__()}")
def process_metainfo(self):
assert "categories" in self._metainfo
if "classes" not in self._metainfo:
self._metainfo.setdefault(
"classes", list(self._metainfo["categories"].keys())
)
self.label_mapping = np.full(
max(list(self._metainfo["categories"].values())) + 1, -1, dtype=int
)
for key, value in self._metainfo["categories"].items():
if key in self._metainfo["classes"]:
self.label_mapping[value] = self._metainfo["classes"].index(
key
)
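        # Illustrative example: with categories={"chair": 3, "table": 7} and
        # classes=["chair", "table"], label_mapping has length 8 and maps
        # raw id 3 -> 0, raw id 7 -> 1, and every other raw id -> -1
        # (dontcare).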
def parse_data_info(self, info: dict):
info["box_type_3d"] = self.box_type_3d
info["axis_align_matrix"] = self._get_axis_align_matrix(info)
info["scan_id"] = info["sample_idx"]
ann_dataset = info["sample_idx"].split("/")[0]
if ann_dataset == "matterport3d":
info["depth_shift"] = 4000.0
else:
info["depth_shift"] = 1000.0
        # Because the multi-view setting differs from the original design,
        # we temporarily follow the original design in ImVoxelNet.
info["img_path"] = []
info["depth_img_path"] = []
if "cam2img" in info:
cam2img = info["cam2img"].astype(np.float32)
else:
cam2img = []
extrinsics = []
for i in range(len(info["images"])):
img_path = os.path.join(
self.data_prefix.get("img_path", ""),
info["images"][i]["img_path"],
)
depth_img_path = os.path.join(
self.data_prefix.get("img_path", ""),
info["images"][i]["depth_path"],
)
info["img_path"].append(img_path)
info["depth_img_path"].append(depth_img_path)
align_global2cam = np.linalg.inv(
info["axis_align_matrix"] @ info["images"][i]["cam2global"]
)
extrinsics.append(align_global2cam.astype(np.float32))
if "cam2img" not in info:
cam2img.append(info["images"][i]["cam2img"].astype(np.float32))
info["depth2img"] = dict(
extrinsic=extrinsics,
intrinsic=cam2img,
origin=np.array([0.0, 0.0, 0.5]).astype(np.float32),
)
if "depth_cam2img" not in info:
info["depth_cam2img"] = cam2img
if not self.test_mode:
info["ann_info"] = self.parse_ann_info(info)
if self.test_mode and self.load_eval_anns:
info["ann_info"] = self.parse_ann_info(info)
info["eval_ann_info"] = self._remove_dontcare(info["ann_info"])
return info
def parse_ann_info(self, info: dict):
"""Process the `instances` in data info to `ann_info`.
Args:
info (dict): Info dict.
Returns:
dict: Processed `ann_info`.
"""
ann_info = None
if "instances" in info and len(info["instances"]) > 0:
ann_info = dict(
gt_bboxes_3d=np.zeros(
(len(info["instances"]), 9), dtype=np.float32
),
gt_labels_3d=np.zeros(
(len(info["instances"]),), dtype=np.int64
),
gt_names=[],
bbox_id=np.zeros((len(info["instances"]),), dtype=np.int64) - 1,
)
for idx, instance in enumerate(info["instances"]):
ann_info["gt_bboxes_3d"][idx] = instance["bbox_3d"]
ann_info["gt_labels_3d"][idx] = self.label_mapping[
instance["bbox_label_3d"]
]
ann_info["gt_names"].append(
self._metainfo["classes"][ann_info["gt_labels_3d"][idx]]
if ann_info["gt_labels_3d"][idx] >= 0
else "others"
)
ann_info["bbox_id"][idx] = instance["bbox_id"]
# pack ann_info for return
if ann_info is None:
ann_info = dict()
ann_info["gt_bboxes_3d"] = np.zeros((0, 9), dtype=np.float32)
ann_info["gt_labels_3d"] = np.zeros((0,), dtype=np.int64)
ann_info["bbox_id"] = np.zeros((0,), dtype=np.int64) - 1
ann_info["gt_names"] = []
# post-processing/filtering ann_info if not empty gt
if "visible_instance_ids" in info["images"][0]:
ids = []
for i in range(len(info["images"])):
ids.append(info["images"][i]["visible_instance_ids"])
mask_length = ann_info["gt_labels_3d"].shape[0]
ann_info["visible_instance_masks"] = self._ids2masks(
ids, mask_length
)
if self.remove_dontcare:
ann_info = self._remove_dontcare(ann_info)
ann_dataset = info["sample_idx"].split("/")[0]
ann_info["gt_bboxes_3d"] = self.box_type_3d(
ann_info["gt_bboxes_3d"],
box_dim=ann_info["gt_bboxes_3d"].shape[-1],
with_yaw=True,
origin=(0.5, 0.5, 0.5),
)
return ann_info
@staticmethod
def _get_axis_align_matrix(info: dict):
"""Get axis_align_matrix from info. If not exist, return identity mat.
Args:
info (dict): Info of a single sample data.
Returns:
np.ndarray: 4x4 transformation matrix.
"""
if "axis_align_matrix" in info:
return np.array(info["axis_align_matrix"])
else:
warnings.warn(
"axis_align_matrix is not found in ScanNet data info, please "
"use new pre-process scripts to re-generate ScanNet data"
)
return np.eye(4).astype(np.float32)
def _ids2masks(self, ids, mask_length):
"""Change visible_instance_ids to visible_instance_masks."""
masks = []
for idx in range(len(ids)):
mask = np.zeros((mask_length,), dtype=bool)
mask[ids[idx]] = 1
masks.append(mask)
return masks
def _remove_dontcare(self, ann_info: dict):
"""Remove annotations that do not need to be cared.
-1 indicates dontcare in MMDet3d.
Args:
ann_info (dict): Dict of annotation infos. The
instance with label `-1` will be removed.
Returns:
dict: Annotations after filtering.
"""
img_filtered_annotations = {}
filter_mask = ann_info["gt_labels_3d"] > -1
for key in ann_info.keys():
if key == "instances":
img_filtered_annotations[key] = ann_info[key]
elif key == "visible_instance_masks":
img_filtered_annotations[key] = []
for idx in range(len(ann_info[key])):
img_filtered_annotations[key].append(
ann_info[key][idx][filter_mask]
)
elif key == "gt_names":
img_filtered_annotations[key] = [
x for i, x in enumerate(ann_info[key]) if filter_mask[i]
]
else:
img_filtered_annotations[key] = ann_info[key][filter_mask]
return img_filtered_annotations
def load_data_list(self):
annotations = load(self.ann_file)
if not isinstance(annotations, dict):
raise TypeError(
f"The annotations loaded from annotation file "
f"should be a dict, but got {type(annotations)}!"
)
if "data_list" not in annotations or "metainfo" not in annotations:
raise ValueError(
"Annotation must have data_list and metainfo " "keys"
)
metainfo = annotations["metainfo"]
raw_data_list = annotations["data_list"]
# Meta information load from annotation file will not influence the
# existed meta information load from `BaseDataset.METAINFO` and
# `metainfo` arguments defined in constructor.
for k, v in metainfo.items():
self._metainfo.setdefault(k, v)
self.process_metainfo()
# load and parse data_infos.
data_list = []
for raw_data_info in tqdm.tqdm(
raw_data_list,
mininterval=10,
desc=f"Loading {'Test' if self.test_mode else 'Train'} dataset",
):
if self.part is not None:
valid = False
for x in self.part:
if x in raw_data_info["sample_idx"]:
valid = True
break
if not valid:
continue
data_info = self.parse_data_info(raw_data_info)
if data_info is None:
continue
assert isinstance(data_info, dict)
data_list.append(data_info)
if (
self.dataset_length is not None
and len(data_list) >= self.dataset_length
):
break
return data_list
@staticmethod
def _is_view_dep(text):
"""Check whether to augment based on sr3d utterance."""
rels = [
'front', 'behind', 'back', 'left', 'right', 'facing', 'leftmost',
'rightmost', 'looking', 'across'
]
words = set(text.split())
return any(rel in words for rel in rels)
def convert_info_to_scan(self):
self.scans = dict()
for data in self.data_list:
scan_id = data['scan_id']
self.scans[scan_id] = data
self.scan_ids = list(self.scans.keys())
def load_language_data(self):
self.convert_info_to_scan()
if isinstance(self.vg_file, str):
language_annotations = load(os.path.join(self.data_root, self.vg_file))
else:
language_annotations = []
for x in self.vg_file:
language_annotations.extend(load(os.path.join(self.data_root, x)))
        if self.dataset_length is not None:
            interval = len(language_annotations) / self.dataset_length
            output = []
            for i in range(self.dataset_length):
                output.append(language_annotations[int(interval * i)])
            language_annotations = output
self.data_list = language_annotations
self.scan_id_to_data_idx = {}
for scan_id in self.scan_ids:
self.scan_id_to_data_idx[scan_id] = []
for i, d in enumerate(self.data_list):
self.scan_id_to_data_idx[d["scan_id"]].append(i)
def convert_to_continuous(self):
self.convert_info_to_scan()
data_list = []
self.flag = []
self.image_id_dict = {}
for flag, scan_id in enumerate(self.scan_ids):
total_n = len(self.scans[scan_id]["images"])
sample_n = min(self.max_n_images, total_n)
ids = sample(total_n, sample_n, True).tolist()
self.image_id_dict[scan_id] = ids
if self.n_images_per_sample > 1:
if self.drop_last_per_scene:
sample_n = math.floor(sample_n / self.n_images_per_sample)
else:
sample_n = math.ceil(sample_n / self.n_images_per_sample)
ids = [
ids[i*self.n_images_per_sample : (i+1)*self.n_images_per_sample]
for i in range(sample_n)
]
data_list.extend(
[dict(scan_id=scan_id, image_id=i) for i in ids]
)
self.flag.extend([flag] * len(ids))
self.data_list = data_list
self.flag = np.array(self.flag)
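        # Illustrative example: with sampled ids [0, 1, 2, 3] and
        # n_images_per_sample=2, a scene yields two samples:
        # dict(scan_id=s, image_id=[0, 1]) and dict(scan_id=s, image_id=[2, 3]).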
def get_data_info_continuous(self, data_info):
scan_id = data_info["scan_id"]
data = copy.deepcopy(self.scans[scan_id])
img_idx = data_info["image_id"]
if isinstance(img_idx, int):
img_idx = [img_idx]
for k in ["images", "img_path", "depth_img_path"]:
data[k] = index(data[k], img_idx)
seen_image_id = self.image_id_dict[scan_id]
seen_image_id = seen_image_id[:seen_image_id.index(img_idx[0])]
if len(seen_image_id) != 0:
last_visible_mask = index(
data["ann_info"]["visible_instance_masks"], seen_image_id
)
last_visible_mask = np.stack(last_visible_mask).any(axis=0)
else:
last_visible_mask = data["ann_info"]["visible_instance_masks"][0] * False
visible_instance_masks = index(
data["ann_info"]["visible_instance_masks"], img_idx
)
current_visible_mask = np.stack(visible_instance_masks).any(axis=0)
ignore_mask = np.logical_and(
last_visible_mask, ~current_visible_mask
)
all_visible_mask = np.logical_or(
last_visible_mask, current_visible_mask,
)
data["visible_instance_masks"] = visible_instance_masks
data["all_visible_mask"] = all_visible_mask
data["ignore_mask"] = ignore_mask
data["depth2img"]["extrinsic"] = index(
data["depth2img"]["extrinsic"], img_idx
)
if isinstance(data["depth2img"]["intrinsic"], list):
data["depth2img"]["intrinsic"] = index(
data["depth2img"]["intrinsic"], img_idx
)
return data
def merge_grounding_data(self, data_infos):
output = dict(
text="",
)
for key in ["target_id", "distractor_ids", "target", "anchors", "anchor_ids", "tokens_positive"]:
if key in data_infos[0]:
output[key] = []
for data_info in data_infos:
if "target_id" in data_info and data_info["target_id"] in output["target_id"]:
continue
if self.tokens_positive_rebuild and "target" in data_info:
start_idx = data_info["text"].find(data_info["target"])
end_idx = start_idx + len(data_info["target"])
tokens_positive = [[start_idx, end_idx]]
elif "tokens_positive" in data_info:
tokens_positive = data_info["tokens_positive"]
else:
tokens_positive = None
if len(output["text"]) == 0:
output["text"] = data_info["text"]
else:
if tokens_positive is not None:
tokens_positive = np.array(tokens_positive)
tokens_positive += len(output["text"]) + len(self.sep_token)
tokens_positive = tokens_positive.tolist()
output["text"] += self.sep_token + data_info["text"]
if tokens_positive is not None:
output["tokens_positive"].append(tokens_positive)
for k in ["target_id", "distractor_ids", "target", "anchors", "anchor_ids"]:
if k not in data_info:
continue
output[k].append(data_info[k])
output["scan_id"] = data_infos[0]["scan_id"]
return output
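        # Worked example (illustrative values) with sep_token="[SEP]":
        #   item A: text="the red chair",  target="red chair" -> [[4, 13]]
        #   item B: text="a wooden table", target="table"     -> [[9, 14]]
        # merged text = "the red chair[SEP]a wooden table"; item B's span is
        # shifted by len("the red chair") + len("[SEP]") = 18, giving
        # tokens_positive = [[[4, 13]], [[27, 32]]].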
def get_data_info_grounding(self, data_info):
flags = {}
if "distractor_ids" in data_info:
flags["is_unique"] = len(data_info["distractor_ids"]) == 0
flags["is_hard"] = len(data_info["distractor_ids"]) > 3
if "text" in data_info:
flags["is_view_dep"] = self._is_view_dep(data_info["text"])
scan_id = data_info["scan_id"]
scan_data = copy.deepcopy(self.scans[scan_id])
data_info = [data_info]
if self.num_text > 1:
data_idx = self.scan_id_to_data_idx[scan_id]
sample_idx = sample(
len(data_idx),
max(min(int(np.random.rand()*self.num_text), len(data_idx))-1, 1),
fix_interval=False
)
for i in sample_idx:
data_info.append(super().get_data_info(data_idx[i]))
data_info = self.merge_grounding_data(data_info)
scan_data["text"] = data_info["text"]
if "ann_info" in scan_data and "target" in data_info:
tokens_positive = []
obj_idx = []
for i, (target_name, id) in enumerate(
zip(data_info["target"], data_info["target_id"])
):
mask = np.logical_and(
scan_data["ann_info"]["bbox_id"] == id,
np.array(scan_data["ann_info"]["gt_names"]) == target_name
)
if np.sum(mask) != 1:
continue
obj_idx.append(np.where(mask)[0][0])
tokens_positive.append(data_info["tokens_positive"][i])
obj_idx = np.array(obj_idx, dtype=np.int32)
scan_data["ann_info"]["gt_bboxes_3d"] = scan_data["ann_info"]["gt_bboxes_3d"][obj_idx]
scan_data["ann_info"]["gt_labels_3d"] = scan_data["ann_info"]["gt_labels_3d"][obj_idx]
scan_data["ann_info"]["gt_names"] = [
scan_data["ann_info"]["gt_names"][i] for i in obj_idx
]
if "visible_instance_masks" in scan_data["ann_info"]:
scan_data["ann_info"]["visible_instance_masks"] = [
visible_instance_mask[obj_idx]
for visible_instance_mask in scan_data["ann_info"]["visible_instance_masks"]
]
scan_data["tokens_positive"] = tokens_positive
scan_data["eval_ann_info"] = scan_data["ann_info"]
scan_data["eval_ann_info"].update(flags)
elif "tokens_positive" in data_info:
scan_data["tokens_positive"] = data_info.get("tokens_positive")
return scan_data
@force_full_init
def get_data_info(self, idx):
data_info = super().get_data_info(idx)
if self.mode == "detection":
return data_info
elif self.mode == "continuous":
return self.get_data_info_continuous(data_info)
elif self.mode == "grounding":
return self.get_data_info_grounding(data_info)
def index(input, idx):
    """Select elements of a list-like `input` by an int or a list of ints."""
    if isinstance(idx, int):
        idx = [idx]
    return [input[i] for i in idx]
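# Minimal usage sketch (illustrative; the real pipeline and annotation-file
# values live in configs/bip3d_grounding.py, and the file names below are
# assumptions):
#   dataset = EmbodiedScanDetGroundingDataset(
#       data_root="data",
#       ann_file="embodiedscan_infos_train.pkl",
#       vg_file="embodiedscan_train_vg.json",
#       mode="grounding",
#       pipeline=[...],
#   )
#   info = dataset.get_data_info(0)  # dict with img_path, text, ann_info, ...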
================================================
FILE: bip3d/datasets/transforms/__init__.py
================================================
from .augmentation import (
GlobalRotScaleTrans,
RandomFlip3D,
ResizeCropFlipImage,
)
from .formatting import Pack3DDetInputs
from .loading import LoadAnnotations3D, LoadDepthFromFile
from .multiview import MultiViewPipeline
from .transform import (
CategoryGroundingDataPrepare,
CamIntrisicStandardization,
CustomResize,
DepthProbLabelGenerator,
)
================================================
FILE: bip3d/datasets/transforms/augmentation.py
================================================
from typing import List, Union
import cv2
import numpy as np
from mmcv.transforms import BaseTransform
from mmdet.datasets.transforms import RandomFlip
from bip3d.registry import TRANSFORMS
@TRANSFORMS.register_module()
class RandomFlip3D(RandomFlip):
"""Flip the points & bbox.
If the input dict contains the key "flip", then the flag will be used,
otherwise it will be randomly decided by a ratio specified in the init
method.
Required Keys:
- points (np.float32)
- gt_bboxes_3d (np.float32)
Modified Keys:
- points (np.float32)
- gt_bboxes_3d (np.float32)
    Added Keys:
    - flip (bool)
    - flip_direction (str)
    - pcd_horizontal_flip (bool)
    - pcd_vertical_flip (bool)
Args:
sync_2d (bool): Whether to apply flip according to the 2D
images. If True, it will apply the same flip as that to 2D images.
If False, it will decide whether to flip randomly and independently
to that of 2D images. Defaults to True.
        flip_2d (bool): Whether to apply flip for the img data.
            If True, it will adopt the flip augmentation for the img.
            Set to False for BEV augmentation in BEV-based image 3D
            detection. Defaults to True.
flip_3d (bool): Whether to apply flip for the 3d point cloud data.
If True, it will adopt the flip augmentation for the point cloud.
Defaults to True.
flip_ratio_bev_horizontal (float): The flipping probability
in horizontal direction. Defaults to 0.0.
flip_ratio_bev_vertical (float): The flipping probability
in vertical direction. Defaults to 0.0.
        flip_box3d (bool): Whether to flip the bounding boxes. In most
            cases the boxes should be flipped. In camera-based BEV
            detection, this is set to False, since flipping the 2D images
            does not influence the 3D boxes. Defaults to True.
"""
def __init__(self,
sync_2d: bool = True,
flip_2d: bool = True,
flip_3d: bool = True,
flip_ratio_bev_horizontal: float = 0.0,
flip_ratio_bev_vertical: float = 0.0,
flip_box3d: bool = True,
update_lidar2cam: bool = False,
**kwargs) -> None:
        # `flip_ratio_bev_horizontal` is used as the flip prob of the
        # 2D image when `sync_2d` is True
super(RandomFlip3D, self).__init__(prob=flip_ratio_bev_horizontal,
direction='horizontal',
**kwargs)
self.sync_2d = sync_2d
self.flip_2d = flip_2d
self.flip_3d = flip_3d
self.flip_ratio_bev_horizontal = flip_ratio_bev_horizontal
self.flip_ratio_bev_vertical = flip_ratio_bev_vertical
self.flip_box3d = flip_box3d
self.update_lidar2cam = update_lidar2cam
if flip_ratio_bev_horizontal is not None:
assert isinstance(flip_ratio_bev_horizontal, (int, float)) \
and 0 <= flip_ratio_bev_horizontal <= 1
if flip_ratio_bev_vertical is not None:
assert isinstance(flip_ratio_bev_vertical, (int, float)) \
and 0 <= flip_ratio_bev_vertical <= 1
def transform(self, input_dict: dict) -> dict:
"""Call function to flip points, values in the ``bbox3d_fields`` and
also flip 2D image and its annotations.
Args:
input_dict (dict): Result dict from loading pipeline.
Returns:
dict: Flipped results, 'flip', 'flip_direction',
'pcd_horizontal_flip' and 'pcd_vertical_flip' keys are added
into result dict.
"""
# flip 2D image and its annotations
if self.flip_2d:
# only handle the 2D image
if 'img' in input_dict:
super(RandomFlip3D, self).transform(input_dict)
flip = input_dict.get('flip', False)
if flip:
input_dict = self.random_flip_data_2d(input_dict)
if self.flip_3d:
# only handle the 3D points
if self.sync_2d and 'img' in input_dict:
                # TODO check if this is necessary in FCOS3D
input_dict['pcd_horizontal_flip'] = input_dict['flip']
input_dict['pcd_vertical_flip'] = False
else:
if 'pcd_horizontal_flip' not in input_dict:
if np.random.rand() < self.flip_ratio_bev_horizontal:
flip_horizontal = True
else:
flip_horizontal = False
input_dict['pcd_horizontal_flip'] = flip_horizontal
if 'pcd_vertical_flip' not in input_dict:
if np.random.rand() < self.flip_ratio_bev_vertical:
flip_vertical = True
else:
flip_vertical = False
input_dict['pcd_vertical_flip'] = flip_vertical
if 'transformation_3d_flow' not in input_dict:
input_dict['transformation_3d_flow'] = []
if input_dict['pcd_horizontal_flip']:
self.random_flip_data_3d(input_dict, 'horizontal')
input_dict['transformation_3d_flow'].extend(['HF'])
if input_dict['pcd_vertical_flip']:
self.random_flip_data_3d(input_dict, 'vertical')
input_dict['transformation_3d_flow'].extend(['VF'])
if self.update_lidar2cam:
self._transform_lidar2cam(input_dict)
return input_dict
def random_flip_data_3d(self,
input_dict: dict,
direction: str = 'horizontal') -> None:
"""Flip 3D data randomly.
`random_flip_data_3d` should take these situations into consideration:
- 1. LIDAR-based 3d detection
- 2. LIDAR-based 3d segmentation
- 3. vision-only detection
- 4. multi-modality 3d detection.
Args:
input_dict (dict): Result dict from loading pipeline.
direction (str): Flip direction. Defaults to 'horizontal'.
Returns:
dict: Flipped results, 'points', 'bbox3d_fields' keys are
updated in the result dict.
"""
assert direction in ['horizontal', 'vertical']
if self.flip_box3d:
if 'gt_bboxes_3d' in input_dict:
if 'points' in input_dict:
input_dict['points'] = input_dict['gt_bboxes_3d'].flip(
direction, points=input_dict['points'])
else:
# vision-only detection
input_dict['gt_bboxes_3d'].flip(direction)
else:
input_dict['points'].flip(direction)
def random_flip_data_2d(self,
input_dict: dict,
direction: str = 'horizontal') -> dict:
if 'centers_2d' in input_dict:
assert self.sync_2d is True and direction == 'horizontal', \
'Only support sync_2d=True and horizontal flip with images'
w = input_dict['img_shape'][1]
input_dict['centers_2d'][..., 0] = \
w - input_dict['centers_2d'][..., 0]
# need to modify the horizontal position of camera center
# along u-axis in the image (flip like centers2d)
# ['cam2img'][0][2] = c_u
# see more details and examples at
# https://github.com/open-mmlab/mmdetection3d/pull/744
input_dict['cam2img'][0][2] = w - input_dict['cam2img'][0][2]
if 'fov_ori2aug' not in input_dict:
fov_ori2aug = np.eye(4, 4)
else:
fov_ori2aug = input_dict['fov_ori2aug']
# get the value of w
w = input_dict['img_shape'][1]
# flip_matrix[0,0] = -1
# flip_matrix[0,3] = w
# fov_ori2aug = np.matmul(fov_ori2aug, flip_matrix)
fov_ori2aug[0] *= -1
fov_ori2aug[0, 3] += w
input_dict['fov_ori2aug'] = fov_ori2aug
return input_dict
def _flip_on_direction(self, results: dict) -> None:
"""Function to flip images, bounding boxes, semantic segmentation map
and keypoints.
Add the override feature that if 'flip' is already in results, use it
to do the augmentation.
"""
if 'flip' not in results:
cur_dir = self._choose_direction()
else:
# `flip_direction` works only when `flip` is True.
# For example, in `MultiScaleFlipAug3D`, `flip_direction` is
# 'horizontal' but `flip` is False.
if results['flip']:
                assert 'flip_direction' in results, \
                    'flip and flip_direction must exist simultaneously'
cur_dir = results['flip_direction']
else:
cur_dir = None
if cur_dir is None:
results['flip'] = False
results['flip_direction'] = None
else:
results['flip'] = True
results['flip_direction'] = cur_dir
self._flip(results)
def _transform_lidar2cam(self, results: dict) -> None:
"""TODO."""
aug_matrix = np.eye(4)
if results.get('pcd_horizontal_flip', False):
aug_matrix[1, 1] *= -1
if results.get('pcd_vertical_flip', False):
aug_matrix[0, 0] *= -1
lidar2cam_list = []
for lidar2cam in results['lidar2cam']:
lidar2cam = np.array(lidar2cam)
lidar2cam = np.matmul(lidar2cam, aug_matrix)
lidar2cam_list.append(lidar2cam.tolist())
results['lidar2cam'] = lidar2cam_list
def __repr__(self) -> str:
"""str: Return a string that describes the module."""
repr_str = self.__class__.__name__
repr_str += f'(sync_2d={self.sync_2d},'
repr_str += f' flip_ratio_bev_vertical={self.flip_ratio_bev_vertical})'
return repr_str
@TRANSFORMS.register_module()
class GlobalRotScaleTrans(BaseTransform):
"""Apply global rotation, scaling and translation to a 3D scene.
Required Keys:
- points (np.float32)
- gt_bboxes_3d (np.float32)
Modified Keys:
- points (np.float32)
- gt_bboxes_3d (np.float32)
Added Keys:
- points (np.float32)
- pcd_trans (np.float32)
- pcd_rotation (np.float32)
- pcd_rotation_angle (np.float32)
- pcd_scale_factor (np.float32)
Args:
rot_range (list[float]): Range of rotation angle.
Defaults to [-0.78539816, 0.78539816] (close to [-pi/4, pi/4]).
rot_dof (int): DoF of rotation noise. Defaults to 1.
scale_ratio_range (list[float]): Range of scale ratio.
Defaults to [0.95, 1.05].
translation_std (list[float]): The standard deviation of
translation noise applied to a scene, which
is sampled from a gaussian distribution whose standard deviation
is set by ``translation_std``. Defaults to [0, 0, 0].
shift_height (bool): Whether to shift height.
(the fourth dimension of indoor points) when scaling.
Defaults to False.
"""
def __init__(self,
rot_range: Union[List[float], int,
float] = [-0.78539816, 0.78539816],
rot_dof: int = 1,
scale_ratio_range: List[float] = [0.95, 1.05],
translation_std: List[int] = [0, 0, 0],
shift_height: bool = False,
update_lidar2cam: bool = False) -> None:
seq_types = (list, tuple, np.ndarray)
if not isinstance(rot_range, seq_types):
assert isinstance(rot_range, (int, float)), \
f'unsupported rot_range type {type(rot_range)}'
rot_range = [-rot_range, rot_range]
self.rot_range = rot_range
self.rot_dof = rot_dof
self.update_lidar2cam = update_lidar2cam
assert isinstance(scale_ratio_range, seq_types), \
f'unsupported scale_ratio_range type {type(scale_ratio_range)}'
self.scale_ratio_range = scale_ratio_range
if not isinstance(translation_std, seq_types):
assert isinstance(translation_std, (int, float)), \
f'unsupported translation_std type {type(translation_std)}'
translation_std = [
translation_std, translation_std, translation_std
]
        assert all([std >= 0 for std in translation_std]), \
            'translation_std should be non-negative'
self.translation_std = translation_std
self.shift_height = shift_height
def transform(self, input_dict: dict) -> dict:
"""Private function to rotate, scale and translate bounding boxes and
points.
Args:
input_dict (dict): Result dict from loading pipeline.
Returns:
dict: Results after scaling, 'points', 'pcd_rotation',
'pcd_scale_factor', 'pcd_trans' and `gt_bboxes_3d` are updated
in the result dict.
"""
if 'transformation_3d_flow' not in input_dict:
input_dict['transformation_3d_flow'] = []
self._rot_bbox_points(input_dict)
if 'pcd_scale_factor' not in input_dict:
self._random_scale(input_dict)
self._scale_bbox_points(input_dict)
self._trans_bbox_points(input_dict)
input_dict['transformation_3d_flow'].extend(['R', 'S', 'T'])
if self.update_lidar2cam:
self._transform_lidar2cam(input_dict)
return input_dict
def _trans_bbox_points(self, input_dict: dict) -> None:
"""Private function to translate bounding boxes and points.
Args:
input_dict (dict): Result dict from loading pipeline.
Returns:
            dict: Results after translation; 'points', 'pcd_trans'
                and `gt_bboxes_3d` are updated in the result dict.
"""
translation_std = np.array(self.translation_std, dtype=np.float32)
trans_factor = np.random.normal(scale=translation_std, size=3).T
if 'points' in input_dict:
input_dict['points'].translate(trans_factor)
input_dict['pcd_trans'] = trans_factor
if 'gt_bboxes_3d' in input_dict:
input_dict['gt_bboxes_3d'].translate(trans_factor)
def _rot_bbox_points(self, input_dict: dict) -> None:
"""Private function to rotate bounding boxes and points.
Args:
input_dict (dict): Result dict from loading pipeline.
Returns:
            dict: Results after rotation; 'points', 'pcd_rotation'
                and `gt_bboxes_3d` are updated in the result dict.
"""
rotation = self.rot_range
if self.rot_dof == 1:
noise_rotation = np.random.uniform(rotation[0], rotation[1])
noise_rotation *= -1
elif self.rot_dof > 1:
noise_rotation = np.array([
-np.random.uniform(rotation[0], rotation[1]),
-np.random.uniform(rotation[0], rotation[1]),
-np.random.uniform(rotation[0], rotation[1])
])
else:
raise NotImplementedError
# TODO delete this. And -1 is to align the rotation with
# the version of 0.17.
# if 'gt_bboxes_3d' in input_dict and \
# len(input_dict['gt_bboxes_3d'].tensor) != 0:
if 'gt_bboxes_3d' in input_dict:
# rotate points with bboxes
if 'points' in input_dict:
points, rot_mat_T = input_dict['gt_bboxes_3d'].rotate(
noise_rotation, input_dict['points'])
input_dict['points'] = points
else:
rot_mat_T = input_dict['gt_bboxes_3d'].rotate(noise_rotation)
elif 'points' in input_dict:
# if no bbox in input_dict, only rotate points
rot_mat_T = input_dict['points'].rotate(noise_rotation)
input_dict['pcd_rotation'] = rot_mat_T
input_dict['pcd_rotation_angle'] = noise_rotation
def _scale_bbox_points(self, input_dict: dict) -> None:
"""Private function to scale bounding boxes and points.
Args:
input_dict (dict): Result dict from loading pipeline.
Returns:
            dict: Results after scaling; 'points' and
                `gt_bboxes_3d` are updated in the result dict.
"""
scale = input_dict['pcd_scale_factor']
if 'points' in input_dict:
points = input_dict['points']
points.scale(scale)
if self.shift_height:
                assert 'height' in points.attribute_dims.keys(), \
                    'setting shift_height=True but points have no height attribute'
points.tensor[:, points.attribute_dims['height']] *= scale
input_dict['points'] = points
if 'gt_bboxes_3d' in input_dict and \
len(input_dict['gt_bboxes_3d'].tensor) != 0:
input_dict['gt_bboxes_3d'].scale(scale)
def _random_scale(self, input_dict: dict) -> None:
"""Private function to randomly set the scale factor.
Args:
input_dict (dict): Result dict from loading pipeline.
Returns:
            dict: Results after scaling; 'pcd_scale_factor'
                is updated in the result dict.
"""
scale_factor = np.random.uniform(self.scale_ratio_range[0],
self.scale_ratio_range[1])
input_dict['pcd_scale_factor'] = scale_factor
    def _transform_lidar2cam(self, input_dict: dict) -> None:
        """Compensate the rotation/scale/translation augmentation in the
        camera extrinsics."""
aug_matrix = np.eye(4)
if 'pcd_rotation' in input_dict:
aug_matrix[:3, :3] = input_dict['pcd_rotation'].T.numpy(
) * input_dict['pcd_scale_factor']
        else:
            aug_matrix[:3, :3] = np.eye(3) * input_dict['pcd_scale_factor']
aug_matrix[:3, -1] = input_dict['pcd_trans'].reshape(1, 3)
aug_matrix[-1, -1] = 1.0
aug_matrix = np.linalg.inv(aug_matrix)
lidar2cam_list = []
for lidar2cam in input_dict["depth2img"]["extrinsic"]:
# for lidar2cam in input_dict['lidar2cam']:
lidar2cam = np.array(lidar2cam)
lidar2cam = np.matmul(lidar2cam, aug_matrix)
lidar2cam_list.append(lidar2cam.tolist())
input_dict["depth2img"]["extrinsic"] = lidar2cam_list
# input_dict['lidar2cam'] = lidar2cam_list
def __repr__(self) -> str:
"""str: Return a string that describes the module."""
repr_str = self.__class__.__name__
repr_str += f'(rot_range={self.rot_range},'
repr_str += f' scale_ratio_range={self.scale_ratio_range},'
repr_str += f' translation_std={self.translation_std},'
repr_str += f' shift_height={self.shift_height})'
return repr_str
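# A minimal numeric sketch (numpy only; names are illustrative, not part of
# the pipeline API) of the convention used by `_transform_lidar2cam` above:
# points are augmented as p_aug = (p @ pcd_rotation) * pcd_scale_factor
# + pcd_trans, and the extrinsic is right-multiplied by the inverse of the
# corresponding 4x4 matrix so that lidar2cam_aug @ p_aug == lidar2cam @ p.
def _extrinsic_update_sketch():
    import numpy as np
    theta = 0.3
    rot_mat_T = np.array([[np.cos(theta), np.sin(theta), 0],
                          [-np.sin(theta), np.cos(theta), 0],
                          [0, 0, 1]])  # plays the role of `pcd_rotation`
    scale, trans = 1.02, np.array([0.1, -0.2, 0.05])
    aug = np.eye(4)
    aug[:3, :3] = rot_mat_T.T * scale
    aug[:3, 3] = trans
    lidar2cam = np.eye(4)
    lidar2cam[:3, 3] = [0.5, 0.0, 1.0]
    p = np.array([1.0, 2.0, 0.5])
    p_aug = (p @ rot_mat_T) * scale + trans
    lhs = (lidar2cam @ np.linalg.inv(aug)) @ np.append(p_aug, 1.0)
    rhs = lidar2cam @ np.append(p, 1.0)
    assert np.allclose(lhs, rhs)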
@TRANSFORMS.register_module()
class ResizeCropFlipImage(object):
def __init__(self, data_aug_conf, test_mode=False):
self.data_aug_conf = data_aug_conf
self.test_mode = test_mode
def get_augmentation(self):
if self.data_aug_conf is None:
return None
H, W = self.data_aug_conf["H"], self.data_aug_conf["W"]
fH, fW = self.data_aug_conf["final_dim"]
if not self.test_mode:
resize = np.random.uniform(*self.data_aug_conf["resize_lim"])
resize_dims = (int(W * resize), int(H * resize))
newW, newH = resize_dims
crop_h = (
int(
(1 - np.random.uniform(*self.data_aug_conf["bot_pct_lim"]))
* newH
)
- fH
)
crop_w = int(np.random.uniform(0, max(0, newW - fW)))
crop = (crop_w, crop_h, crop_w + fW, crop_h + fH)
flip = False
if self.data_aug_conf["rand_flip"] and np.random.choice([0, 1]):
flip = True
rotate = np.random.uniform(*self.data_aug_conf["rot_lim"])
# rotate_3d = np.random.uniform(*self.data_aug_conf["rot3d_range"])
rotate_3d = 0
else:
resize = max(fH / H, fW / W)
resize_dims = (int(W * resize), int(H * resize))
newW, newH = resize_dims
crop_h = (
int((1 - np.mean(self.data_aug_conf["bot_pct_lim"])) * newH)
- fH
)
crop_w = int(max(0, newW - fW) / 2)
crop = (crop_w, crop_h, crop_w + fW, crop_h + fH)
flip = False
rotate = 0
rotate_3d = 0
aug_config = {
"resize": resize,
"resize_dims": resize_dims,
"crop": crop,
"flip": flip,
"rotate": rotate,
"rotate_3d": rotate_3d,
}
return aug_config
def __call__(self, results):
img = results["img"]
        aug_configs = self.get_augmentation()
        results["img"], extend_matrix = self._transform(img, aug_configs)
results["trans_mat"] = extend_matrix
if "depth_img" in results:
results["depth_img"], _ = self._transform(results["depth_img"], aug_configs)
return results
def _transform(self, img, aug_configs):
H, W = img.shape[:2]
resize = aug_configs.get("resize", 1)
resize_dims = (int(W * resize), int(H * resize))
crop = aug_configs.get("crop", [0, 0, *resize_dims])
flip = aug_configs.get("flip", False)
rotate = aug_configs.get("rotate", 0)
transform_matrix = np.eye(3)
transform_matrix[:2, :2] *= resize
transform_matrix[:2, 2] -= np.array(crop[:2])
if flip:
flip_matrix = np.array(
[[-1, 0, crop[2] - crop[0]], [0, 1, 0], [0, 0, 1]]
)
transform_matrix = flip_matrix @ transform_matrix
rotate = rotate / 180 * np.pi
rot_matrix = np.array(
[
[np.cos(rotate), np.sin(rotate), 0],
[-np.sin(rotate), np.cos(rotate), 0],
[0, 0, 1],
]
)
rot_center = np.array([crop[2] - crop[0], crop[3] - crop[1]]) / 2
rot_matrix[:2, 2] = -rot_matrix[:2, :2] @ rot_center + rot_center
transform_matrix = rot_matrix @ transform_matrix
extend_matrix = np.eye(4)
extend_matrix[:3, :3] = transform_matrix
img = cv2.warpAffine(img, extend_matrix[:2, :3], img.shape[:2][::-1])
return img, extend_matrix
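# Minimal sketch of the pixel mapping encoded by `transform_matrix` in
# `_transform` above, with flip and rotate disabled (numpy only; values are
# illustrative): a source pixel is first scaled by `resize`, then the crop
# offset is subtracted.
def _pixel_mapping_sketch():
    import numpy as np
    resize, crop_xy = 0.5, np.array([10.0, 20.0])
    m = np.eye(3)
    m[:2, :2] *= resize
    m[:2, 2] -= crop_xy
    uv = m @ np.array([100.0, 60.0, 1.0])
    assert np.allclose(uv[:2], [40.0, 10.0])  # (100*0.5-10, 60*0.5-20)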
================================================
FILE: bip3d/datasets/transforms/formatting.py
================================================
from typing import List, Sequence, Union
import mmengine
import numpy as np
import torch
from mmcv.transforms import BaseTransform
from mmengine.structures import InstanceData, PixelData
from bip3d.registry import TRANSFORMS
from bip3d.structures.bbox_3d import BaseInstance3DBoxes
from bip3d.structures.points import BasePoints
from bip3d.utils.typing_config import Det3DDataElement, PointData
def to_tensor(
data: Union[torch.Tensor, np.ndarray, Sequence, int,
float]) -> torch.Tensor:
"""Convert objects of various python types to :obj:`torch.Tensor`.
Supported types are: :class:`numpy.ndarray`, :class:`torch.Tensor`,
:class:`Sequence`, :class:`int` and :class:`float`.
Args:
data (torch.Tensor | numpy.ndarray | Sequence | int | float): Data to
be converted.
Returns:
torch.Tensor: the converted data.
"""
if isinstance(data, torch.Tensor):
return data
elif isinstance(data, np.ndarray):
if data.dtype is np.dtype('float64'):
data = data.astype(np.float32)
return torch.from_numpy(data)
elif isinstance(data, Sequence) and not mmengine.is_str(data):
return torch.tensor(data)
elif isinstance(data, int):
return torch.LongTensor([data])
elif isinstance(data, float):
return torch.FloatTensor([data])
else:
raise TypeError(f'type {type(data)} cannot be converted to tensor.')
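# Behavior sketch for `to_tensor`, using only the rules implemented above:
# float64 arrays are downcast to float32, plain ints become LongTensor and
# plain floats become FloatTensor.
def _to_tensor_sketch():
    assert to_tensor(np.zeros(2, dtype=np.float64)).dtype == torch.float32
    assert to_tensor(3).dtype == torch.int64
    assert to_tensor(0.5).dtype == torch.float32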
@TRANSFORMS.register_module()
class Pack3DDetInputs(BaseTransform):
INPUTS_KEYS = ['points', 'img', "depth_img"]
# to be compatible with depths in bevdepth
INSTANCEDATA_3D_KEYS = [
'gt_bboxes_3d', 'gt_labels_3d', 'attr_labels', 'depths', 'centers_2d'
]
INSTANCEDATA_2D_KEYS = [
'gt_bboxes',
'gt_bboxes_labels',
]
SEG_KEYS = [
'gt_seg_map', 'pts_instance_mask', 'pts_semantic_mask',
'gt_semantic_seg'
]
def __init__(
self,
keys: dict,
meta_keys: dict = (
'img_path', 'ori_shape', 'img_shape', 'lidar2img', 'depth2img',
'cam2img', 'pad_shape', 'depth_map_path', 'scale_factor', 'flip',
'pcd_horizontal_flip', 'pcd_vertical_flip', 'box_mode_3d',
'box_type_3d', 'img_norm_cfg', 'num_pts_feats', 'pcd_trans',
'sample_idx', 'pcd_scale_factor', 'pcd_rotation',
'pcd_rotation_angle', 'lidar_path', 'transformation_3d_flow',
'trans_mat', 'affine_aug', 'sweep_img_metas', 'ori_cam2img',
'cam2global', 'crop_offset', 'img_crop_offset', 'resize_img_shape',
'lidar2cam', 'ori_lidar2img', 'num_ref_frames', 'num_views',
'ego2global', 'fov_ori2aug', 'ego2cam', 'axis_align_matrix',
'text', 'tokens_positive', 'scan_id', "distractor_bboxes", "anchor_bboxes",
"trans_mat", "ignore_mask",
)):
self.keys = keys
self.meta_keys = meta_keys
def _remove_prefix(self, key: str) -> str:
if key.startswith('gt_'):
key = key[3:]
return key
def transform(self, results: Union[dict,
List[dict]]):
"""Method to pack the input data. when the value in this dict is a
list, it usually is in Augmentations Testing.
Args:
results (dict | list[dict]): Result dict from the data pipeline.
Returns:
dict | List[dict]:
- 'inputs' (dict): The forward data of models. It usually contains
following keys:
- points
- img
- 'data_samples' (:obj:`Det3DDataSample`): The annotation info of
the sample.
"""
# augtest
if isinstance(results, list):
if len(results) == 1:
# simple test
return self.pack_single_results(results[0])
pack_results = []
for single_result in results:
pack_results.append(self.pack_single_results(single_result))
return pack_results
# norm training and simple testing
elif isinstance(results, dict):
return self.pack_single_results(results)
else:
raise NotImplementedError
def pack_single_results(self, results: dict) -> dict:
"""Method to pack the single input data. when the value in this dict is
a list, it usually is in Augmentations Testing.
Args:
results (dict): Result dict from the data pipeline.
Returns:
dict: A dict contains
- 'inputs' (dict): The forward data of models. It usually contains
following keys:
- points
- img
- 'data_samples' (:obj:`Det3DDataSample`): The annotation info
of the sample.
"""
if 'points' in results:
if isinstance(results['points'], BasePoints):
results['points'] = results['points'].tensor
# multi-sweep points
elif isinstance(results['points'], list):
if isinstance(results['points'][0], BasePoints):
for idx in range(len(results['points'])):
results['points'][idx] = results['points'][idx].tensor
for key in ["img", "depth_img"]:
if key not in results:
continue
if isinstance(results[key], list):
# process multiple imgs in single frame
imgs = np.stack(results[key], axis=0)
if len(imgs.shape) <= 3:
imgs = imgs[..., None]
if imgs.flags.c_contiguous:
imgs = to_tensor(imgs).permute(0, 3, 1, 2).contiguous()
else:
imgs = to_tensor(
np.ascontiguousarray(imgs.transpose(0, 3, 1, 2)))
results[key] = imgs
else:
img = results[key]
if len(img.shape) < 3:
img = np.expand_dims(img, -1)
                # To improve the computational speed by 3-5 times, apply
                # `torch.permute()` rather than `np.transpose()`.
# Refer to https://github.com/open-mmlab/mmdetection/pull/9533
# for more details
if img.flags.c_contiguous:
img = to_tensor(img).permute(2, 0, 1).contiguous()
else:
img = to_tensor(
np.ascontiguousarray(img.transpose(2, 0, 1)))
results[key] = img
for key in [
'proposals', 'gt_bboxes', 'gt_bboxes_ignore', 'gt_labels',
'gt_bboxes_labels', 'attr_labels', 'pts_instance_mask',
'pts_semantic_mask', 'centers_2d', 'depths', 'gt_labels_3d'
]:
if key not in results:
continue
if isinstance(results[key], list):
results[key] = [to_tensor(res) for res in results[key]]
else:
results[key] = to_tensor(results[key])
if 'gt_bboxes_3d' in results:
# multi-sweep version
if isinstance(results['gt_bboxes_3d'], list):
if not isinstance(results['gt_bboxes_3d'][0],
BaseInstance3DBoxes):
for idx in range(len(results['gt_bboxes_3d'])):
results['gt_bboxes_3d'][idx] = to_tensor(
results['gt_bboxes_3d'][idx])
elif not isinstance(results['gt_bboxes_3d'], BaseInstance3DBoxes):
results['gt_bboxes_3d'] = to_tensor(results['gt_bboxes_3d'])
if 'gt_semantic_seg' in results:
results['gt_semantic_seg'] = to_tensor(
results['gt_semantic_seg'][None])
if 'gt_seg_map' in results:
results['gt_seg_map'] = results['gt_seg_map'][None, ...]
if 'depth_map' in results:
results['depth_map'] = to_tensor(results['depth_map'])
data_sample = Det3DDataElement()
gt_instances_3d = InstanceData()
gt_instances = InstanceData()
gt_pts_seg = PointData()
gt_depth_map = PixelData()
data_metas = {}
for key in self.meta_keys:
if key in results:
data_metas[key] = results[key]
# TODO: unify ScanNet multi-view info with nuScenes and Waymo
elif 'images' in results and isinstance(results['images'], dict):
if len(results['images'].keys()) == 1:
cam_type = list(results['images'].keys())[0]
# single-view image
if key in results['images'][cam_type]:
data_metas[key] = results['images'][cam_type][key]
else:
# multi-view image
img_metas = []
cam_types = list(results['images'].keys())
for cam_type in cam_types:
if key in results['images'][cam_type]:
img_metas.append(results['images'][cam_type][key])
if len(img_metas) > 0:
data_metas[key] = img_metas
elif 'lidar_points' in results and isinstance(
results['lidar_points'], dict):
if key in results['lidar_points']:
data_metas[key] = results['lidar_points'][key]
data_sample.set_metainfo(data_metas)
inputs = {}
for key in self.keys:
if key in results:
if key in self.INPUTS_KEYS:
inputs[key] = results[key]
elif key in self.INSTANCEDATA_3D_KEYS:
gt_instances_3d[self._remove_prefix(key)] = results[key]
elif key in self.INSTANCEDATA_2D_KEYS:
if key == 'gt_bboxes_labels':
gt_instances['labels'] = results[key]
else:
gt_instances[self._remove_prefix(key)] = results[key]
elif key in self.SEG_KEYS:
gt_pts_seg[self._remove_prefix(key)] = results[key]
elif key == 'depth_map':
gt_depth_map.set_data(dict(data=results[key]))
elif key == 'gt_occupancy':
data_sample.gt_occupancy = to_tensor(
results['gt_occupancy'])
if isinstance(results['gt_occupancy_masks'], list):
data_sample.gt_occupancy_masks = [
to_tensor(mask)
for mask in results['gt_occupancy_masks']
]
else:
data_sample.gt_occupancy_masks = to_tensor(
results['gt_occupancy_masks'])
else:
                    raise NotImplementedError(f'Please modify '
                                              f'`Pack3DDetInputs` '
                                              f'to put {key} into the '
                                              f'corresponding field')
data_sample.gt_instances_3d = gt_instances_3d
data_sample.gt_instances = gt_instances
data_sample.gt_pts_seg = gt_pts_seg
data_sample.gt_depth_map = gt_depth_map
if 'eval_ann_info' in results:
data_sample.eval_ann_info = results['eval_ann_info']
else:
data_sample.eval_ann_info = None
packed_results = dict()
packed_results['data_samples'] = data_sample
packed_results['inputs'] = inputs
return packed_results
def __repr__(self) -> str:
"""str: Return a string that describes the module."""
repr_str = self.__class__.__name__
        repr_str += f'(keys={self.keys}, '
        repr_str += f'meta_keys={self.meta_keys})'
return repr_str
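# A typical config sketch for this transform (key names follow the class
# above; the exact key list is illustrative, not prescriptive):
_pack_inputs_config_sketch = dict(
    type='Pack3DDetInputs',
    keys=['img', 'depth_img', 'gt_bboxes_3d', 'gt_labels_3d'],
)
# The packed output is dict(inputs=..., data_samples=Det3DDataElement),
# where `inputs` holds the tensors listed in INPUTS_KEYS and
# `data_samples` carries gt_instances_3d / gt_instances / gt_pts_seg.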
================================================
FILE: bip3d/datasets/transforms/loading.py
================================================
from typing import Optional
import mmcv
import mmengine
import numpy as np
from mmcv.transforms import BaseTransform
from mmdet.datasets.transforms import LoadAnnotations
from bip3d.registry import TRANSFORMS
@TRANSFORMS.register_module()
class LoadDepthFromFile(BaseTransform):
"""Load a depth image from file.
Required Keys:
- depth_img_path
Modified Keys:
- depth_img
- depth_img_shape
Args:
imdecode_backend (str): The image decoding backend type. The backend
argument for :func:`mmcv.imfrombytes`.
See :func:`mmcv.imfrombytes` for details.
Defaults to 'cv2'.
ignore_empty (bool): Whether to allow loading empty image or file path
not existent. Defaults to False.
backend_args (dict, optional): Instantiates the corresponding file
backend. It may contain `backend` key to specify the file
backend. If it contains, the file backend corresponding to this
value will be used and initialized with the remaining values,
otherwise the corresponding file backend will be selected
based on the prefix of the file path. Defaults to None.
New in version 2.0.0rc4.
"""
def __init__(self,
imdecode_backend: str = 'cv2',
ignore_empty: bool = False,
*,
backend_args: Optional[dict] = None):
self.ignore_empty = ignore_empty
self.imdecode_backend = imdecode_backend
self.backend_args = None
if backend_args is not None:
self.backend_args = backend_args.copy()
def transform(self, results: dict):
"""Functions to load image.
Args:
results (dict): Result dict from
:class:`mmengine.dataset.BaseDataset`.
Returns:
dict: The dict contains loaded image and meta information.
"""
filename = results['depth_img_path']
depth_shift = results['depth_shift']
try:
depth_img_bytes = mmengine.fileio.get(
filename, backend_args=self.backend_args)
depth_img = mmcv.imfrombytes(depth_img_bytes,
flag='unchanged',
backend=self.imdecode_backend).astype(
np.float32) / depth_shift
except Exception as e:
if self.ignore_empty:
return None
else:
raise e
results['depth_img'] = depth_img
return results
def __repr__(self):
repr_str = (f'{self.__class__.__name__}('
f'ignore_empty={self.ignore_empty}, '
f"imdecode_backend='{self.imdecode_backend}', ")
        repr_str += f'backend_args={self.backend_args})'
return repr_str
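# Decoding sketch: depth images are stored as integers and divided by
# `depth_shift` to obtain metric depth (numpy only; depth_shift=1000 is an
# assumed value, the real one comes from the dataset info):
def _depth_shift_sketch():
    raw = np.array([[0, 1000], [2500, 4000]], dtype=np.uint16)
    depth_m = raw.astype(np.float32) / 1000.0
    assert depth_m[0, 1] == 1.0  # a raw value of 1000 is 1.0 m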
# TODO : refine
@TRANSFORMS.register_module()
class LoadAnnotations3D(LoadAnnotations):
"""Load Annotations3D.
Load instance mask and semantic mask of points and
encapsulate the items into related fields.
Required Keys:
- ann_info (dict)
- gt_bboxes_3d (:obj:`LiDARInstance3DBoxes` |
:obj:`DepthInstance3DBoxes` | :obj:`CameraInstance3DBoxes`):
3D ground truth bboxes. Only when `with_bbox_3d` is True
- gt_labels_3d (np.int64): Labels of ground truths.
Only when `with_label_3d` is True.
- gt_bboxes (np.float32): 2D ground truth bboxes.
Only when `with_bbox` is True.
- gt_labels (np.ndarray): Labels of ground truths.
Only when `with_label` is True.
- depths (np.ndarray): Only when
`with_bbox_depth` is True.
- centers_2d (np.ndarray): Only when
`with_bbox_depth` is True.
- attr_labels (np.ndarray): Attribute labels of instances.
Only when `with_attr_label` is True.
- pts_instance_mask_path (str): Path of instance mask file.
Only when `with_mask_3d` is True.
- pts_semantic_mask_path (str): Path of semantic mask file.
Only when `with_seg_3d` is True.
    - pts_panoptic_mask_path (str): Path of panoptic mask file.
        Only when `with_panoptic_3d` is True.
Added Keys:
- gt_bboxes_3d (:obj:`LiDARInstance3DBoxes` |
:obj:`DepthInstance3DBoxes` | :obj:`CameraInstance3DBoxes`):
3D ground truth bboxes. Only when `with_bbox_3d` is True
- gt_labels_3d (np.int64): Labels of ground truths.
Only when `with_label_3d` is True.
- gt_bboxes (np.float32): 2D ground truth bboxes.
Only when `with_bbox` is True.
- gt_labels (np.int64): Labels of ground truths.
Only when `with_label` is True.
- depths (np.float32): Only when
`with_bbox_depth` is True.
- centers_2d (np.ndarray): Only when
`with_bbox_depth` is True.
- attr_labels (np.int64): Attribute labels of instances.
Only when `with_attr_label` is True.
- pts_instance_mask (np.int64): Instance mask of each point.
Only when `with_mask_3d` is True.
- pts_semantic_mask (np.int64): Semantic mask of each point.
Only when `with_seg_3d` is True.
Args:
with_bbox_3d (bool): Whether to load 3D boxes. Defaults to True.
with_label_3d (bool): Whether to load 3D labels. Defaults to True.
with_attr_label (bool): Whether to load attribute label.
Defaults to False.
with_mask_3d (bool): Whether to load 3D instance masks for points.
Defaults to False.
with_seg_3d (bool): Whether to load 3D semantic masks for points.
Defaults to False.
with_bbox (bool): Whether to load 2D boxes. Defaults to False.
with_label (bool): Whether to load 2D labels. Defaults to False.
with_mask (bool): Whether to load 2D instance masks. Defaults to False.
with_seg (bool): Whether to load 2D semantic masks. Defaults to False.
with_bbox_depth (bool): Whether to load 2.5D boxes. Defaults to False.
with_panoptic_3d (bool): Whether to load 3D panoptic masks for points.
Defaults to False.
poly2mask (bool): Whether to convert polygon annotations to bitmasks.
Defaults to True.
seg_3d_dtype (str): String of dtype of 3D semantic masks.
Defaults to 'np.int64'.
seg_offset (int): The offset to split semantic and instance labels from
panoptic labels. Defaults to None.
dataset_type (str): Type of dataset used for splitting semantic and
instance labels. Defaults to None.
backend_args (dict, optional): Arguments to instantiate the
corresponding backend. Defaults to None.
"""
def __init__(self,
with_bbox_3d: bool = True,
with_label_3d: bool = True,
with_depth_map: bool = False,
with_attr_label: bool = False,
with_mask_3d: bool = False,
with_seg_3d: bool = False,
with_bbox: bool = False,
with_label: bool = False,
with_mask: bool = False,
with_seg: bool = False,
with_bbox_depth: bool = False,
with_panoptic_3d: bool = False,
with_visible_instance_masks: bool = False,
with_occupancy: bool = False,
with_visible_occupancy_masks: bool = False,
poly2mask: bool = True,
seg_3d_dtype: str = 'np.int64',
seg_offset: int = None,
dataset_type: str = None,
backend_args: Optional[dict] = None):
super().__init__(with_bbox=with_bbox,
with_label=with_label,
with_mask=with_mask,
with_seg=with_seg,
poly2mask=poly2mask,
backend_args=backend_args)
self.with_bbox_3d = with_bbox_3d
self.with_bbox_depth = with_bbox_depth
self.with_label_3d = with_label_3d
self.with_depth_map = with_depth_map
self.with_attr_label = with_attr_label
self.with_mask_3d = with_mask_3d
self.with_seg_3d = with_seg_3d
self.with_panoptic_3d = with_panoptic_3d
self.with_visible_instance_masks = with_visible_instance_masks
self.with_occupancy = with_occupancy
self.with_visible_occupancy_masks = with_visible_occupancy_masks
self.seg_3d_dtype = eval(seg_3d_dtype)
self.seg_offset = seg_offset
self.dataset_type = dataset_type
def _load_bboxes_3d(self, results: dict):
"""Private function to move the 3D bounding box annotation from
`ann_info` field to the root of `results`.
Args:
results (dict): Result dict from :obj:`mmdet3d.CustomDataset`.
Returns:
dict: The dict containing loaded 3D bounding box annotations.
"""
results['gt_bboxes_3d'] = results['ann_info']['gt_bboxes_3d']
return results
def _load_bboxes_depth(self, results: dict):
"""Private function to load 2.5D bounding box annotations.
Args:
results (dict): Result dict from :obj:`mmdet3d.CustomDataset`.
Returns:
dict: The dict containing loaded 2.5D bounding box annotations.
"""
results['depths'] = results['ann_info']['depths']
results['centers_2d'] = results['ann_info']['centers_2d']
return results
def _load_labels_3d(self, results: dict):
"""Private function to load label annotations.
Args:
results (dict): Result dict from :obj:`mmdet3d.CustomDataset`.
Returns:
dict: The dict containing loaded label annotations.
"""
results['gt_labels_3d'] = results['ann_info']['gt_labels_3d']
return results
def _load_attr_labels(self, results: dict):
"""Private function to load label annotations.
Args:
results (dict): Result dict from :obj:`mmdet3d.CustomDataset`.
Returns:
dict: The dict containing loaded label annotations.
"""
results['attr_labels'] = results['ann_info']['attr_labels']
return results
def _load_depth_map(self, results: dict):
img_filename = results['img_path']
pts_filename = img_filename.replace('samples', 'depth_points') + '.bin'
results['depth_map_path'] = pts_filename
if self.file_client is None:
self.file_client = mmengine.FileClient(**self.backend_args)
try:
pts_bytes = self.file_client.get(pts_filename)
points = np.frombuffer(pts_bytes, dtype=np.float32)
except ConnectionError:
mmengine.check_file_exist(pts_filename)
if pts_filename.endswith('.npy'):
points = np.load(pts_filename)
else:
points = np.fromfile(pts_filename, dtype=np.float32)
pts_img = points.reshape(-1, 3)
img_shape = results['ori_shape']
depth_img = np.zeros(img_shape, dtype=np.float32)
iy = np.round(pts_img[:, 1]).astype(np.int64)
ix = np.round(pts_img[:, 0]).astype(np.int64)
depth_img[iy, ix] = pts_img[:, 2]
results['depth_map'] = depth_img
return results
def _load_masks_3d(self, results: dict):
"""Private function to load 3D mask annotations.
Args:
results (dict): Result dict from :obj:`mmdet3d.CustomDataset`.
Returns:
dict: The dict containing loaded 3D mask annotations.
"""
pts_instance_mask_path = results['pts_instance_mask_path']
try:
mask_bytes = mmengine.fileio.get(pts_instance_mask_path,
backend_args=self.backend_args)
pts_instance_mask = np.frombuffer(mask_bytes, dtype=np.int64)
except ConnectionError:
mmengine.check_file_exist(pts_instance_mask_path)
pts_instance_mask = np.fromfile(pts_instance_mask_path,
dtype=np.int64)
results['pts_instance_mask'] = pts_instance_mask
# 'eval_ann_info' will be passed to evaluator
if 'eval_ann_info' in results:
results['eval_ann_info']['pts_instance_mask'] = pts_instance_mask
return results
def _load_semantic_seg_3d(self, results: dict):
"""Private function to load 3D semantic segmentation annotations.
Args:
results (dict): Result dict from :obj:`mmdet3d.CustomDataset`.
Returns:
dict: The dict containing the semantic segmentation annotations.
"""
pts_semantic_mask_path = results['pts_semantic_mask_path']
try:
mask_bytes = mmengine.fileio.get(pts_semantic_mask_path,
backend_args=self.backend_args)
# add .copy() to fix read-only bug
pts_semantic_mask = np.frombuffer(mask_bytes,
dtype=self.seg_3d_dtype).copy()
except ConnectionError:
mmengine.check_file_exist(pts_semantic_mask_path)
pts_semantic_mask = np.fromfile(pts_semantic_mask_path,
dtype=np.int64)
if self.dataset_type == 'semantickitti':
pts_semantic_mask = pts_semantic_mask.astype(np.int64)
pts_semantic_mask = pts_semantic_mask % self.seg_offset
# nuScenes loads semantic and panoptic labels from different files.
results['pts_semantic_mask'] = pts_semantic_mask
# 'eval_ann_info' will be passed to evaluator
if 'eval_ann_info' in results:
results['eval_ann_info']['pts_semantic_mask'] = pts_semantic_mask
return results
def _load_panoptic_3d(self, results: dict):
"""Private function to load 3D panoptic segmentation annotations.
Args:
results (dict): Result dict from :obj:`mmdet3d.CustomDataset`.
Returns:
dict: The dict containing the panoptic segmentation annotations.
"""
pts_panoptic_mask_path = results['pts_panoptic_mask_path']
try:
mask_bytes = mmengine.fileio.get(pts_panoptic_mask_path,
backend_args=self.backend_args)
# add .copy() to fix read-only bug
pts_panoptic_mask = np.frombuffer(mask_bytes,
dtype=self.seg_3d_dtype).copy()
except ConnectionError:
mmengine.check_file_exist(pts_panoptic_mask_path)
pts_panoptic_mask = np.fromfile(pts_panoptic_mask_path,
dtype=np.int64)
if self.dataset_type == 'semantickitti':
pts_semantic_mask = pts_panoptic_mask.astype(np.int64)
pts_semantic_mask = pts_semantic_mask % self.seg_offset
        elif self.dataset_type == 'nuscenes':
            pts_semantic_mask = pts_panoptic_mask.astype(
                np.int64) // self.seg_offset
results['pts_semantic_mask'] = pts_semantic_mask
# We can directly take panoptic labels as instance ids.
pts_instance_mask = pts_panoptic_mask.astype(np.int64)
results['pts_instance_mask'] = pts_instance_mask
# 'eval_ann_info' will be passed to evaluator
if 'eval_ann_info' in results:
results['eval_ann_info']['pts_semantic_mask'] = pts_semantic_mask
results['eval_ann_info']['pts_instance_mask'] = pts_instance_mask
return results
def _load_visible_instance_masks(self, results: dict):
"""Private function to move the 3D bounding box annotation from
`ann_info` field to the root of `results`.
Args:
results (dict): Result dict from :obj:`mmdet3d.CustomDataset`.
Returns:
dict: The dict containing loaded 3D bounding box annotations.
"""
results['visible_instance_masks'] = results['ann_info'][
'visible_instance_masks']
return results
def _load_occupancy(self, results: dict):
"""Private function to move the 3D bounding box annotation from
`ann_info` field to the root of `results`.
Args:
results (dict): Result dict from :obj:`mmdet3d.CustomDataset`.
Returns:
dict: The dict containing loaded 3D bounding box annotations.
"""
results['gt_occupancy'] = results['ann_info']['gt_occupancy']
return results
def _load_visible_occupancy_masks(self, results: dict):
"""Private function to move the 3D bounding box annotation from
`ann_info` field to the root of `results`.
Args:
results (dict): Result dict from :obj:`mmdet3d.CustomDataset`.
Returns:
dict: The dict containing loaded 3D bounding box annotations.
"""
results['visible_occupancy_masks'] = results['ann_info'][
'visible_occupancy_masks']
return results
def _load_bboxes(self, results: dict):
"""Private function to load bounding box annotations.
        The only difference from the parent class is that the handling of
        `ignore_flag` is removed.
Args:
results (dict): Result dict from :obj:`mmcv.BaseDataset`.
Returns:
dict: The dict contains loaded bounding box annotations.
"""
results['gt_bboxes'] = results['ann_info']['gt_bboxes']
def _load_labels(self, results: dict):
"""Private function to load label annotations.
Args:
            results (dict): Result dict from :obj:`mmcv.BaseDataset`.
Returns:
dict: The dict contains loaded label annotations.
"""
results['gt_bboxes_labels'] = results['ann_info']['gt_bboxes_labels']
def transform(self, results: dict):
"""Function to load multiple types annotations.
Args:
results (dict): Result dict from :obj:`mmdet3d.CustomDataset`.
Returns:
dict: The dict containing loaded 3D bounding box, label, mask and
semantic segmentation annotations.
"""
results = super().transform(results)
if self.with_bbox_3d:
results = self._load_bboxes_3d(results)
if self.with_bbox_depth:
results = self._load_bboxes_depth(results)
if self.with_label_3d:
results = self._load_labels_3d(results)
if self.with_depth_map:
results = self._load_depth_map(results)
if self.with_attr_label:
results = self._load_attr_labels(results)
if self.with_panoptic_3d:
results = self._load_panoptic_3d(results)
if self.with_mask_3d:
results = self._load_masks_3d(results)
if self.with_seg_3d:
results = self._load_semantic_seg_3d(results)
if self.with_visible_instance_masks:
results = self._load_visible_instance_masks(results)
if self.with_occupancy:
results = self._load_occupancy(results)
if self.with_visible_occupancy_masks:
results = self._load_visible_occupancy_masks(results)
return results
def __repr__(self):
"""str: Return a string that describes the module."""
indent_str = ' '
repr_str = self.__class__.__name__ + '(\n'
repr_str += f'{indent_str}with_bbox_3d={self.with_bbox_3d}, '
repr_str += f'{indent_str}with_label_3d={self.with_label_3d}, '
repr_str += f'{indent_str}with_attr_label={self.with_attr_label}, '
repr_str += f'{indent_str}with_mask_3d={self.with_mask_3d}, '
repr_str += f'{indent_str}with_seg_3d={self.with_seg_3d}, '
repr_str += f'{indent_str}with_panoptic_3d={self.with_panoptic_3d}, '
repr_str += f'{indent_str}with_bbox={self.with_bbox}, '
repr_str += f'{indent_str}with_label={self.with_label}, '
repr_str += f'{indent_str}with_mask={self.with_mask}, '
repr_str += f'{indent_str}with_seg={self.with_seg}, '
repr_str += f'{indent_str}with_bbox_depth={self.with_bbox_depth}, '
        repr_str += f'{indent_str}poly2mask={self.poly2mask}, '
        repr_str += f'{indent_str}seg_offset={self.seg_offset}, '
        repr_str += f'{indent_str}with_visible_instance_masks='
        repr_str += f'{self.with_visible_instance_masks}, '
        repr_str += f'{indent_str}with_occupancy={self.with_occupancy}, '
        repr_str += f'{indent_str}with_visible_occupancy_masks='
        repr_str += f'{self.with_visible_occupancy_masks})'
return repr_str
================================================
FILE: bip3d/datasets/transforms/multiview.py
================================================
import copy
import random
import numpy as np
import torch
from mmcv.transforms import BaseTransform, Compose
from bip3d.registry import TRANSFORMS
from bip3d.structures.points import get_points_type
from ..utils import sample
@TRANSFORMS.register_module()
class MultiViewPipeline(BaseTransform):
"""HARD CODE"""
def __init__(
self,
transforms,
n_images,
max_n_images=None,
ordered=False,
rotate_3rscan=False,
):
super().__init__()
self.transforms = Compose(transforms)
self.n_images = n_images
self.max_n_images = (
max_n_images if max_n_images is not None else n_images
)
self.ordered = ordered
self.rotate_3rscan = rotate_3rscan
def transform(self, results: dict):
"""Transform function.
Args:
results (dict): Result dict from loading pipeline.
Returns:
dict: output dict after transformation.
"""
imgs = []
img_paths = []
points = []
intrinsics = []
extrinsics = []
depth_imgs = []
trans_mat = []
total_n = len(results["img_path"])
sample_n = min(max(self.n_images, total_n), self.max_n_images)
ids = sample(total_n, sample_n, self.ordered)
for i in ids.tolist():
_results = dict()
_results["img_path"] = results["img_path"][i]
if "depth_img_path" in results:
_results["depth_img_path"] = results["depth_img_path"][i]
if isinstance(results["depth_cam2img"], list):
_results["depth_cam2img"] = results["depth_cam2img"][i]
_results["cam2img"] = results["depth2img"]["intrinsic"][i]
else:
_results["depth_cam2img"] = results["depth_cam2img"]
_results["cam2img"] = results["cam2img"]
_results["depth_shift"] = results["depth_shift"]
_results = self.transforms(_results)
if "depth_shift" in _results:
_results.pop("depth_shift")
if "img" in _results:
imgs.append(_results["img"])
img_paths.append(_results["img_path"])
if "depth_img" in _results:
depth_imgs.append(_results["depth_img"])
if "points" in _results:
points.append(_results["points"])
if _results.get("modify_cam2img"):
intrinsics.append(_results["cam2img"])
elif isinstance(results["depth2img"]["intrinsic"], list):
intrinsics.append(results["depth2img"]["intrinsic"][i])
else:
intrinsics.append(results["depth2img"]["intrinsic"])
extrinsics.append(results["depth2img"]["extrinsic"][i])
if "trans_mat" in _results:
trans_mat.append(_results["trans_mat"])
for key in _results.keys():
if key not in ["img", "points", "img_path"]:
results[key] = _results[key]
if len(imgs):
if self.rotate_3rscan and "3rscan" in img_paths[0]:
imgs = [np.transpose(x, (1, 0, 2)) for x in imgs]
rot_mat = np.array(
[
[0, 1, 0, 0],
[1, 0, 0, 0],
[0, 0, 1, 0],
[0, 0, 0, 1],
]
)
rot_mat = [np.linalg.inv(x) @ rot_mat @ x for x in intrinsics]
extrinsics = [x @ y for x, y in zip(rot_mat, extrinsics)]
results["scale_factor"] = results["scale_factor"][::-1]
results["ori_shape"] = results["ori_shape"][::-1]
results["img"] = imgs
results["img_path"] = img_paths
if len(depth_imgs):
if self.rotate_3rscan and "3rscan" in img_paths[0]:
depth_imgs = [np.transpose(x, (1, 0)) for x in depth_imgs]
results["depth_img"] = depth_imgs
if len(points):
results["points"] = points
if (
"ann_info" in results
and "visible_instance_masks" in results["ann_info"]
):
results["visible_instance_masks"] = [
results["ann_info"]["visible_instance_masks"][i] for i in ids
]
results["ann_info"]["visible_instance_masks"] = results[
"visible_instance_masks"
]
elif "visible_instance_masks" in results:
results["visible_instance_masks"] = [
results["visible_instance_masks"][i] for i in ids
]
results["depth2img"]["intrinsic"] = intrinsics
results["depth2img"]["extrinsic"] = extrinsics
if len(trans_mat) != 0:
results["depth2img"]["trans_mat"] = trans_mat
return results
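# A minimal pipeline-config sketch for MultiViewPipeline; the transform
# names exist in this repo / mmcv, but the values are illustrative:
_multiview_pipeline_sketch = dict(
    type='MultiViewPipeline',
    n_images=32,
    ordered=True,
    transforms=[
        dict(type='LoadImageFromFile'),  # mmcv transform
        dict(type='LoadDepthFromFile'),
        dict(type='CustomResize', scale=(512, 512), keep_ratio=False),
    ],
)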
================================================
FILE: bip3d/datasets/transforms/transform.py
================================================
from typing import List, Optional, Tuple, Union
import cv2
import copy
import numpy as np
from numpy import random
import mmcv
import torch
from PIL import Image
from mmcv.transforms import BaseTransform, Resize
from mmcv.image.geometric import _scale_size
from bip3d.registry import TRANSFORMS
from bip3d.structures.bbox_3d import points_cam2img, points_img2cam
from bip3d.structures.points import BasePoints, get_points_type
@TRANSFORMS.register_module()
class CategoryGroundingDataPrepare(BaseTransform):
def __init__(
self,
classes,
training,
max_class=None,
sep_token="[SEP]",
filter_others=True,
z_range=None,
filter_invisible=False,
instance_mask_key="visible_instance_masks",
):
self.classes = list(classes)
self.training = training
self.max_class = max_class
self.sep_token = sep_token
self.filter_others = filter_others
self.z_range = z_range
self.filter_invisible = filter_invisible
self.instance_mask_key = instance_mask_key
def transform(self, results):
visible_instance_masks = results.get(self.instance_mask_key)
if isinstance(visible_instance_masks, (list, tuple)):
visible_instance_masks = np.stack(visible_instance_masks).any(axis=0)
ignore_mask = results.get("ignore_mask")
gt_names = results["ann_info"]["gt_names"]
if self.filter_others and "gt_labels_3d" in results:
mask = results["gt_labels_3d"] >= 0
results["gt_labels_3d"] = results["gt_labels_3d"][mask]
results["gt_bboxes_3d"] = results["gt_bboxes_3d"][mask]
visible_instance_masks = visible_instance_masks[mask]
gt_names = [x for i, x in enumerate(gt_names) if mask[i]]
if ignore_mask is not None:
ignore_mask = ignore_mask[mask]
if (
self.z_range is not None
and "gt_labels_3d" in results
and self.training
):
mask = torch.logical_and(
results["gt_bboxes_3d"].tensor[..., 2] >= self.z_range[0],
results["gt_bboxes_3d"].tensor[..., 2] <= self.z_range[1],
).numpy()
results["gt_labels_3d"] = results["gt_labels_3d"][mask]
results["gt_bboxes_3d"] = results["gt_bboxes_3d"][mask]
visible_instance_masks = visible_instance_masks[mask]
gt_names = [x for i, x in enumerate(gt_names) if mask[i]]
if ignore_mask is not None:
ignore_mask = ignore_mask[mask]
if self.training or self.filter_invisible:
results["gt_labels_3d"] = results["gt_labels_3d"][
visible_instance_masks
]
results["gt_bboxes_3d"] = results["gt_bboxes_3d"][
visible_instance_masks
]
gt_names = [
x for i, x in enumerate(gt_names) if visible_instance_masks[i]
]
if ignore_mask is not None:
ignore_mask = ignore_mask[visible_instance_masks]
results["ignore_mask"] = ignore_mask
if self.training:
if (
self.max_class is not None
and len(self.classes) > self.max_class
):
classes = copy.deepcopy(gt_names)
random.shuffle(self.classes)
for c in self.classes:
if c in classes:
continue
classes.append(c)
if len(classes) >= self.max_class:
break
random.shuffle(classes)
else:
classes = copy.deepcopy(self.classes)
else:
classes = copy.deepcopy(self.classes)
gt_names = classes
results["text"] = self.sep_token.join(classes)
tokens_positive = []
for name in gt_names:
start = results["text"].find(
self.sep_token + name + self.sep_token
)
if start == -1:
if results["text"].startswith(name + self.sep_token):
start = 0
else:
start = results["text"].find(self.sep_token + name) + len(
self.sep_token
)
else:
start += len(self.sep_token)
end = start + len(name)
tokens_positive.append([[start, end]])
results["tokens_positive"] = tokens_positive
return results
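# Sketch of the prompt built above: class names are joined with `sep_token`
# and `tokens_positive` stores the [start, end) character span of every
# ground-truth name inside that prompt (values below are illustrative):
def _tokens_positive_sketch():
    classes = ['chair', 'table', 'lamp']
    text = '[SEP]'.join(classes)  # 'chair[SEP]table[SEP]lamp'
    start = text.find('table')
    assert [start, start + len('table')] == [10, 15]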
@TRANSFORMS.register_module()
class CamIntrisicStandardization(BaseTransform):
def __init__(self, dst_intrinsic, dst_wh):
if not isinstance(dst_intrinsic, np.ndarray):
dst_intrinsic = np.array(dst_intrinsic)
if dst_intrinsic.shape[0] == 3:
tmp = np.eye(4)
tmp[:3, :3] = dst_intrinsic
dst_intrinsic = tmp
self.dst_intrinsic = dst_intrinsic
self.dst_wh = dst_wh
u, v = np.arange(dst_wh[0]), np.arange(dst_wh[1])
u = np.repeat(u[None], dst_wh[1], 0)
v = np.repeat(v[:, None], dst_wh[0], 1)
uv = np.stack([u, v, np.ones_like(u)], axis=-1)
self.dst_pts = uv @ np.linalg.inv(self.dst_intrinsic[:3, :3]).T
def transform(self, results):
src_intrinsic = results["cam2img"][:3, :3]
src_uv = self.dst_pts @ src_intrinsic.T
src_uv = src_uv.astype(np.float32)
if "depth_img" in results and results["img"].shape[:2] != results["depth_img"].shape[:2]:
results["depth_img"] = cv2.resize(
results["depth_img"], results["img"].shape[:2][::-1],
interpolation=cv2.INTER_LINEAR,
)
for key in ["img", "depth_img"]:
if key not in results:
continue
results[key] = cv2.remap(
results[key],
src_uv[..., 0],
src_uv[..., 1],
cv2.INTER_NEAREST,
)
# warp_mat = self.dst_intrinsic[:3, :3] @ np.linalg.inv(src_intrinsic)
# cv2.warpAffine(results[key], warp_mat[:2, :3], self.dst_wh)
results["cam2img"] = copy.deepcopy(self.dst_intrinsic)
results["depth_cam2img"] = copy.deepcopy(self.dst_intrinsic)
results["scale"] = self.dst_wh
results['img_shape'] = results["img"].shape[:2]
results['scale_factor'] = (1, 1)
results['keep_ratio'] = False
results["modify_cam2img"] = True
return results
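# The remap above resamples each destination pixel through
# src_uv = K_src @ inv(K_dst) @ [u, v, 1]^T, so the output looks as if it
# was captured with `dst_intrinsic`. Tiny numeric sketch (numpy only; the
# intrinsics are illustrative):
def _intrinsic_remap_sketch():
    k_src = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])
    k_dst = np.array([[250.0, 0, 160], [0, 250.0, 120], [0, 0, 1]])
    ray = np.linalg.inv(k_dst) @ np.array([160.0, 120.0, 1.0])
    uv_src = k_src @ ray
    assert np.allclose(uv_src[:2], [320.0, 240.0])  # principal points map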
@TRANSFORMS.register_module()
class CustomResize(Resize):
def _resize_img(self, results: dict, key="img"):
"""Resize images with ``results['scale']``."""
if results.get(key, None) is not None:
if self.keep_ratio:
img, scale_factor = mmcv.imrescale(
results[key],
results['scale'],
interpolation=self.interpolation,
return_scale=True,
backend=self.backend)
                # the w_scale and h_scale may have a minor difference;
                # a real fix should be done in mmcv.imrescale in the future
new_h, new_w = img.shape[:2]
h, w = results[key].shape[:2]
w_scale = new_w / w
h_scale = new_h / h
else:
img, w_scale, h_scale = mmcv.imresize(
results[key],
results['scale'],
interpolation=self.interpolation,
return_scale=True,
backend=self.backend)
results[key] = img
if key == "img":
results['img_shape'] = img.shape[:2]
results['scale_factor'] = (w_scale, h_scale)
results['keep_ratio'] = self.keep_ratio
def transform(self, results: dict):
if self.scale:
results['scale'] = self.scale
else:
img_shape = results['img'].shape[:2]
results['scale'] = _scale_size(img_shape[::-1],
self.scale_factor) # type: ignore
self._resize_img(results)
self._resize_bboxes(results)
self._resize_seg(results)
self._resize_keypoints(results)
if "depth_img" in results:
self._resize_img(results, key="depth_img")
return results
@TRANSFORMS.register_module()
class DepthProbLabelGenerator(BaseTransform):
def __init__(
self,
max_depth=10,
min_depth=0.25,
num_depth=64,
stride=[8, 16, 32, 64],
origin_stride=1,
):
self.max_depth = max_depth
self.min_depth = min_depth
self.num_depth = num_depth
self.stride = [x//origin_stride for x in stride]
self.origin_stride = origin_stride
def transform(self, input_dict):
depth = input_dict["inputs"]["depth_img"].cpu().numpy()
if self.origin_stride != 1:
H, W = depth.shape[-2:]
depth = np.transpose(depth, (0, 2, 3, 1))
depth = [mmcv.imresize(
x, (W//self.origin_stride, H//self.origin_stride),
interpolation="nearest",
) for x in depth]
depth = np.stack(depth)[:,None]
depth = np.clip(
depth,
a_min=self.min_depth,
a_max=self.max_depth,
)
depth_anchor = np.linspace(
self.min_depth, self.max_depth, self.num_depth)[:, None, None]
distance = np.abs(depth - depth_anchor)
mask = distance < (depth_anchor[1] - depth_anchor[0])
depth_gt = np.where(mask, depth_anchor, 0)
y = depth_gt.sum(axis=1, keepdims=True) - depth_gt
depth_valid_mask = depth > 0
depth_prob_gt = np.where(
(depth_gt != 0) & depth_valid_mask,
(depth - y) / (depth_gt - y),
0,
)
views, _, H, W = depth.shape
gt = []
gt_map = []
for s in self.stride:
gt_tmp = np.reshape(
depth_prob_gt, (views, self.num_depth, H//s, s, W//s, s))
gt_tmp = gt_tmp.sum(axis=-1).sum(axis=3)
mask_tmp = depth_valid_mask.reshape(views, 1, H//s, s, W//s, s)
mask_tmp = mask_tmp.sum(axis=-1).sum(axis=3)
gt_tmp /= np.clip(mask_tmp, a_min=1, a_max=None)
# gt_map.append(np.transpose(gt_tmp, (0, 2, 3, 1)))
gt_tmp = gt_tmp.reshape(views, self.num_depth, -1)
gt_tmp = np.transpose(gt_tmp, (0, 2, 1))
gt.append(gt_tmp)
gt = np.concatenate(gt, axis=1)
gt = np.clip(gt, a_min=0.0, a_max=1.0)
input_dict["inputs"]["depth_prob_gt"] = torch.from_numpy(gt).to(
input_dict["inputs"]["depth_img"])
return input_dict
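# Label sketch: a metric depth d between adjacent anchors a_i < d < a_{i+1}
# is encoded as two weights that linearly interpolate between the anchors,
# w_i + w_{i+1} = 1 with w_{i+1} = (d - a_i) / (a_{i+1} - a_i). Minimal
# check against the anchor grid used above (values are illustrative):
def _depth_prob_sketch():
    anchors = np.linspace(0.25, 10.0, 64)
    d = 1.0
    i = int(np.searchsorted(anchors, d)) - 1
    w_hi = (d - anchors[i]) / (anchors[i + 1] - anchors[i])
    assert 0.0 <= w_hi <= 1.0
    assert np.isclose(anchors[i] * (1 - w_hi) + anchors[i + 1] * w_hi, d)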
================================================
FILE: bip3d/datasets/utils.py
================================================
import numpy as np
from scipy.spatial.transform import Rotation
def sample(total_n, sample_n, fix_interval=True):
ids = np.arange(total_n)
if sample_n == total_n:
return ids
elif sample_n > total_n:
return np.concatenate([
ids, sample(total_n, sample_n - total_n, fix_interval)
])
elif fix_interval:
interval = total_n / sample_n
output = []
for i in range(sample_n):
output.append(ids[int(interval * i)])
return np.array(output)
return np.random.choice(ids, sample_n, replace=False)
def xyzrpy_to_matrix(input):
output = np.eye(4)
output[:3, :3] = Rotation.from_euler("xyz", np.array(input[3:])).as_matrix()
output[:3, 3] = input[:3]
return output
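# Behavior sketch for `sample`: with the default fix_interval=True the ids
# are evenly spaced, and requesting more ids than available wraps around
# (the values below follow directly from the implementation above):
def _sample_sketch():
    assert sample(8, 4).tolist() == [0, 2, 4, 6]
    assert sample(3, 5).tolist() == [0, 1, 2, 0, 1]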
================================================
FILE: bip3d/eval/__init__.py
================================================
from .metrics import GroundingMetric, IndoorDetMetric
__all__ = ['IndoorDetMetric', 'GroundingMetric']
================================================
FILE: bip3d/eval/indoor_eval.py
================================================
# Copyright (c) OpenRobotLab. All rights reserved.
import numpy as np
import torch
from mmengine.logging import print_log
from terminaltables import AsciiTable
# from bip3d.visualization.default_color_map import DEFAULT_COLOR_MAP
import matplotlib.pyplot as plt
def average_precision(recalls, precisions, mode='area'):
"""Calculate average precision (for single or multiple scales).
Args:
recalls (np.ndarray): Recalls with shape of (num_scales, num_dets)
or (num_dets, ).
precisions (np.ndarray): Precisions with shape of
(num_scales, num_dets) or (num_dets, ).
mode (str): 'area' or '11points', 'area' means calculating the area
under precision-recall curve, '11points' means calculating
the average precision of recalls at [0, 0.1, ..., 1]
Returns:
float or np.ndarray: Calculated average precision.
"""
if recalls.ndim == 1:
recalls = recalls[np.newaxis, :]
precisions = precisions[np.newaxis, :]
assert recalls.shape == precisions.shape
assert recalls.ndim == 2
num_scales = recalls.shape[0]
ap = np.zeros(num_scales, dtype=np.float32)
if mode == 'area':
zeros = np.zeros((num_scales, 1), dtype=recalls.dtype)
ones = np.ones((num_scales, 1), dtype=recalls.dtype)
mrec = np.hstack((zeros, recalls, ones))
mpre = np.hstack((zeros, precisions, zeros))
for i in range(mpre.shape[1] - 1, 0, -1):
mpre[:, i - 1] = np.maximum(mpre[:, i - 1], mpre[:, i])
for i in range(num_scales):
ind = np.where(mrec[i, 1:] != mrec[i, :-1])[0]
ap[i] = np.sum(
(mrec[i, ind + 1] - mrec[i, ind]) * mpre[i, ind + 1])
elif mode == '11points':
for i in range(num_scales):
for thr in np.arange(0, 1 + 1e-3, 0.1):
precs = precisions[i, recalls[i, :] >= thr]
prec = precs.max() if precs.size > 0 else 0
ap[i] += prec
ap /= 11
else:
raise ValueError(
'Unrecognized mode, only "area" and "11points" are supported')
return ap
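# Worked sketch for `average_precision` in 'area' mode: with
# recalls=[0.5, 1.0] and precisions=[1.0, 0.5], the interpolated PR curve
# integrates to AP = 0.5 * 1.0 + 0.5 * 0.5 = 0.75.
def _average_precision_sketch():
    ap = average_precision(np.array([0.5, 1.0]), np.array([1.0, 0.5]))
    assert np.isclose(ap[0], 0.75)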
def eval_det_cls(pred, gt, iou_thr=None, area_range=None):
"""Generic functions to compute precision/recall for object detection for a
single class.
Args:
pred (dict): Predictions mapping from image id to bounding boxes
and scores.
gt (dict): Ground truths mapping from image id to bounding boxes.
iou_thr (list[float]): A list of iou thresholds.
Return:
tuple (np.ndarray, np.ndarray, float): Recalls, precisions and
average precision.
"""
# {img_id: {'bbox': box structure, 'det': matched list}}
class_recs = {}
npos = 0
# figure out the bbox code size first
gt_bbox_code_size = 9
pred_bbox_code_size = 9
for img_id in gt.keys():
if len(gt[img_id]) != 0:
gt_bbox_code_size = gt[img_id][0].tensor.shape[1]
break
for img_id in pred.keys():
if len(pred[img_id][0]) != 0:
pred_bbox_code_size = pred[img_id][0][0].tensor.shape[1]
break
assert gt_bbox_code_size == pred_bbox_code_size
for img_id in gt.keys():
cur_gt_num = len(gt[img_id])
if cur_gt_num != 0:
gt_cur = torch.zeros([cur_gt_num, gt_bbox_code_size],
dtype=torch.float32)
for i in range(cur_gt_num):
gt_cur[i] = gt[img_id][i].tensor
bbox = gt[img_id][0].new_box(gt_cur)
else:
bbox = gt[img_id]
det = [[False] * len(bbox) for i in iou_thr]
npos += len(bbox)
class_recs[img_id] = {'bbox': bbox, 'det': det}
if area_range is not None and len(bbox) > 0:
area = bbox.tensor[:, 3:6].cumprod(dim=-1)[..., -1]
ignore = torch.logical_or(area < area_range[0], area > area_range[1])
npos -= ignore.sum().item()
class_recs[img_id]["ignore"] = ignore
# construct dets
image_ids = []
confidence = []
ious = []
for img_id in pred.keys():
cur_num = len(pred[img_id])
if cur_num == 0:
continue
pred_cur = torch.zeros((cur_num, pred_bbox_code_size),
dtype=torch.float32)
box_idx = 0
for box, score in pred[img_id]:
image_ids.append(img_id)
confidence.append(score)
# handle outlier (too thin) predicted boxes
w, l, h = box.tensor[0, 3:6]
faces = [w * l, w * h, h * l]
if torch.any(box.tensor.new_tensor(faces) < 2e-4):
# print('Find small predicted boxes,',
# 'and clamp short edges to 2e-2 meters.')
box.tensor[:, 3:6] = torch.clamp(box.tensor[:, 3:6], min=2e-2)
pred_cur[box_idx] = box.tensor
box_idx += 1
pred_cur = box.new_box(pred_cur)
gt_cur = class_recs[img_id]['bbox']
if len(gt_cur) > 0:
# calculate iou in each image
iou_cur = pred_cur.overlaps(pred_cur, gt_cur)
for i in range(cur_num):
ious.append(iou_cur[i])
else:
for i in range(cur_num):
ious.append(np.zeros(1))
confidence = np.array(confidence)
# sort by confidence
sorted_ind = np.argsort(-confidence)
image_ids = [image_ids[x] for x in sorted_ind]
ious = [ious[x] for x in sorted_ind]
# go down dets and mark TPs and FPs
num_images = len(image_ids)
tp_thr = [np.zeros(num_images) for i in iou_thr]
fp_thr = [np.zeros(num_images) for i in iou_thr]
for d in range(num_images):
R = class_recs[image_ids[d]]
iou_max = -np.inf
BBGT = R['bbox']
cur_iou = ious[d]
if len(BBGT) > 0:
# compute overlaps
for j in range(len(BBGT)):
# iou = get_iou_main(get_iou_func, (bb, BBGT[j,...]))
iou = cur_iou[j]
if iou > iou_max:
iou_max = iou
jmax = j
for iou_idx, thresh in enumerate(iou_thr):
if iou_max > thresh:
if "ignore" in R and R["ignore"][jmax]:
continue
if not R['det'][iou_idx][jmax]:
tp_thr[iou_idx][d] = 1.
R['det'][iou_idx][jmax] = 1
else:
fp_thr[iou_idx][d] = 1.
else:
fp_thr[iou_idx][d] = 1.
ret = []
for iou_idx, thresh in enumerate(iou_thr):
# compute precision recall
fp = np.cumsum(fp_thr[iou_idx])
tp = np.cumsum(tp_thr[iou_idx])
recall = tp / float(npos)
# avoid divide by zero in case the first detection matches a difficult
# ground truth
precision = tp / np.maximum(tp + fp, np.finfo(np.float64).eps)
ap = average_precision(recall, precision)
ret.append((recall, precision, ap))
return ret
def eval_map_recall(pred, gt, ovthresh=None, area_range=None):
"""Evaluate mAP and recall.
    Generic function to compute precision/recall for object detection
    for multiple classes.
Args:
pred (dict): Information of detection results,
which maps class_id and predictions.
gt (dict): Information of ground truths, which maps class_id and
ground truths.
        ovthresh (list[float], optional): A list of iou thresholds.
            Default: None.
        area_range (list[float], optional): Box volume range used to ignore
            ground truths, see ``eval_det_cls``. Default: None.
Return:
tuple[dict]: dict results of recall, AP, and precision for all classes.
"""
ret_values = {}
for classname in gt.keys():
if classname in pred:
ret_values[classname] = eval_det_cls(pred[classname],
gt[classname], ovthresh, area_range)
recall = [{} for i in ovthresh]
precision = [{} for i in ovthresh]
ap = [{} for i in ovthresh]
for label in gt.keys():
for iou_idx, thresh in enumerate(ovthresh):
if label in pred:
recall[iou_idx][label], precision[iou_idx][label], ap[iou_idx][
label] = ret_values[label][iou_idx]
else:
recall[iou_idx][label] = np.zeros(1)
precision[iou_idx][label] = np.zeros(1)
ap[iou_idx][label] = np.zeros(1)
return recall, precision, ap
def save_pr_curve(recs, precs, metrics):
for rec, prec, metric in zip(recs, precs, metrics):
plt.figure(figsize=(20, 18))
fontsize = 50
font = 'Times New Roman'
for k in rec.keys():
if np.isnan(rec[k]).any():
continue
plt.plot(rec[k], prec[k])
plt.xlabel('Recall', fontsize=fontsize)
plt.ylabel('Precision', fontsize=fontsize)
        plt.savefig(f'./pr_curve_{metric}.png', dpi=500.)
def indoor_eval(gt_annos,
dt_annos,
metric,
label2cat,
logger=None,
box_mode_3d=None,
classes_split=None,
part="Overall"):
"""Indoor Evaluation.
Evaluate the result of the detection.
Args:
gt_annos (list[dict]): Ground truth annotations.
        dt_annos (list[dict]): Detection annotations. The dict
            includes the following keys:
- labels_3d (torch.Tensor): Labels of boxes.
- bboxes_3d (:obj:`BaseInstance3DBoxes`):
3D bounding boxes in Depth coordinate.
- scores_3d (torch.Tensor): Scores of boxes.
metric (list[float]): IoU thresholds for computing average precisions.
label2cat (tuple): Map from label to category.
        logger (logging.Logger | str, optional): The way to print the mAP
            summary. See `mmdet.utils.print_log()` for details. Default: None.
        box_mode_3d (:obj:`Box3DMode`, optional): Target box mode to which
            predictions are converted before matching. Default: None.
        classes_split (list, optional): Label splits (e.g. head/common/tail)
            used for split-wise reporting. Default: None.
        part (str): Name of the overall summary row. Default: 'Overall'.
Return:
dict[str, float]: Dict of results.
"""
assert len(dt_annos) == len(gt_annos)
pred = {} # map {class_id: pred}
gt = {} # map {class_id: gt}
for img_id in range(len(dt_annos)):
# parse detected annotations
det_anno = dt_annos[img_id]
for i in range(len(det_anno['labels_3d'])):
label = det_anno['labels_3d'].numpy()[i]
bbox = det_anno['bboxes_3d'].convert_to(box_mode_3d)[i]
score = det_anno['scores_3d'].numpy()[i]
if label not in pred:
pred[int(label)] = {}
if img_id not in pred[label]:
pred[int(label)][img_id] = []
if label not in gt:
gt[int(label)] = {}
if img_id not in gt[label]:
gt[int(label)][img_id] = []
pred[int(label)][img_id].append((bbox, score))
# parse gt annotations
gt_anno = gt_annos[img_id]
gt_boxes = gt_anno['gt_bboxes_3d']
labels_3d = gt_anno['gt_labels_3d']
for i in range(len(labels_3d)):
label = labels_3d[i]
bbox = gt_boxes[i]
if label not in gt:
gt[label] = {}
if img_id not in gt[label]:
gt[label][img_id] = []
gt[label][img_id].append(bbox)
rec, prec, ap = eval_map_recall(pred, gt, metric)
# filter nan results
ori_keys = list(ap[0].keys())
for key in ori_keys:
if np.isnan(ap[0][key][0]):
for r in rec:
del r[key]
for p in prec:
del p[key]
for a in ap:
del a[key]
ret_dict = dict()
header = ['classes']
table_columns = [[label2cat[label]
for label in ap[0].keys()] + [part]]
for i, iou_thresh in enumerate(metric):
header.append(f'AP_{iou_thresh:.2f}')
header.append(f'AR_{iou_thresh:.2f}')
rec_list = []
for label in ap[i].keys():
ret_dict[f'{label2cat[label]}_AP_{iou_thresh:.2f}'] = float(
ap[i][label][0])
ret_dict[f'mAP_{iou_thresh:.2f}'] = float(np.mean(list(
ap[i].values())))
table_columns.append(list(map(float, list(ap[i].values()))))
table_columns[-1] += [ret_dict[f'mAP_{iou_thresh:.2f}']]
table_columns[-1] = [f'{x:.4f}' for x in table_columns[-1]]
for label in rec[i].keys():
ret_dict[f'{label2cat[label]}_rec_{iou_thresh:.2f}'] = float(
rec[i][label][-1])
rec_list.append(rec[i][label][-1])
ret_dict[f'mAR_{iou_thresh:.2f}'] = float(np.mean(rec_list))
table_columns.append(list(map(float, rec_list)))
table_columns[-1] += [ret_dict[f'mAR_{iou_thresh:.2f}']]
table_columns[-1] = [f'{x:.4f}' for x in table_columns[-1]]
table_data = [header]
table_rows = list(zip(*table_columns))
table_data += table_rows
table = AsciiTable(table_data)
table.inner_footing_row_border = True
print_log('\n' + table.table, logger=logger)
if classes_split is not None:
splits = ['head', 'common', 'tail']
for idx in range(len(splits)):
header = [f'{splits[idx]}_classes']
# init the category list/column
cat_list = []
for label in classes_split[idx]:
if label in ap[0]:
cat_list.append(label2cat[label])
table_columns = [cat_list + [part]]
for i, iou_thresh in enumerate(metric):
header.append(f'AP_{iou_thresh:.2f}')
header.append(f'AR_{iou_thresh:.2f}')
ap_list = []
for label in classes_split[idx]:
if label in ap[i]:
ap_list.append(float(ap[i][label][0]))
mean_ap = float(np.mean(ap_list))
table_columns.append(list(map(float, ap_list)))
table_columns[-1] += [mean_ap]
table_columns[-1] = [f'{x:.4f}' for x in table_columns[-1]]
rec_list = []
for label in classes_split[idx]:
if label in rec[i]:
rec_list.append(rec[i][label][-1])
mean_rec = float(np.mean(rec_list))
table_columns.append(list(map(float, rec_list)))
table_columns[-1] += [mean_rec]
table_columns[-1] = [f'{x:.4f}' for x in table_columns[-1]]
table_data = [header]
table_rows = list(zip(*table_columns))
table_data += table_rows
table = AsciiTable(table_data)
table.inner_footing_row_border = True
print_log('\n' + table.table, logger=logger)
area_split = {
"small": [0, 0.2**3],
"medium": [0.2**3, 1.0**3],
"large": [1.0**3, float("inf")],
}
if area_split is not None:
header = ["area"]
for i, iou_thresh in enumerate(metric):
header.append(f'AP_{iou_thresh:.2f}')
header.append(f'AR_{iou_thresh:.2f}')
table_rows = []
area_ret_dict = dict()
        for idx, (area, area_range) in enumerate(area_split.items()):
            rec, prec, ap = eval_map_recall(pred, gt, metric, area_range)
ori_keys = list(ap[0].keys())
for key in ori_keys:
if np.isnan(ap[0][key][0]):
for r in rec:
del r[key]
for p in prec:
del p[key]
for a in ap:
del a[key]
# table_columns = [[f"{area}_{part}"]]
table_rows.append([f"{area}_{part}"])
            for i, iou_thresh in enumerate(metric):
rec_list = []
for label in ap[i].keys():
area_ret_dict[f'{label2cat[label]}_AP_{iou_thresh:.2f}'] = float(
ap[i][label][0])
area_ret_dict[f'mAP_{iou_thresh:.2f}'] = float(np.mean(list(
ap[i].values())))
table_rows[-1].append(f"{area_ret_dict[f'mAP_{iou_thresh:.2f}']:.4f}")
# table_columns.append([f"{area_ret_dict[f'mAP_{iou_thresh:.2f}']:.4f}"])
for label in rec[i].keys():
area_ret_dict[f'{label2cat[label]}_rec_{iou_thresh:.2f}'] = float(
rec[i][label][-1])
rec_list.append(rec[i][label][-1])
area_ret_dict[f'mAR_{iou_thresh:.2f}'] = float(np.mean(rec_list))
table_rows[-1].append(f"{area_ret_dict[f'mAR_{iou_thresh:.2f}']:.4f}")
# table_columns.append([f"{area_ret_dict[f'mAR_{iou_thresh:.2f}']:.4f}"])
# table_data = [header]
# table_rows = list(zip(*table_columns))
# table_data += table_rows
# table = AsciiTable(table_data)
# table.inner_footing_row_border = True
# print_log('\n' + table.table, logger=logger)
table_data = [header]
table_data += table_rows
table = AsciiTable(table_data)
table.inner_footing_row_border = False
print_log('\n' + table.table, logger=logger)
return ret_dict
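# Usage sketch (illustrative, not part of the original file; assumes the
# bip3d package is importable and the annotation dicts are built as
# described in the docstring above):
#
#   from bip3d.structures import get_box_type
#   _, box_mode_3d = get_box_type('depth')
#   ret = indoor_eval(
#       gt_annos,                       # dicts with gt_bboxes_3d/gt_labels_3d
#       dt_annos,                       # dicts with bboxes_3d/labels_3d/scores_3d
#       metric=[0.25, 0.5],             # IoU thresholds
#       label2cat={0: 'chair', 1: 'table'},
#       box_mode_3d=box_mode_3d,
#   )
#   # ret then contains keys such as 'chair_AP_0.25', 'mAP_0.25', 'mAR_0.50'.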
================================================
FILE: bip3d/eval/metrics/__init__.py
================================================
from .det_metric import IndoorDetMetric
from .grounding_metric import GroundingMetric
# from .occupancy_metric import OccupancyMetric
__all__ = ['IndoorDetMetric', 'GroundingMetric']
================================================
FILE: bip3d/eval/metrics/det_metric.py
================================================
# Copyright (c) OpenRobotLab. All rights reserved.
import logging
import os
from collections import OrderedDict
from typing import Dict, List, Optional, Sequence, Union
import numpy as np
from terminaltables import AsciiTable
from mmdet.evaluation import eval_map
from mmengine.dist import (broadcast_object_list, collect_results,
is_main_process, get_rank)
from mmengine.evaluator import BaseMetric
from mmengine.evaluator.metric import _to_cpu
from mmengine.logging import MMLogger, print_log
from bip3d.registry import METRICS
from bip3d.structures import get_box_type
from ..indoor_eval import indoor_eval
@METRICS.register_module()
class IndoorDetMetric(BaseMetric):
"""Indoor scene evaluation metric.
Args:
iou_thr (float or List[float]): List of iou threshold when calculate
the metric. Defaults to [0.25, 0.5].
collect_device (str): Device name used for collecting results from
different ranks during distributed training. Must be 'cpu' or
'gpu'. Defaults to 'cpu'.
prefix (str, optional): The prefix that will be added in the metric
names to disambiguate homonymous metrics of different evaluators.
If prefix is not provided in the argument, self.default_prefix will
be used instead. Defaults to None.
"""
def __init__(self,
iou_thr: List[float] = [0.25, 0.5],
collect_device: str = 'cpu',
prefix: Optional[str] = None,
batchwise_anns: bool = False,
save_result_path=None,
eval_part=None,
**kwargs):
super(IndoorDetMetric, self).__init__(prefix=prefix,
collect_device=collect_device,
**kwargs)
self.iou_thr = [iou_thr] if isinstance(iou_thr, float) else iou_thr
self.batchwise_anns = batchwise_anns
self.save_result_path = save_result_path
if eval_part is None:
eval_part = ["scannet", "3rscan", "matterport3d", "arkit"]
self.eval_part = eval_part
def process(self, data_batch: dict, data_samples: Sequence[dict]):
"""Process one batch of data samples and predictions.
The processed results should be stored in ``self.results``, which will
be used to compute the metrics when all batches have been processed.
Args:
data_batch (dict): A batch of data from the dataloader.
data_samples (Sequence[dict]): A batch of outputs from the model.
"""
for data_sample in data_samples:
pred_3d = data_sample['pred_instances_3d']
eval_ann_info = data_sample['eval_ann_info']
eval_ann_info["scan_id"] = data_sample["scan_id"]
cpu_pred_3d = dict()
for k, v in pred_3d.items():
if hasattr(v, 'to'):
cpu_pred_3d[k] = v.to('cpu')
else:
cpu_pred_3d[k] = v
self.results.append((eval_ann_info, cpu_pred_3d))
def compute_metrics(self, results: list, part="Overall", split=True):
"""Compute the metrics from processed results.
Args:
results (list): The processed results of each batch.
Returns:
Dict[str, float]: The computed metrics. The keys are the names of
the metrics, and the values are corresponding results.
"""
logger: MMLogger = MMLogger.get_current_instance()
ann_infos = []
pred_results = []
        for eval_ann, single_pred_results in results:
            ann_infos.append(eval_ann)
            pred_results.append(single_pred_results)
# some checkpoints may not record the key "box_type_3d"
box_type_3d, box_mode_3d = get_box_type(
self.dataset_meta.get('box_type_3d', 'depth'))
print_log(f"eval : {len(ann_infos)}, {len(pred_results)}, rank: {get_rank()}")
ret_dict = indoor_eval(ann_infos,
pred_results,
self.iou_thr,
self.dataset_meta['classes'],
logger=logger,
box_mode_3d=box_mode_3d,
classes_split=self.dataset_meta.get(
'classes_split', None) if split else None,
part=part)
return ret_dict
def evaluate(self, size: int):
"""Evaluate the model performance of the whole dataset after processing
all batches.
Args:
size (int): Length of the entire validation dataset. When batch
size > 1, the dataloader may pad some data samples to make
sure all ranks have the same length of dataset slice. The
``collect_results`` function will drop the padded data based on
this size.
Returns:
dict: Evaluation metrics dict on the val dataset. The keys are the
names of the metrics, and the values are corresponding results.
"""
if len(self.results) == 0:
print_log(
f'{self.__class__.__name__} got empty `self.results`. Please '
'ensure that the processed results are properly added into '
'`self.results` in `process` method.',
logger='current',
level=logging.WARNING)
print_log(
f"number of results: "
f"{len(self.results)}, size: {size}, "
f"batchwise_anns: {self.batchwise_anns}, rank: {get_rank()}, "
f"collect_dir: {self.collect_dir}"
)
if self.batchwise_anns:
# the actual dataset length/size is the len(self.results)
if self.collect_device == 'cpu':
results = collect_results(self.results,
len(self.results),
self.collect_device,
tmpdir=self.collect_dir)
else:
results = collect_results(self.results, len(self.results),
self.collect_device)
else:
if self.collect_device == 'cpu':
results = collect_results(self.results,
size,
self.collect_device,
tmpdir=self.collect_dir)
else:
results = collect_results(self.results, size,
self.collect_device)
print_log(
f"number of collected results: "
f"{len(results) if results is not None else 'None'},"
f"rank: {get_rank()}"
)
if is_main_process():
# cast all tensors in results list to cpu
results = _to_cpu(results)
if self.save_result_path is not None:
import pickle
import copy
_ret = []
for x in results:
x = list(copy.deepcopy(x))
x[0]["gt_bboxes_3d"] = x[0]["gt_bboxes_3d"].cpu().numpy()
x[1]["bboxes_3d"] = x[1]["bboxes_3d"].cpu().numpy()
x[1]["scores_3d"] = x[1]["scores_3d"].cpu().numpy()
x[1]["labels_3d"] = x[1]["labels_3d"].cpu().numpy()
x[1]["target_scores_3d"] = x[1]["target_scores_3d"].cpu().numpy()
_ret.append(x)
                with open(os.path.join(self.save_result_path,
                                       "results.pkl"), "wb") as f:
                    pickle.dump(_ret, f)
_metrics = self.compute_metrics(results) # type: ignore
for part in self.eval_part:
print(f"=================== {part} =========================")
part_ret = [x for x in results if part in x[0]["scan_id"]]
if len(part_ret) > 0:
_metrics_part = self.compute_metrics(part_ret, part=part)
_metrics_part = {f"{part}--{k}": v for k, v in _metrics_part.items()}
_metrics.update(_metrics_part)
# Add prefix to metric names
if self.prefix:
_metrics = {
'/'.join((self.prefix, k)): v
for k, v in _metrics.items()
}
metrics = [_metrics]
else:
metrics = [None] # type: ignore
broadcast_object_list(metrics)
if is_main_process():
table = []
header = ["Summary"]
for i, iou_thresh in enumerate(self.iou_thr):
header.append(f'AP_{iou_thresh:.2f}')
header.append(f'AR_{iou_thresh:.2f}')
table.append(header)
for part in self.eval_part + ["Overall"]:
table_data = [part]
for i, iou_thresh in enumerate(self.iou_thr):
key = f"{part+'--' if part!='Overall' else ''}mAP_{iou_thresh:.2f}"
if key in _metrics:
table_data.append(f"{_metrics[key]:.4f}")
else:
table_data.append(f"0.xxxx")
key = f"{part+'--' if part!='Overall' else ''}mAR_{iou_thresh:.2f}"
if key in _metrics:
table_data.append(f"{_metrics[key]:.4f}")
else:
table_data.append(f"0.xxxx")
table.append(table_data)
table = AsciiTable(table)
table.inner_footing_row_border = True
logger: MMLogger = MMLogger.get_current_instance()
print_log('\n' + table.table, logger=logger)
# reset the results list
self.results.clear()
return metrics[0]
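# Config usage sketch (illustrative, not part of the original file): since
# IndoorDetMetric is registered in METRICS, it can be referenced from an
# mmengine config, e.g.
#
#   val_evaluator = dict(
#       type='IndoorDetMetric',
#       iou_thr=[0.25, 0.5],
#       # eval_part defaults to ['scannet', '3rscan', 'matterport3d', 'arkit']
#   )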
@METRICS.register_module()
class Indoor2DMetric(BaseMetric):
"""indoor 2d predictions evaluation metric.
Args:
iou_thr (float or List[float]): List of iou threshold when calculate
the metric. Defaults to [0.5].
collect_device (str): Device name used for collecting results from
different ranks during distributed training. Must be 'cpu' or
'gpu'. Defaults to 'cpu'.
prefix (str, optional): The prefix that will be added in the metric
names to disambiguate homonymous metrics of different evaluators.
If prefix is not provided in the argument, self.default_prefix will
be used instead. Defaults to None.
"""
def __init__(self,
iou_thr: Union[float, List[float]] = [0.5],
collect_device: str = 'cpu',
prefix: Optional[str] = None):
super(Indoor2DMetric, self).__init__(prefix=prefix,
collect_device=collect_device)
self.iou_thr = [iou_thr] if isinstance(iou_thr, float) else iou_thr
def process(self, data_batch: dict, data_samples: Sequence[dict]) -> None:
"""Process one batch of data samples and predictions.
The processed results should be stored in ``self.results``, which will
be used to compute the metrics when all batches have been processed.
Args:
data_batch (dict): A batch of data from the dataloader.
data_samples (Sequence[dict]): A batch of outputs from the model.
"""
for data_sample in data_samples:
pred = data_sample['pred_instances']
eval_ann_info = data_sample['eval_ann_info']
ann = dict(labels=eval_ann_info['gt_bboxes_labels'],
bboxes=eval_ann_info['gt_bboxes'])
pred_bboxes = pred['bboxes'].cpu().numpy()
pred_scores = pred['scores'].cpu().numpy()
pred_labels = pred['labels'].cpu().numpy()
dets = []
for label in range(len(self.dataset_meta['classes'])):
index = np.where(pred_labels == label)[0]
pred_bbox_scores = np.hstack(
[pred_bboxes[index], pred_scores[index].reshape((-1, 1))])
dets.append(pred_bbox_scores)
self.results.append((ann, dets))
def compute_metrics(self, results: list) -> Dict[str, float]:
"""Compute the metrics from processed results.
Args:
results (list): The processed results of each batch.
Returns:
Dict[str, float]: The computed metrics. The keys are the names of
the metrics, and the values are corresponding results.
"""
logger: MMLogger = MMLogger.get_current_instance()
annotations, preds = zip(*results)
eval_results = OrderedDict()
for iou_thr_2d_single in self.iou_thr:
mean_ap, _ = eval_map(preds,
annotations,
scale_ranges=None,
iou_thr=iou_thr_2d_single,
dataset=self.dataset_meta['classes'],
logger=logger)
eval_results['mAP_' + str(iou_thr_2d_single)] = mean_ap
return eval_results
================================================
FILE: bip3d/eval/metrics/grounding_metric.py
================================================
# Copyright (c) OpenRobotLab. All rights reserved.
import os
from typing import Dict, List, Optional, Sequence
import pickle
import mmengine
from mmengine.evaluator import BaseMetric
from mmengine.logging import MMLogger, print_log
from terminaltables import AsciiTable
from bip3d.registry import METRICS
from bip3d.structures import EulerDepthInstance3DBoxes
@METRICS.register_module()
class GroundingMetric(BaseMetric):
"""Lanuage grounding evaluation metric. We calculate the grounding
performance based on the alignment score of each bbox with the input
prompt.
Args:
iou_thr (float or List[float]): List of iou threshold when calculate
the metric. Defaults to [0.25, 0.5].
collect_device (str): Device name used for collecting results from
different ranks during distributed training. Must be 'cpu' or
'gpu'. Defaults to 'cpu'.
prefix (str, optional): The prefix that will be added in the metric
names to disambiguate homonymous metrics of different evaluators.
If prefix is not provided in the argument, self.default_prefix will
be used instead. Defaults to None.
format_only (bool): Whether to only inference the predictions without
evaluation. Defaults to False.
result_dir (str): Dir to save results, e.g., if result_dir = './',
the result file will be './test_results.json'. Defaults to ''.
"""
def __init__(self,
iou_thr: List[float] = [0.25, 0.5],
collect_device: str = 'cpu',
prefix: Optional[str] = None,
format_only=False,
submit_info=None,
result_dir='', **kwargs) -> None:
super(GroundingMetric, self).__init__(prefix=prefix,
collect_device=collect_device,
**kwargs)
self.iou_thr = [iou_thr] if isinstance(iou_thr, float) else iou_thr
self.prefix = prefix
self.format_only = format_only
self.result_dir = result_dir
self.submit_info = submit_info if submit_info is not None else {}
def process(self, data_batch: dict, data_samples: Sequence[dict]) -> None:
"""Process one batch of data samples and predictions.
The processed results should be stored in ``self.results``, which will
be used to compute the metrics when all batches have been processed.
Args:
data_batch (dict): A batch of data from the dataloader.
data_samples (Sequence[dict]): A batch of outputs from the model.
"""
for data_sample in data_samples:
pred_3d = data_sample['pred_instances_3d']
eval_ann_info = data_sample['eval_ann_info']
eval_ann_info["scan_id"] = data_sample["scan_id"]
cpu_pred_3d = dict()
for k, v in pred_3d.items():
if hasattr(v, 'to'):
cpu_pred_3d[k] = v.to('cpu')
else:
cpu_pred_3d[k] = v
self.results.append((eval_ann_info, cpu_pred_3d))
def ground_eval(self, gt_annos, det_annos, logger=None, part="Overall"):
assert len(det_annos) == len(gt_annos)
pred = {}
gt = {}
object_types = [
'Easy', 'Hard', 'View-Dep', 'View-Indep', 'Unique', 'Multi',
'Overall'
]
for t in self.iou_thr:
for object_type in object_types:
pred.update({object_type + '@' + str(t): 0})
gt.update({object_type + '@' + str(t): 1e-14})
for sample_id in range(len(det_annos)):
det_anno = det_annos[sample_id]
gt_anno = gt_annos[sample_id]
target_scores = det_anno['target_scores_3d'] # (num_query, )
bboxes = det_anno['bboxes_3d']
gt_bboxes = gt_anno['gt_bboxes_3d']
bboxes = EulerDepthInstance3DBoxes(bboxes.tensor,
origin=(0.5, 0.5, 0.5))
gt_bboxes = EulerDepthInstance3DBoxes(gt_bboxes.tensor,
origin=(0.5, 0.5, 0.5))
view_dep = gt_anno['is_view_dep']
hard = gt_anno['is_hard']
unique = gt_anno['is_unique']
box_index = target_scores.argsort(dim=-1, descending=True)[:10]
top_bbox = bboxes[box_index]
            iou = top_bbox.overlaps(top_bbox, gt_bboxes)  # (num_top, num_gt)
for t in self.iou_thr:
threshold = iou > t
found = int(threshold.any())
if view_dep:
gt['View-Dep@' + str(t)] += 1
pred['View-Dep@' + str(t)] += found
else:
gt['View-Indep@' + str(t)] += 1
pred['View-Indep@' + str(t)] += found
if hard:
gt['Hard@' + str(t)] += 1
pred['Hard@' + str(t)] += found
else:
gt['Easy@' + str(t)] += 1
pred['Easy@' + str(t)] += found
if unique:
gt['Unique@' + str(t)] += 1
pred['Unique@' + str(t)] += found
else:
gt['Multi@' + str(t)] += 1
pred['Multi@' + str(t)] += found
gt['Overall@' + str(t)] += 1
pred['Overall@' + str(t)] += found
header = ['Type']
header.extend(object_types)
ret_dict = {}
for t in self.iou_thr:
if part == "Overall":
table_columns = [[f'AP@{t:.2f}']]
else:
table_columns = [[f'{part}_AP@{t:.2f}']]
for object_type in object_types:
metric = object_type + '@' + str(t)
value = pred[metric] / max(gt[metric], 1)
ret_dict[metric] = value
table_columns.append([f'{value:.4f}'])
table_data = [header]
table_rows = list(zip(*table_columns))
table_data += table_rows
table = AsciiTable(table_data)
table.inner_footing_row_border = True
print_log('\n' + table.table, logger=logger)
return ret_dict
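    # Worked note (illustrative, not part of the original file): per sample,
    # the top-10 boxes ranked by `target_scores_3d` are kept, and the sample
    # counts as a hit at threshold t if any of them overlaps a GT box with
    # IoU > t.  Each reported value is then
    #
    #   acc[type @ t] = pred[type @ t] / max(gt[type @ t], 1)
    #
    # e.g. 7 hits over 10 'Hard' samples at t = 0.25 gives
    # ret_dict['Hard@0.25'] of roughly 0.7 (up to the 1e-14 initialiser).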
def compute_metrics(self, results: list) -> Dict[str, float]:
"""Compute the metrics from processed results after all batches have
been processed.
Args:
results (list): The processed results of each batch.
Returns:
Dict[str, float]: The computed metrics. The keys are the names of
the metrics, and the values are corresponding results.
"""
logger: MMLogger = MMLogger.get_current_instance() # noqa
annotations, preds = zip(*results)
ret_dict = {}
if self.format_only:
# preds is a list of dict
results = []
for pred in preds:
result = dict()
# convert the Euler boxes to the numpy array to save
bboxes_3d = pred['bboxes_3d'].tensor
scores_3d = pred['scores_3d']
# Note: hard-code save top-20 predictions
# eval top-10 predictions during the test phase by default
box_index = scores_3d.argsort(dim=-1, descending=True)[:20]
top_bboxes_3d = bboxes_3d[box_index]
top_scores_3d = scores_3d[box_index]
result['bboxes_3d'] = top_bboxes_3d.numpy()
result['scores_3d'] = top_scores_3d.numpy()
results.append(result)
mmengine.dump(results,
os.path.join(self.result_dir, 'test_results.json'))
for x in results:
x["bboxes_3d"] = x["bboxes_3d"].tolist()
x["scores_3d"] = x["scores_3d"].tolist()
submission = {
'method': 'xxx',
'team': 'xxx',
'authors': ['xxx'],
'e-mail': 'xxx',
'institution': 'xxx',
'country': 'xxx',
'results': results,
}
submission.update(self.submit_info)
            with open(os.path.join(self.result_dir,
                                   'submission.pkl'), "wb") as f:
                pickle.dump(submission, f)
return ret_dict
ret_dict = self.ground_eval(annotations, preds)
for part in ["scannet", "3rscan", "matterport3d", "arkit"]:
ann = [x for x in annotations if part in x["scan_id"]]
pred = [y for x,y in zip(annotations, preds) if part in x["scan_id"]]
if len(pred) > 0:
part_ret_dict= self.ground_eval(ann, pred, part=part)
part_ret_dict = {f"{part}--{k}": v for k, v in part_ret_dict.items()}
ret_dict.update(part_ret_dict)
return ret_dict
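# Submission sketch (illustrative, not part of the original file): with
# `format_only=True` the metric skips evaluation and writes
# `<result_dir>/test_results.json` plus `<result_dir>/submission.pkl`;
# `submit_info` overrides the 'xxx' placeholder fields above, e.g.
#
#   test_evaluator = dict(
#       type='GroundingMetric',
#       format_only=True,
#       result_dir='./work_dirs/submit',
#       submit_info=dict(method='BIP3D', team='...', authors=['...']),
#   )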
================================================
FILE: bip3d/grid_mask.py
================================================
import torch
import torch.nn as nn
import numpy as np
from PIL import Image
class Grid(object):
def __init__(
self, use_h, use_w, rotate=1, offset=False, ratio=0.5, mode=0, prob=1.0
):
self.use_h = use_h
self.use_w = use_w
self.rotate = rotate
self.offset = offset
self.ratio = ratio
self.mode = mode
self.st_prob = prob
self.prob = prob
def set_prob(self, epoch, max_epoch):
self.prob = self.st_prob * epoch / max_epoch
def __call__(self, img, label):
if np.random.rand() > self.prob:
return img, label
h = img.size(1)
w = img.size(2)
self.d1 = 2
self.d2 = min(h, w)
hh = int(1.5 * h)
ww = int(1.5 * w)
d = np.random.randint(self.d1, self.d2)
if self.ratio == 1:
self.l = np.random.randint(1, d)
else:
self.l = min(max(int(d * self.ratio + 0.5), 1), d - 1)
mask = np.ones((hh, ww), np.float32)
st_h = np.random.randint(d)
st_w = np.random.randint(d)
if self.use_h:
for i in range(hh // d):
s = d * i + st_h
t = min(s + self.l, hh)
mask[s:t, :] *= 0
if self.use_w:
for i in range(ww // d):
s = d * i + st_w
t = min(s + self.l, ww)
mask[:, s:t] *= 0
r = np.random.randint(self.rotate)
mask = Image.fromarray(np.uint8(mask))
mask = mask.rotate(r)
mask = np.asarray(mask)
mask = mask[
(hh - h) // 2 : (hh - h) // 2 + h,
(ww - w) // 2 : (ww - w) // 2 + w,
]
mask = torch.from_numpy(mask).float()
if self.mode == 1:
mask = 1 - mask
mask = mask.expand_as(img)
if self.offset:
offset = torch.from_numpy(2 * (np.random.rand(h, w) - 0.5)).float()
offset = (1 - mask) * offset
img = img * mask + offset
else:
img = img * mask
return img, label
class GridMask(object):
def __init__(
self, use_h, use_w, rotate=1, offset=False, ratio=0.5, mode=0, prob=1.0
):
super(GridMask, self).__init__()
self.use_h = use_h
self.use_w = use_w
self.rotate = rotate
self.offset = offset
self.ratio = ratio
self.mode = mode
self.st_prob = prob
self.prob = prob
def set_prob(self, epoch, max_epoch):
        self.prob = self.st_prob * epoch / max_epoch
def __call__(self, x, offset=None):
if np.random.rand() > self.prob:
return x
n, c, h, w = x.size()
x = x.view(-1, h, w)
hh = int(1.5 * h)
ww = int(1.5 * w)
d = np.random.randint(2, h)
self.l = min(max(int(d * self.ratio + 0.5), 1), d - 1)
mask = np.ones((hh, ww), np.float32)
st_h = np.random.randint(d)
st_w = np.random.randint(d)
if self.use_h:
for i in range(hh // d):
s = d * i + st_h
t = min(s + self.l, hh)
mask[s:t, :] *= 0
if self.use_w:
for i in range(ww // d):
s = d * i + st_w
t = min(s + self.l, ww)
mask[:, s:t] *= 0
r = np.random.randint(self.rotate)
mask = Image.fromarray(np.uint8(mask))
mask = mask.rotate(r)
mask = np.asarray(mask)
mask = mask[
(hh - h) // 2 : (hh - h) // 2 + h,
(ww - w) // 2 : (ww - w) // 2 + w,
]
mask = torch.tensor(mask).float().cuda()
if self.mode == 1:
mask = 1 - mask
mask = mask.expand_as(x)
if offset is not None:
x = x.view(n, c, h, w)
mask = mask.view(n, c, h, w)
x = x * mask + offset * (1 - mask)
return x
elif self.offset:
offset = (
torch.from_numpy(2 * (np.random.rand(h, w) - 0.5))
.float()
.cuda()
)
x = x * mask + offset * (1 - mask)
else:
x = x * mask
return x.view(n, c, h, w)
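# Usage sketch (illustrative, not part of the original file; assumes a CUDA
# device, since __call__ builds the mask with .cuda()):
#
#   gridmask = GridMask(use_h=True, use_w=True, rotate=1, ratio=0.5,
#                       mode=1, prob=0.7)
#   gridmask.set_prob(epoch=5, max_epoch=24)   # linearly ramp the apply prob
#   imgs = torch.randn(2, 3, 224, 224).cuda()  # (n, c, h, w)
#   out = gridmask(imgs)                       # same shape; with mode=1 only
#                                              # the grid stripes are kept and
#                                              # the rest is zeroed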
================================================
FILE: bip3d/models/__init__.py
================================================
from .structure import *
from .feature_enhancer import *
from .spatial_enhancer import *
from .bbox3d_decoder import *
from .instance_bank import *
from .target import *
from .data_preprocessors import *
from .deformable_aggregation import *
from .bert import *
================================================
FILE: bip3d/models/base_target.py
================================================
from torch import nn
from abc import ABC, abstractmethod
__all__ = ["BaseTargetWithDenoising"]
class BaseTargetWithDenoising(ABC, nn.Module):
def __init__(self, num_dn=0, num_temp_dn_groups=0):
super(BaseTargetWithDenoising, self).__init__()
self.num_dn = num_dn
self.num_temp_dn_groups = num_temp_dn_groups
self.dn_metas = None
@abstractmethod
def sample(self, cls_pred, box_pred, cls_target, box_target):
"""
Perform Hungarian matching between predictions and ground truth,
returning the matched ground truth corresponding to the predictions
along with the corresponding regression weights.
"""
def get_dn_anchors(self, cls_target, box_target, *args, **kwargs):
"""
Generate noisy instances for the current frame, with a total of
'self.num_dn_groups' groups.
"""
return None
def update_dn(self, instance_feature, anchor, *args, **kwargs):
"""
Insert the previously saved 'self.dn_metas' into the noisy instances
of the current frame.
"""
def cache_dn(
self,
dn_instance_feature,
dn_anchor,
dn_cls_target,
valid_mask,
dn_id_target,
):
"""
Randomly save information for 'self.num_temp_dn_groups' groups of
temporal noisy instances to 'self.dn_metas'.
"""
if self.num_temp_dn_groups < 0:
return
self.dn_metas = dict(dn_anchor=dn_anchor[:, : self.num_temp_dn_groups])
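# Subclassing sketch (illustrative, not part of the original file): a concrete
# target only has to implement `sample`; the denoising hooks are optional.
#
#   class MyTarget(BaseTargetWithDenoising):
#       def sample(self, cls_pred, box_pred, cls_target, box_target):
#           # Hungarian matching between predictions and GT would go here,
#           # returning matched targets and regression weights.
#           ...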
================================================
FILE: bip3d/models/bbox3d_decoder.py
================================================
import math
from typing import List, Optional, Tuple, Union
import numpy as np
import torch
from torch import nn
from torch.cuda.amp.autocast_mode import autocast
from mmcv.cnn import Linear, Scale
from mmcv.ops import nms3d_normal, nms3d
from mmengine.model import BaseModel
from mmengine.structures import InstanceData
from mmdet.models.layers import SinePositionalEncoding
from mmdet.models.layers.transformer.deformable_detr_layers import (
DeformableDetrTransformerEncoder as DDTE,
)
from mmdet.models.layers.transformer.utils import MLP
from mmdet.utils import reduce_mean
from mmdet.models.detectors.glip import create_positive_map_label_to_token
from mmdet.models.dense_heads.atss_vlfusion_head import (
convert_grounding_to_cls_scores,
)
from bip3d.registry import MODELS, TASK_UTILS
from bip3d.structures.bbox_3d.utils import rotation_3d_in_euler
from .utils import (
deformable_format,
wasserstein_distance,
permutation_corner_distance,
center_distance,
get_positive_map,
get_entities,
linear_act_ln,
)
__all__ = ["BBox3DDecoder"]
X, Y, Z, W, L, H, ALPHA, BETA, GAMMA = range(9)
def decode_box(box, min_size=None, max_size=None):
size = box[..., 3:6].exp()
# size = box[..., 3:6]
if min_size is not None or max_size is not None:
size = size.clamp(min=min_size, max=max_size)
box = torch.cat(
[box[..., :3], size, box[..., 6:]],
dim=-1,
)
return box
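# Note (illustrative, not part of the original file): dims 3:6 of the box
# state are regressed as log-sizes, so decode_box recovers metric extents via
# exp(); e.g. a raw prediction of 0.0 decodes to a 1.0 extent and
# log(0.5) ~= -0.693 decodes to 0.5.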
@MODELS.register_module()
class BBox3DDecoder(BaseModel):
def __init__(
self,
instance_bank: dict,
anchor_encoder: dict,
graph_model: dict,
norm_layer: dict,
ffn: dict,
deformable_model: dict,
refine_layer: dict,
num_decoder: int = 6,
num_single_frame_decoder: int = -1,
temp_graph_model: dict = None,
text_cross_attn: dict = None,
loss_cls: dict = None,
loss_reg: dict = None,
post_processor: dict = None,
sampler: dict = None,
gt_cls_key: str = "gt_labels_3d",
gt_reg_key: str = "gt_bboxes_3d",
gt_id_key: str = "instance_id",
with_instance_id: bool = True,
task_prefix: str = "det",
reg_weights: List = None,
operation_order: Optional[List[str]] = None,
cls_threshold_to_reg: float = -1,
look_forward_twice: bool = False,
init_cfg: dict = None,
**kwargs,
):
super().__init__(init_cfg)
self.num_decoder = num_decoder
self.num_single_frame_decoder = num_single_frame_decoder
self.gt_cls_key = gt_cls_key
self.gt_reg_key = gt_reg_key
self.gt_id_key = gt_id_key
self.with_instance_id = with_instance_id
self.task_prefix = task_prefix
self.cls_threshold_to_reg = cls_threshold_to_reg
self.look_forward_twice = look_forward_twice
if reg_weights is None:
self.reg_weights = [1.0] * 9
else:
self.reg_weights = reg_weights
if operation_order is None:
operation_order = [
"gnn",
"norm",
"text_cross_attn",
"norm",
"deformable",
"norm",
"ffn",
"norm",
"refine",
] * num_decoder
self.operation_order = operation_order
# =========== build modules ===========
def build(cfg, registry=MODELS):
if cfg is None:
return None
return registry.build(cfg)
self.instance_bank = build(instance_bank)
self.anchor_encoder = build(anchor_encoder)
self.sampler = build(sampler, TASK_UTILS)
self.post_processor = build(post_processor, TASK_UTILS)
self.loss_cls = build(loss_cls)
self.loss_reg = build(loss_reg)
self.op_config_map = {
"temp_gnn": temp_graph_model,
"gnn": graph_model,
"norm": norm_layer,
"ffn": ffn,
"deformable": deformable_model,
"text_cross_attn": text_cross_attn,
"refine": refine_layer,
}
self.layers = nn.ModuleList(
[
build(self.op_config_map.get(op, None))
for op in self.operation_order
]
)
self.embed_dims = self.instance_bank.embed_dims
self.norm = nn.LayerNorm(self.embed_dims)
def init_weights(self):
from mmengine.model import constant_init
for p in self.parameters():
if p.dim() > 1:
nn.init.xavier_uniform_(p)
for i, op in enumerate(self.operation_order):
if op == "refine":
m = self.layers[i]
constant_init(m.layers[-2], 0, bias=0)
constant_init(m.layers[-1], 1)
nn.init.constant_(m.layers[-2].bias.data[2:], 0.0)
def forward(
self,
feature_maps,
text_dict=None,
batch_inputs=None,
depth_prob=None,
**kwargs,
):
batch_size = feature_maps[0].shape[0]
feature_maps = list(deformable_format(feature_maps))
# ========= get instance info ============
if (
self.sampler.dn_metas is not None
and self.sampler.dn_metas["dn_anchor"].shape[0] != batch_size
):
self.sampler.dn_metas = None
(
instance_feature,
anchor,
temp_instance_feature,
temp_anchor,
time_interval,
) = self.instance_bank.get(
batch_size, batch_inputs, dn_metas=self.sampler.dn_metas
)
# ========= prepare for denosing training ============
# 1. get dn metas: noisy-anchors and corresponding GT
# 2. concat learnable instances and noisy instances
# 3. get attention mask
attn_mask = None
dn_metas = None
temp_dn_reg_target = None
if self.training and hasattr(self.sampler, "get_dn_anchors"):
dn_metas = self.sampler.get_dn_anchors(
batch_inputs[self.gt_cls_key],
batch_inputs[self.gt_reg_key],
text_dict=text_dict,
label=batch_inputs["gt_labels_3d"],
)
if dn_metas is not None:
(
dn_anchor,
dn_reg_target,
dn_cls_target,
dn_attn_mask,
valid_mask,
dn_query,
) = dn_metas
num_dn_anchor = dn_anchor.shape[1]
if dn_anchor.shape[-1] != anchor.shape[-1]:
remain_state_dims = anchor.shape[-1] - dn_anchor.shape[-1]
dn_anchor = torch.cat(
[
dn_anchor,
dn_anchor.new_zeros(
batch_size, num_dn_anchor, remain_state_dims
),
],
dim=-1,
)
anchor = torch.cat([anchor, dn_anchor], dim=1)
            if dn_query is None:
                dn_query = instance_feature.new_zeros(
                    batch_size, num_dn_anchor, instance_feature.shape[-1]
                )
            instance_feature = torch.cat(
                [instance_feature, dn_query], dim=1,
            )
num_instance = instance_feature.shape[1]
num_free_instance = num_instance - num_dn_anchor
attn_mask = anchor.new_ones(
(num_instance, num_instance), dtype=torch.bool
)
attn_mask[:num_free_instance, :num_free_instance] = False
attn_mask[num_free_instance:, num_free_instance:] = dn_attn_mask
else:
num_dn_anchor = None
num_free_instance = None
anchor_embed = self.anchor_encoder(anchor)
if temp_anchor is not None:
temp_anchor_embed = self.anchor_encoder(temp_anchor)
else:
temp_anchor_embed = None
# =================== forward the layers ====================
prediction = []
classification = []
quality = []
_anchor = None
for i, (op, layer) in enumerate(
zip(self.operation_order, self.layers)
):
if self.layers[i] is None:
continue
elif op == "temp_gnn":
instance_feature = layer(
query=instance_feature,
key=temp_instance_feature,
value=temp_instance_feature,
query_pos=anchor_embed,
key_pos=temp_anchor_embed,
attn_mask=(
attn_mask if temp_instance_feature is None else None
),
)
elif op == "gnn":
instance_feature = layer(
query=instance_feature,
key=instance_feature,
value=instance_feature,
query_pos=anchor_embed,
key_pos=anchor_embed,
attn_mask=attn_mask,
)
elif op == "norm" or op == "ffn":
instance_feature = layer(instance_feature)
elif op == "deformable":
instance_feature = layer(
instance_feature,
anchor,
anchor_embed,
feature_maps,
batch_inputs,
depth_prob=depth_prob,
)
elif op == "text_cross_attn":
text_feature = text_dict["embedded"]
instance_feature = layer(
query=instance_feature,
key=text_feature,
value=text_feature,
query_pos=anchor_embed,
key_padding_mask=~text_dict["text_token_mask"],
key_pos=0,
)
elif op == "refine":
_instance_feature = self.norm(instance_feature)
if self.look_forward_twice:
if _anchor is None:
_anchor = anchor.clone()
_anchor, cls, qt = layer(
_instance_feature,
_anchor,
anchor_embed,
time_interval=time_interval,
text_feature=text_feature,
text_token_mask=text_dict["text_token_mask"],
)
prediction.append(_anchor)
anchor = layer(
instance_feature,
anchor,
anchor_embed,
time_interval=time_interval,
)[0]
anchor_embed = self.anchor_encoder(anchor)
_anchor = anchor
anchor = anchor.detach()
else:
anchor, cls, qt = layer(
_instance_feature,
anchor,
anchor_embed,
time_interval=time_interval,
text_feature=text_feature,
text_token_mask=text_dict["text_token_mask"],
)
anchor_embed = self.anchor_encoder(anchor)
prediction.append(anchor)
classification.append(cls)
quality.append(qt)
if len(prediction) == self.num_single_frame_decoder:
instance_feature, anchor = self.instance_bank.update(
instance_feature, anchor if _anchor is None else _anchor,
cls, num_dn_anchor
)
anchor_embed = self.anchor_encoder(anchor)
if self.look_forward_twice:
_anchor = anchor
anchor = anchor.detach()
if dn_metas is not None:
num_instance = instance_feature.shape[1]
attn_mask = anchor.new_ones(
(num_instance, num_instance), dtype=torch.bool
)
attn_mask[:-num_dn_anchor, :-num_dn_anchor] = False
attn_mask[-num_dn_anchor:, -num_dn_anchor:] = dn_attn_mask
if (
len(prediction) > self.num_single_frame_decoder
and temp_anchor_embed is not None
):
temp_anchor_embed = anchor_embed[
:, : self.instance_bank.num_temp_instances
]
else:
raise NotImplementedError(f"{op} is not supported.")
output = {}
# split predictions of learnable instances and noisy instances
if dn_metas is not None:
dn_classification = [
x[:, -num_dn_anchor:] for x in classification
]
classification = [x[:, :-num_dn_anchor] for x in classification]
dn_prediction = [x[:, -num_dn_anchor:] for x in prediction]
prediction = [x[:, :-num_dn_anchor] for x in prediction]
quality = [
x[:, :-num_dn_anchor] if x is not None else None
for x in quality
]
output.update(
{
"dn_prediction": dn_prediction,
"dn_classification": dn_classification,
"dn_reg_target": dn_reg_target,
"dn_cls_target": dn_cls_target,
"dn_valid_mask": valid_mask,
}
)
if temp_dn_reg_target is not None:
output.update(
{
"temp_dn_reg_target": temp_dn_reg_target,
"temp_dn_cls_target": temp_dn_cls_target,
"temp_dn_valid_mask": temp_valid_mask,
# "dn_id_target": dn_id_target,
}
)
dn_cls_target = temp_dn_cls_target
valid_mask = temp_valid_mask
dn_instance_feature = instance_feature[:, -num_dn_anchor:]
dn_anchor = anchor[:, -num_dn_anchor:]
instance_feature = instance_feature[:, :-num_dn_anchor]
anchor_embed = anchor_embed[:, :-num_dn_anchor]
anchor = anchor[:, :-num_dn_anchor]
cls = cls[:, :-num_dn_anchor]
output.update(
{
"classification": classification,
"prediction": prediction,
"quality": quality,
"instance_feature": instance_feature,
"anchor_embed": anchor_embed,
}
)
# cache current instances for temporal modeling
self.instance_bank.cache(
instance_feature, anchor, cls, batch_inputs, feature_maps
)
        if self.with_instance_id:
            # read the score threshold from the post-processor if it defines
            # one; otherwise fall back to the default of `get_instance_id`
            instance_id = self.instance_bank.get_instance_id(
                cls, anchor,
                getattr(self.post_processor, "score_threshold", None),
            )
            output["instance_id"] = instance_id
return output
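    # Output sketch (illustrative, not part of the original file): `forward`
    # returns one entry per refine layer, e.g. with num_decoder=6:
    #
    #   out = decoder(feature_maps, text_dict, batch_inputs)
    #   out['prediction']      # list of 6 tensors (bs, num_anchor, state_dim)
    #   out['classification']  # list of 6 tensors (bs, num_anchor, num_token)
    #   out['quality']         # list of 6 entries (tensor or None)
    #   # plus 'dn_*' entries during denoising training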
def loss(self, model_outs, data, text_dict=None):
# ===================== prediction losses ======================
cls_scores = model_outs["classification"]
reg_preds = model_outs["prediction"]
quality = model_outs["quality"]
output = {}
for decoder_idx, (cls, reg, qt) in enumerate(
zip(cls_scores, reg_preds, quality)
):
reg = reg[..., : len(self.reg_weights)]
reg = decode_box(reg)
cls_target, reg_target, reg_weights, ignore_mask = self.sampler.sample(
cls,
reg,
data[self.gt_cls_key],
data[self.gt_reg_key],
text_dict=text_dict,
ignore_mask=data.get("ignore_mask"),
)
reg_target = reg_target[..., : len(self.reg_weights)]
reg_target_full = reg_target.clone()
mask = torch.logical_not(torch.all(reg_target == 0, dim=-1))
mask = mask.reshape(-1)
if ignore_mask is not None:
ignore_mask = ~ignore_mask.reshape(-1)
mask = torch.logical_and(mask, ignore_mask)
ignore_mask = ignore_mask.tile(1, cls.shape[-1])
mask_valid = mask.clone()
num_pos = max(
reduce_mean(torch.sum(mask).to(dtype=reg.dtype)), 1.0
)
cls = cls.flatten(end_dim=1)
cls_target = cls_target.flatten(end_dim=1)
token_mask = torch.logical_not(cls.isinf())
cls = cls[token_mask]
cls_target = cls_target[token_mask]
if ignore_mask is None:
cls_loss = self.loss_cls(cls, cls_target, avg_factor=num_pos)
else:
ignore_mask = ignore_mask[token_mask]
                cls_loss = self.loss_cls(
                    cls[ignore_mask],
                    cls_target[ignore_mask],
                    avg_factor=num_pos,
                )
reg_weights = reg_weights * reg.new_tensor(self.reg_weights)
reg_target = reg_target.flatten(end_dim=1)[mask]
reg = reg.flatten(end_dim=1)[mask]
reg_weights = reg_weights.flatten(end_dim=1)[mask]
reg_target = torch.where(
reg_target.isnan(), reg.new_tensor(0.0), reg_target
)
# cls_target = cls_target[mask]
if qt is not None:
qt = qt.flatten(end_dim=1)[mask]
reg_loss = self.loss_reg(
reg,
reg_target,
weight=reg_weights,
avg_factor=num_pos,
prefix=f"{self.task_prefix}_",
suffix=f"_{decoder_idx}",
quality=qt,
# cls_target=cls_target,
)
output[f"{self.task_prefix}_loss_cls_{decoder_idx}"] = cls_loss
output.update(reg_loss)
if "dn_prediction" not in model_outs:
return output
# ===================== denoising losses ======================
dn_cls_scores = model_outs["dn_classification"]
dn_reg_preds = model_outs["dn_prediction"]
(
dn_valid_mask,
dn_cls_target,
dn_reg_target,
dn_pos_mask,
reg_weights,
num_dn_pos,
) = self.prepare_for_dn_loss(model_outs)
for decoder_idx, (cls, reg) in enumerate(
zip(dn_cls_scores, dn_reg_preds)
):
if (
"temp_dn_valid_mask" in model_outs
and decoder_idx == self.num_single_frame_decoder
):
(
dn_valid_mask,
dn_cls_target,
dn_reg_target,
dn_pos_mask,
reg_weights,
num_dn_pos,
) = self.prepare_for_dn_loss(model_outs, prefix="temp_")
cls = cls.flatten(end_dim=1)[dn_valid_mask]
mask = torch.logical_not(cls.isinf())
cls_loss = self.loss_cls(
cls[mask],
dn_cls_target[mask],
avg_factor=num_dn_pos,
)
reg = reg.flatten(end_dim=1)[dn_valid_mask][dn_pos_mask][
..., : len(self.reg_weights)
]
reg = decode_box(reg)
reg_loss = self.loss_reg(
reg,
dn_reg_target,
avg_factor=num_dn_pos,
weight=reg_weights,
prefix=f"{self.task_prefix}_",
suffix=f"_dn_{decoder_idx}",
)
output[f"{self.task_prefix}_loss_cls_dn_{decoder_idx}"] = cls_loss
output.update(reg_loss)
return output
def prepare_for_dn_loss(self, model_outs, prefix=""):
dn_valid_mask = model_outs[f"{prefix}dn_valid_mask"].flatten(end_dim=1)
dn_cls_target = model_outs[f"{prefix}dn_cls_target"].flatten(
end_dim=1
)[dn_valid_mask]
dn_reg_target = model_outs[f"{prefix}dn_reg_target"].flatten(
end_dim=1
)[dn_valid_mask][..., : len(self.reg_weights)]
dn_pos_mask = dn_cls_target.sum(dim=-1) > 0
dn_reg_target = dn_reg_target[dn_pos_mask]
reg_weights = dn_reg_target.new_tensor(self.reg_weights)[None].tile(
dn_reg_target.shape[0], 1
)
num_dn_pos = max(
reduce_mean(torch.sum(dn_valid_mask).to(dtype=reg_weights.dtype)),
1.0,
)
return (
dn_valid_mask,
dn_cls_target,
dn_reg_target,
dn_pos_mask,
reg_weights,
num_dn_pos,
)
def post_process(
self,
model_outs,
text_dict,
batch_inputs,
batch_data_samples,
results_in_data_samples=True,
):
results = self.post_processor(
model_outs["classification"],
model_outs["prediction"],
model_outs.get("instance_id"),
model_outs.get("quality"),
text_dict=text_dict,
batch_inputs=batch_inputs,
)
if results_in_data_samples:
for i, ret in enumerate(results):
instances = InstanceData()
for k, v in ret.items():
if k == "bboxes_3d":
type = batch_data_samples[i].metainfo["box_type_3d"]
v = type(
v,
box_dim=v.shape[1],
origin=(0.5, 0.5, 0.5),
)
instances.__setattr__(k, v)
batch_data_samples[i].pred_instances_3d = instances
return batch_data_samples
return results
@MODELS.register_module()
class GroundingRefineClsHead(BaseModel):
def __init__(
self,
embed_dims=256,
output_dim=9,
scale=None,
cls_layers=False,
cls_bias=True,
):
super().__init__()
self.embed_dims = embed_dims
self.output_dim = output_dim
self.refine_state = list(range(output_dim))
self.scale = scale
self.layers = nn.Sequential(
*linear_act_ln(embed_dims, 2, 2),
# MLP(embed_dims, embed_dims, embed_dims, 2),
# nn.LayerNorm(self.embed_dims),
# MLP(embed_dims, embed_dims, embed_dims, 2),
Linear(self.embed_dims, self.output_dim),
Scale([1.0] * self.output_dim),
)
if cls_layers:
self.cls_layers = nn.Sequential(
MLP(embed_dims, embed_dims, embed_dims, 2),
nn.LayerNorm(self.embed_dims),
)
else:
self.cls_layers = nn.Identity()
if cls_bias:
bias_value = -math.log((1 - 0.01) / 0.01)
self.bias = nn.Parameter(
torch.Tensor([bias_value]), requires_grad=True
)
else:
self.bias = None
def forward(
self,
instance_feature: torch.Tensor,
anchor: torch.Tensor = None,
anchor_embed: torch.Tensor = None,
time_interval: torch.Tensor = 1.0,
text_feature=None,
text_token_mask=None,
**kwargs,
):
if anchor_embed is not None:
feature = instance_feature + anchor_embed
else:
feature = instance_feature
output = self.layers(feature)
if self.scale is not None:
output = output * output.new_tensor(self.scale)
if anchor is not None:
output = output + anchor
if text_feature is not None:
cls = self.cls_layers(
instance_feature) @ text_feature.transpose(-1, -2)
cls = cls / math.sqrt(instance_feature.shape[-1])
if self.bias is not None:
cls = cls + self.bias
if text_token_mask is not None:
cls.masked_fill_(~text_token_mask[:, None, :], float("-inf"))
else:
cls = None
return output, cls, None
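# Note (illustrative, not part of the original file): the classification
# branch above is a scaled dot product between instance and text features,
#
#   cls = cls_layers(instance_feature) @ text_feature^T / sqrt(embed_dims)
#         (+ bias),
#
# with padded text tokens masked to -inf, in the style of GLIP-like
# grounding heads.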
@MODELS.register_module()
class DoF9BoxLoss(nn.Module):
def __init__(
self,
loss_weight_wd=1.0,
loss_weight_pcd=0.0,
loss_weight_cd=0.8,
decode_pred=False,
):
super().__init__()
self.loss_weight_wd = loss_weight_wd
self.loss_weight_pcd = loss_weight_pcd
self.loss_weight_cd = loss_weight_cd
self.decode_pred = decode_pred
def forward(
self,
box,
box_target,
weight=None,
avg_factor=None,
prefix="",
suffix="",
**kwargs,
):
if box_target.shape[0] == 0:
loss = box.sum() * 0
return {f"{prefix}loss_box{suffix}": loss}
if self.decode_pred:
box = decode_box(box)
loss = 0
if self.loss_weight_wd > 0:
loss += self.loss_weight_wd * wasserstein_distance(box, box_target)
if self.loss_weight_pcd > 0:
loss += self.loss_weight_pcd * permutation_corner_distance(
box, box_target
)
if self.loss_weight_cd > 0:
loss += self.loss_weight_cd * center_distance(box, box_target)
if avg_factor is None:
loss = loss.mean()
else:
loss = loss.sum() / avg_factor
output = {f"{prefix}loss_box{suffix}": loss}
return output
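# Config sketch (illustrative, not part of the original file): the total box
# loss is loss_weight_wd * Wasserstein + loss_weight_pcd * permutation-corner
# + loss_weight_cd * center distance, e.g.
#
#   loss_reg = dict(
#       type='DoF9BoxLoss',
#       loss_weight_wd=1.0,
#       loss_weight_cd=0.8,
#   )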
@TASK_UTILS.register_module()
class GroundingBox3DPostProcess:
def __init__(
self,
================================================
SYMBOL INDEX (451 symbols across 54 files)
================================================
FILE: bip3d/converter/extract_occupancy_ann.py
function extract_occupancy (line 8) | def extract_occupancy(dataset, src, dst):
FILE: bip3d/converter/generate_image_3rscan.py
function process_scene (line 9) | def process_scene(path, scene_name):
FILE: bip3d/converter/generate_image_matterport3d.py
function process_scene (line 9) | def process_scene(path, output_folder, scene_name):
FILE: bip3d/converter/generate_image_scannet.py
class RGBDFrame (line 22) | class RGBDFrame:
method load (line 25) | def load(self, file_handle):
method decompress_depth (line 41) | def decompress_depth(self, compression_type):
method decompress_color (line 46) | def decompress_color(self, compression_type):
class SensorData (line 52) | class SensorData:
method __init__ (line 58) | def __init__(self, filename, fast=False):
method load (line 62) | def load(self, filename, fast):
method export_depth_images (line 105) | def export_depth_images(self, output_path):
method export_color_images (line 118) | def export_color_images(self, output_path):
method index_to_str (line 130) | def index_to_str(index):
method save_mat_to_file (line 135) | def save_mat_to_file(matrix, filename):
method export_poses (line 141) | def export_poses(self, output_path):
method export_intrinsics (line 151) | def export_intrinsics(self, output_path):
method export_depth_intrinsics (line 158) | def export_depth_intrinsics(self, output_path):
function process_scene (line 166) | def process_scene(path, fast, idx):
function process_directory (line 186) | def process_directory(path, fast, nproc):
FILE: bip3d/datasets/embodiedscan_det_grounding_dataset.py
class EmbodiedScanDetGroundingDataset (line 90) | class EmbodiedScanDetGroundingDataset(BaseDataset):
method __init__ (line 91) | def __init__(
method process_metainfo (line 147) | def process_metainfo(self):
method parse_data_info (line 164) | def parse_data_info(self, info: dict):
method parse_ann_info (line 220) | def parse_ann_info(self, info: dict):
method _get_axis_align_matrix (line 285) | def _get_axis_align_matrix(info: dict):
method _ids2masks (line 303) | def _ids2masks(self, ids, mask_length):
method _remove_dontcare (line 312) | def _remove_dontcare(self, ann_info: dict):
method load_data_list (line 343) | def load_data_list(self):
method _get_axis_align_matrix (line 395) | def _get_axis_align_matrix(info: dict):
method _is_view_dep (line 405) | def _is_view_dep(text):
method convert_info_to_scan (line 414) | def convert_info_to_scan(self):
method load_language_data (line 421) | def load_language_data(self):
method convert_to_continuous (line 442) | def convert_to_continuous(self):
method get_data_info_continuous (line 469) | def get_data_info_continuous(self, data_info):
method merge_grounding_data (line 513) | def merge_grounding_data(self, data_infos):
method get_data_info_grounding (line 550) | def get_data_info_grounding(self, data_info):
method get_data_info (line 607) | def get_data_info(self, idx):
function index (line 617) | def index(input, idx):
FILE: bip3d/datasets/transforms/augmentation.py
class RandomFlip3D (line 12) | class RandomFlip3D(RandomFlip):
method __init__ (line 59) | def __init__(self,
method transform (line 88) | def transform(self, input_dict: dict) -> dict:
method random_flip_data_3d (line 142) | def random_flip_data_3d(self,
method random_flip_data_2d (line 174) | def random_flip_data_2d(self,
method _flip_on_direction (line 204) | def _flip_on_direction(self, results: dict) -> None:
method _transform_lidar2cam (line 231) | def _transform_lidar2cam(self, results: dict) -> None:
method __repr__ (line 245) | def __repr__(self) -> str:
class GlobalRotScaleTrans (line 254) | class GlobalRotScaleTrans(BaseTransform):
method __init__ (line 290) | def __init__(self,
method transform (line 323) | def transform(self, input_dict: dict) -> dict:
method _trans_bbox_points (line 351) | def _trans_bbox_points(self, input_dict: dict) -> None:
method _rot_bbox_points (line 370) | def _rot_bbox_points(self, input_dict: dict) -> None:
method _scale_bbox_points (line 411) | def _scale_bbox_points(self, input_dict: dict) -> None:
method _random_scale (line 437) | def _random_scale(self, input_dict: dict) -> None:
method _transform_lidar2cam (line 451) | def _transform_lidar2cam(self, input_dict: dict) -> None:
method __repr__ (line 472) | def __repr__(self) -> str:
class ResizeCropFlipImage (line 483) | class ResizeCropFlipImage(object):
method __init__ (line 484) | def __init__(self, data_aug_conf, test_mode=False):
method get_augmentation (line 488) | def get_augmentation(self):
method __call__ (line 535) | def __call__(self, results):
method _transform (line 545) | def _transform(self, img, aug_configs):
FILE: bip3d/datasets/transforms/formatting.py
function to_tensor (line 15) | def to_tensor(
class Pack3DDetInputs (line 48) | class Pack3DDetInputs(BaseTransform):
method __init__ (line 64) | def __init__(
method _remove_prefix (line 84) | def _remove_prefix(self, key: str) -> str:
method transform (line 89) | def transform(self, results: Union[dict,
method pack_single_results (line 124) | def pack_single_results(self, results: dict) -> dict:
method __repr__ (line 292) | def __repr__(self) -> str:
FILE: bip3d/datasets/transforms/loading.py
class LoadDepthFromFile (line 13) | class LoadDepthFromFile(BaseTransform):
method __init__ (line 41) | def __init__(self,
method transform (line 53) | def transform(self, results: dict):
method __repr__ (line 82) | def __repr__(self):
class LoadAnnotations3D (line 97) | class LoadAnnotations3D(LoadAnnotations):
method __init__ (line 180) | def __init__(self,
method _load_bboxes_3d (line 222) | def _load_bboxes_3d(self, results: dict):
method _load_bboxes_depth (line 236) | def _load_bboxes_depth(self, results: dict):
method _load_labels_3d (line 250) | def _load_labels_3d(self, results: dict):
method _load_attr_labels (line 263) | def _load_attr_labels(self, results: dict):
method _load_depth_map (line 275) | def _load_depth_map(self, results: dict):
method _load_masks_3d (line 301) | def _load_masks_3d(self, results: dict):
method _load_semantic_seg_3d (line 327) | def _load_semantic_seg_3d(self, results: dict):
method _load_panoptic_3d (line 361) | def _load_panoptic_3d(self, results: dict):
method _load_visible_instance_masks (line 401) | def _load_visible_instance_masks(self, results: dict):
method _load_occupancy (line 416) | def _load_occupancy(self, results: dict):
method _load_visible_occupancy_masks (line 430) | def _load_visible_occupancy_masks(self, results: dict):
method _load_bboxes (line 445) | def _load_bboxes(self, results: dict):
method _load_labels (line 460) | def _load_labels(self, results: dict):
method transform (line 471) | def transform(self, results: dict):
method __repr__ (line 506) | def __repr__(self):
FILE: bip3d/datasets/transforms/multiview.py
class MultiViewPipeline (line 14) | class MultiViewPipeline(BaseTransform):
method __init__ (line 16) | def __init__(
method transform (line 33) | def transform(self, results: dict):
FILE: bip3d/datasets/transforms/transform.py
class CategoryGroundingDataPrepare (line 18) | class CategoryGroundingDataPrepare(BaseTransform):
method __init__ (line 19) | def __init__(
method transform (line 39) | def transform(self, results):
class CamIntrisicStandardization (line 127) | class CamIntrisicStandardization(BaseTransform):
method __init__ (line 128) | def __init__(self, dst_intrinsic, dst_wh):
method transform (line 143) | def transform(self, results):
class CustomResize (line 174) | class CustomResize(Resize):
method _resize_img (line 175) | def _resize_img(self, results: dict, key="img"):
method transform (line 205) | def transform(self, results: dict):
class DepthProbLabelGenerator (line 222) | class DepthProbLabelGenerator(BaseTransform):
method __init__ (line 223) | def __init__(
method transform (line 237) | def transform(self, input_dict):
FILE: bip3d/datasets/utils.py
function sample (line 5) | def sample(total_n, sample_n, fix_interval=True):
function xyzrpy_to_matrix (line 22) | def xyzrpy_to_matrix(input):
FILE: bip3d/eval/indoor_eval.py
function average_precision (line 10) | def average_precision(recalls, precisions, mode='area'):
function eval_det_cls (line 58) | def eval_det_cls(pred, gt, iou_thr=None, area_range=None):
function eval_map_recall (line 194) | def eval_map_recall(pred, gt, ovthresh=None, area_range=None):
function save_pr_curve (line 233) | def save_pr_curve(recs, precs, metrics):
function indoor_eval (line 248) | def indoor_eval(gt_annos,
FILE: bip3d/eval/metrics/det_metric.py
class IndoorDetMetric (line 22) | class IndoorDetMetric(BaseMetric):
method __init__ (line 37) | def __init__(self,
method process (line 55) | def process(self, data_batch: dict, data_samples: Sequence[dict]):
method compute_metrics (line 77) | def compute_metrics(self, results: list, part="Overall", split=True):
method evaluate (line 112) | def evaluate(self, size: int):
class Indoor2DMetric (line 237) | class Indoor2DMetric(BaseMetric):
method __init__ (line 252) | def __init__(self,
method process (line 260) | def process(self, data_batch: dict, data_samples: Sequence[dict]) -> N...
method compute_metrics (line 289) | def compute_metrics(self, results: list) -> Dict[str, float]:
FILE: bip3d/eval/metrics/grounding_metric.py
class GroundingMetric (line 16) | class GroundingMetric(BaseMetric):
method __init__ (line 37) | def __init__(self,
method process (line 53) | def process(self, data_batch: dict, data_samples: Sequence[dict]) -> N...
method ground_eval (line 75) | def ground_eval(self, gt_annos, det_annos, logger=None, part="Overall"):
method compute_metrics (line 163) | def compute_metrics(self, results: list) -> Dict[str, float]:
FILE: bip3d/grid_mask.py
class Grid (line 7) | class Grid(object):
method __init__ (line 8) | def __init__(
method set_prob (line 20) | def set_prob(self, epoch, max_epoch):
method __call__ (line 23) | def __call__(self, img, label):
class GridMask (line 75) | class GridMask(object):
method __init__ (line 76) | def __init__(
method set_prob (line 89) | def set_prob(self, epoch, max_epoch):
method __call__ (line 92) | def __call__(self, x, offset=None):
FILE: bip3d/models/base_target.py
class BaseTargetWithDenoising (line 8) | class BaseTargetWithDenoising(ABC, nn.Module):
method __init__ (line 9) | def __init__(self, num_dn=0, num_temp_dn_groups=0):
method sample (line 16) | def sample(self, cls_pred, box_pred, cls_target, box_target):
method get_dn_anchors (line 23) | def get_dn_anchors(self, cls_target, box_target, *args, **kwargs):
method update_dn (line 30) | def update_dn(self, instance_feature, anchor, *args, **kwargs):
method cache_dn (line 36) | def cache_dn(
FILE: bip3d/models/bbox3d_decoder.py
function decode_box (line 43) | def decode_box(box, min_size=None, max_size=None):
class BBox3DDecoder (line 56) | class BBox3DDecoder(BaseModel):
method __init__ (line 57) | def __init__(
method init_weights (line 146) | def init_weights(self):
method forward (line 158) | def forward(
method loss (line 422) | def loss(self, model_outs, data, text_dict=None):
method prepare_for_dn_loss (line 554) | def prepare_for_dn_loss(self, model_outs, prefix=""):
method post_process (line 580) | def post_process(
class GroundingRefineClsHead (line 614) | class GroundingRefineClsHead(BaseModel):
method __init__ (line 615) | def __init__(
method forward (line 651) | def forward(
class DoF9BoxLoss (line 685) | class DoF9BoxLoss(nn.Module):
method __init__ (line 686) | def __init__(
method forward (line 699) | def forward(
class GroundingBox3DPostProcess (line 733) | class GroundingBox3DPostProcess:
method __init__ (line 734) | def __init__(
method __call__ (line 745) | def __call__(
class DoF9BoxEncoder (line 861) | class DoF9BoxEncoder(nn.Module):
method __init__ (line 862) | def __init__(
method forward (line 889) | def forward(self, box_3d: torch.Tensor):
class SparseBox3DKeyPointsGenerator (line 902) | class SparseBox3DKeyPointsGenerator(nn.Module):
method __init__ (line 903) | def __init__(
method forward (line 919) | def forward(
method anchor_projection (line 977) | def anchor_projection(
method distance (line 1025) | def distance(anchor):
FILE: bip3d/models/bert.py
function generate_masks_with_special_tokens_and_transfer_map (line 19) | def generate_masks_with_special_tokens_and_transfer_map(
class BertModel (line 86) | class BertModel(BaseModel):
method __init__ (line 110) | def __init__(
method forward (line 163) | def forward(self, captions: Sequence[str], **kwargs) -> dict:
class BertEncoder (line 205) | class BertEncoder(nn.Module):
method __init__ (line 218) | def __init__(
method forward (line 240) | def forward(self, x) -> dict:
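
BertModel.forward takes a batch of captions and returns token-level text features. A hedged sketch of the underlying HuggingFace path (the transformers BertModel imported here is the upstream class, not the repo's wrapper; the checkpoint name and output keys are illustrative, and the repo additionally builds special-token masks on top of this):

import torch
from transformers import AutoTokenizer, BertModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = BertModel.from_pretrained("bert-base-uncased")

captions = ["the red chair next to the table"]
tokens = tokenizer(captions, padding="longest", return_tensors="pt")
with torch.no_grad():
    out = encoder(**tokens)
text_dict = dict(
    embedded=out.last_hidden_state,                 # (B, L, 768) token features
    text_token_mask=tokens["attention_mask"].bool(),
)
print(text_dict["embedded"].shape)
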
FILE: bip3d/models/data_preprocessors/custom_data_preprocessor.py
class CustomDet3DDataPreprocessor (line 23) | class CustomDet3DDataPreprocessor(DetDataPreprocessor):
method __init__ (line 89) | def __init__(
method forward (line 134) | def forward(self, data: Union[dict, List[dict]], training: bool = False):
method simple_process (line 160) | def simple_process(self, data: dict, training: bool = False):
method process_camera_params (line 268) | def process_camera_params(self, data_samples):
method preprocess_img (line 323) | def preprocess_img(self, _batch_img: Tensor):
method collate_data (line 341) | def collate_data(self, data: dict):
method _get_pad_shape (line 428) | def _get_pad_shape(self, data: dict):
FILE: bip3d/models/data_preprocessors/utils.py
function multiview_img_stack_batch (line 9) | def multiview_img_stack_batch(
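
multiview_img_stack_batch has to reconcile samples whose views were captured at different resolutions before they can share a batch dimension. A hedged sketch, assuming right/bottom zero padding to the batch-wide maximum height and width:

import torch
import torch.nn.functional as F


def stack_multiview(imgs):
    # imgs: list of (V, C, H_i, W_i) tensors with per-sample H/W.
    h = max(x.shape[-2] for x in imgs)
    w = max(x.shape[-1] for x in imgs)
    padded = [
        F.pad(x, (0, w - x.shape[-1], 0, h - x.shape[-2])) for x in imgs
    ]
    return torch.stack(padded)


batch = [torch.rand(6, 3, 224, 300), torch.rand(6, 3, 240, 320)]
print(stack_multiview(batch).shape)  # torch.Size([2, 6, 3, 240, 320])
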
FILE: bip3d/models/deformable_aggregation.py
class DeformableFeatureAggregation (line 29) | class DeformableFeatureAggregation(BaseModule):
method __init__ (line 30) | def __init__(
method init_weight (line 115) | def init_weight(self):
method get_spatial_shape_3D (line 119) | def get_spatial_shape_3D(self, spatial_shape, depth_dim):
method forward (line 124) | def forward(
method _get_weights (line 203) | def _get_weights(
method project_points (line 252) | def project_points(key_points, projection_mat, image_wh=None):
method feature_sampling (line 273) | def feature_sampling(
method multi_view_level_fusion (line 305) | def multi_view_level_fusion(
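
project_points is the geometric heart of deformable aggregation: 3D key points are projected into every camera view to obtain normalized 2D sampling locations for feature_sampling. A hedged sketch assuming 4x4 projection matrices and a (batch, anchor, point, 3) key-point layout; both conventions are assumptions:

import torch


def project_points_sketch(key_points, projection_mat, image_wh):
    # key_points: (B, N, P, 3); projection_mat: (B, V, 4, 4); image_wh: (B, V, 2)
    pts = torch.cat([key_points, torch.ones_like(key_points[..., :1])], -1)
    proj = torch.matmul(projection_mat[:, :, None, None],
                        pts[:, None, ..., None]).squeeze(-1)  # (B, V, N, P, 4)
    uv = proj[..., :2] / proj[..., 2:3].clamp(min=1e-5)       # divide by depth
    return uv / image_wh[:, :, None, None]                    # normalize to [0, 1]


kp = torch.rand(2, 8, 4, 3)
proj = torch.eye(4).expand(2, 6, 4, 4)
wh = torch.full((2, 6, 2), 640.0)
print(project_points_sketch(kp, proj, wh).shape)  # (2, 6, 8, 4, 2)
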
FILE: bip3d/models/feature_enhancer.py
class TextImageDeformable2DEnhancer (line 23) | class TextImageDeformable2DEnhancer(BaseModel):
method __init__ (line 24) | def __init__(
method forward (line 60) | def forward(
method get_2d_position_embed (line 143) | def get_2d_position_embed(self, feature_maps):
FILE: bip3d/models/instance_bank.py
function topk (line 11) | def topk(confidence, k, *inputs):
class InstanceBank (line 24) | class InstanceBank(nn.Module):
method __init__ (line 25) | def __init__(
method init_weight (line 73) | def init_weight(self):
method reset (line 78) | def reset(self):
method bbox_transform (line 88) | def bbox_transform(self, bbox, matrix):
method get (line 109) | def get(self, batch_size, metas=None, dn_metas=None):
method update (line 159) | def update(self, instance_feature, anchor, confidence, num_dn=None):
method cache (line 192) | def cache(
method get_instance_id (line 228) | def get_instance_id(self, confidence, anchor=None, threshold=None):
method update_instance_id (line 248) | def update_instance_id(self, instance_id=None, confidence=None):
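
The module-level topk helper most likely selects the k highest-confidence queries and gathers every aligned per-query tensor with the same indices, so features and anchors stay in sync. A sketch under that assumption:

import torch


def topk_sketch(confidence: torch.Tensor, k: int, *inputs: torch.Tensor):
    conf, idx = confidence.topk(k, dim=1)  # (B, k) scores and indices
    gathered = [
        x.gather(1, idx.unsqueeze(-1).expand(-1, -1, x.shape[-1]))
        for x in inputs
    ]
    return (conf, *gathered)


conf = torch.rand(2, 100)            # per-query confidence
feat = torch.rand(2, 100, 256)       # instance features
anchor = torch.rand(2, 100, 9)       # 9-DoF anchors
conf_k, feat_k, anchor_k = topk_sketch(conf, 32, feat, anchor)
print(feat_k.shape, anchor_k.shape)  # (2, 32, 256) (2, 32, 9)
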
FILE: bip3d/models/spatial_enhancer.py
class DepthFusionSpatialEnhancer (line 13) | class DepthFusionSpatialEnhancer(BaseModel):
method __init__ (line 14) | def __init__(
method forward (line 58) | def forward(
method get_pts (line 104) | def get_pts(self, spatial_shapes, image_wh, projection_mat, device, dt...
method depth_prob_loss (line 138) | def depth_prob_loss(self, depth_prob, batch_inputs):
FILE: bip3d/models/structure.py
class BIP3D (line 12) | class BIP3D(BaseDetector):
method __init__ (line 13) | def __init__(
method init_weights (line 65) | def init_weights(self):
method extract_feat (line 78) | def extract_feat(self, batch_inputs_dict, batch_data_samples):
method extract_text_feature (line 113) | def extract_text_feature(self, batch_inputs_dict):
method loss (line 121) | def loss(self, batch_inputs, batch_data_samples):
method predict (line 130) | def predict(self, batch_inputs, batch_data_samples):
method _forward (line 137) | def _forward(self, batch_inputs, batch_data_samples):
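
BIP3D follows the usual mmdet BaseDetector dispatch: loss and predict both run extract_feat, then hand the features to the decoder head. A toy, self-contained sketch of that control flow only; the stand-in modules below are placeholders, not the real backbone or decoder:

import torch
from torch import nn


class ToyBIP3D(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Conv2d(3, 16, 3, padding=1)  # placeholder backbone
        self.decoder = nn.Linear(16, 9)                 # toy 9-DoF box head

    def extract_feat(self, imgs):
        return self.backbone(imgs).mean(dim=(2, 3))     # globally pooled feature

    def loss(self, imgs, gt_boxes):
        pred = self.decoder(self.extract_feat(imgs))
        return {"loss_box": nn.functional.l1_loss(pred, gt_boxes)}

    def predict(self, imgs):
        return self.decoder(self.extract_feat(imgs))


model = ToyBIP3D()
imgs = torch.rand(2, 3, 32, 32)
print(model.loss(imgs, torch.rand(2, 9))["loss_box"].item())
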
FILE: bip3d/models/target.py
class Grounding3DTarget (line 24) | class Grounding3DTarget(BaseTargetWithDenoising):
method __init__ (line 25) | def __init__(
method encode_reg_target (line 66) | def encode_reg_target(self, box_target, device=None):
method sample (line 83) | def sample(
method _cls_cost (line 132) | def _cls_cost(self, cls_pred, token_positive_maps, text_token_mask):
method _box_cost (line 158) | def _box_cost(self, box_pred, box_target):
method get_dn_anchors (line 183) | def get_dn_anchors(self, char_positives, box_target, text_dict, label=...
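
Grounding3DTarget.sample combines the _cls_cost and _box_cost terms into a cost matrix and, given the file's scipy import visible in the preview below, resolves it with Hungarian matching. A sketch of that step with random stand-in costs; the weighting between the two terms is illustrative:

import numpy as np
from scipy.optimize import linear_sum_assignment

cls_cost = np.random.rand(8, 3)       # (num_preds, num_gts)
box_cost = np.random.rand(8, 3)       # e.g. a center-distance cost
cost = cls_cost + 0.25 * box_cost     # weighted sum (weight assumed)

pred_idx, gt_idx = linear_sum_assignment(cost)
print(list(zip(pred_idx, gt_idx)))    # one prediction matched per target
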
FILE: bip3d/models/utils.py
function deformable_format (line 6) | def deformable_format(
function bbox_to_corners (line 54) | def bbox_to_corners(bbox, permutation=False):
function wasserstein_distance (line 107) | def wasserstein_distance(source, target):
function permutation_corner_distance (line 126) | def permutation_corner_distance(source, target):
function center_distance (line 136) | def center_distance(source, target):
function get_positive_map (line 140) | def get_positive_map(char_positive, text_dict):
function get_entities (line 182) | def get_entities(text, char_positive, sep_token="[SEP]"):
function linear_act_ln (line 198) | def linear_act_ln(
FILE: bip3d/ops/deformable_aggregation.py
class DeformableAggregationFunction (line 9) | class DeformableAggregationFunction(Function):
method forward (line 12) | def forward(
method backward (line 45) | def backward(ctx, grad_output):
class DeformableAggregationWithDepthFunction (line 82) | class DeformableAggregationWithDepthFunction(Function):
method forward (line 85) | def forward(
method backward (line 121) | def backward(ctx, grad_output):
function deformable_aggregation_func (line 164) | def deformable_aggregation_func(
function feature_maps_format (line 193) | def feature_maps_format(feature_maps, inverse=False):
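
Both Function subclasses above follow the standard torch.autograd.Function pattern: forward stashes context and calls a compiled CUDA kernel, and backward returns one gradient per forward input. A toy elementwise version of the same pattern (the real ops dispatch to the .cu kernels below):

import torch
from torch.autograd import Function


class ScaleFunction(Function):
    @staticmethod
    def forward(ctx, x, scale):
        ctx.scale = scale            # stash what backward needs
        return x * scale

    @staticmethod
    def backward(ctx, grad_output):
        # One gradient per forward input; scale is a plain float, so None.
        return grad_output * ctx.scale, None


x = torch.rand(4, requires_grad=True)
ScaleFunction.apply(x, 2.0).sum().backward()
print(x.grad)  # all elements equal 2.0
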
FILE: bip3d/ops/setup.py
function make_cuda_ext (line 12) | def make_cuda_ext(
FILE: bip3d/ops/src/deformable_aggregation.cpp
function deformable_aggregation_forward (line 46) | at::Tensor deformable_aggregation_forward(
function deformable_aggregation_backward (line 78) | void deformable_aggregation_backward(
function PYBIND11_MODULE (line 118) | PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
FILE: bip3d/ops/src/deformable_aggregation_with_depth.cpp
function deformable_aggregation_with_depth_forward (line 48) | at::Tensor deformable_aggregation_with_depth_forward(
function deformable_aggregation_with_depth_backward (line 82) | void deformable_aggregation_with_depth_backward(
function PYBIND11_MODULE (line 124) | PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
FILE: bip3d/structures/bbox_3d/base_box3d.py
class BaseInstance3DBoxes (line 14) | class BaseInstance3DBoxes:
method __init__ (line 42) | def __init__(
method shape (line 82) | def shape(self) -> torch.Size:
method volume (line 87) | def volume(self) -> Tensor:
method dims (line 92) | def dims(self) -> Tensor:
method yaw (line 97) | def yaw(self) -> Tensor:
method height (line 102) | def height(self) -> Tensor:
method top_height (line 107) | def top_height(self) -> Tensor:
method bottom_height (line 112) | def bottom_height(self) -> Tensor:
method center (line 117) | def center(self) -> Tensor:
method bottom_center (line 136) | def bottom_center(self) -> Tensor:
method gravity_center (line 141) | def gravity_center(self) -> Tensor:
method corners (line 150) | def corners(self) -> Tensor:
method bev (line 155) | def bev(self) -> Tensor:
method nearest_bev (line 161) | def nearest_bev(self) -> Tensor:
method in_range_bev (line 180) | def in_range_bev(
method rotate (line 205) | def rotate(
method flip (line 228) | def flip(
method translate (line 248) | def translate(self, trans_vector: Union[Tensor, np.ndarray]) -> None:
method in_range_3d (line 259) | def in_range_3d(
method convert_to (line 286) | def convert_to(self,
method scale (line 309) | def scale(self, scale_factor: float) -> None:
method limit_yaw (line 318) | def limit_yaw(self, offset: float = 0.5, period: float = np.pi) -> None:
method nonempty (line 327) | def nonempty(self, threshold: float = 0.0) -> Tensor:
method __getitem__ (line 348) | def __getitem__(
method __len__ (line 382) | def __len__(self) -> int:
method __repr__ (line 386) | def __repr__(self) -> str:
method cat (line 391) | def cat(cls, boxes_list: Sequence['BaseInstance3DBoxes']
method numpy (line 413) | def numpy(self) -> np.ndarray:
method to (line 417) | def to(self, device: Union[str, torch.device], *args,
method cpu (line 433) | def cpu(self) -> 'BaseInstance3DBoxes':
method cuda (line 444) | def cuda(self, *args, **kwargs) -> 'BaseInstance3DBoxes':
method clone (line 455) | def clone(self) -> 'BaseInstance3DBoxes':
method detach (line 467) | def detach(self) -> 'BaseInstance3DBoxes':
method device (line 480) | def device(self) -> torch.device:
method __iter__ (line 484) | def __iter__(self) -> Iterator[Tensor]:
method height_overlaps (line 493) | def height_overlaps(cls, boxes1: 'BaseInstance3DBoxes',
method new_box (line 525) | def new_box(
FILE: bip3d/structures/bbox_3d/box_3d_mode.py
class Box3DMode (line 14) | class Box3DMode(IntEnum):
method convert (line 67) | def convert(
FILE: bip3d/structures/bbox_3d/coord_3d_mode.py
class Coord3DMode (line 17) | class Coord3DMode(IntEnum):
method convert (line 68) | def convert(input: Union[Sequence[float], np.ndarray, Tensor,
method convert_box (line 131) | def convert_box(
method convert_point (line 171) | def convert_point(
FILE: bip3d/structures/bbox_3d/euler_box3d.py
class EulerInstance3DBoxes (line 12) | class EulerInstance3DBoxes(BaseInstance3DBoxes):
method __init__ (line 24) | def __init__(self, tensor, box_dim=9, origin=(0.5, 0.5, 0.5)):
method get_corners (line 60) | def get_corners(self, tensor1):
method overlaps (line 104) | def overlaps(cls, boxes1, boxes2, mode='iou', eps=1e-4):
method gravity_center (line 138) | def gravity_center(self):
method corners (line 143) | def corners(self):
method transform (line 186) | def transform(self, matrix):
method scale (line 208) | def scale(self, scale_factor: float) -> None:
method rotate (line 216) | def rotate(self, angle, points=None):
method flip (line 263) | def flip(self, direction='X'):
FILE: bip3d/structures/bbox_3d/euler_depth_box3d.py
class EulerDepthInstance3DBoxes (line 10) | class EulerDepthInstance3DBoxes(EulerInstance3DBoxes):
method __init__ (line 41) | def __init__(self,
method flip (line 49) | def flip(self, bev_direction='horizontal', points=None):
method convert_to (line 80) | def convert_to(self, dst, rt_mat=None):
method points_in_boxes_part (line 100) | def points_in_boxes_part(self, points, boxes_override=None):
method points_in_boxes_all (line 132) | def points_in_boxes_all(self, points, boxes_override=None):
FILE: bip3d/structures/bbox_3d/utils.py
function limit_period (line 14) | def limit_period(val: Union[np.ndarray, Tensor],
function rotation_3d_in_euler (line 33) | def rotation_3d_in_euler(points, angles, return_mat=False, clockwise=Fal...
function rotation_3d_in_axis (line 93) | def rotation_3d_in_axis(
function xywhr2xyxyr (line 189) | def xywhr2xyxyr(
function get_box_type (line 211) | def get_box_type(box_type: str) -> Tuple[type, int]:
function points_cam2img (line 247) | def points_cam2img(points_3d: Union[Tensor, np.ndarray],
function batch_points_cam2img (line 293) | def batch_points_cam2img(points_3d, proj_mat, with_depth=False):
function points_img2cam (line 339) | def points_img2cam(
function mono_cam_box2vis (line 374) | def mono_cam_box2vis(cam_box):
function get_proj_mat_by_coord_type (line 420) | def get_proj_mat_by_coord_type(img_meta: dict, coord_type: str) -> Tensor:
function yaw2local (line 437) | def yaw2local(yaw: Tensor, loc: Tensor) -> Tensor:
function get_lidar2img (line 459) | def get_lidar2img(cam2img: Tensor, lidar2cam: Tensor) -> Tensor:
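
Two of the small helpers indexed above are standard mmdet3d-style utilities: limit_period wraps an angle into a half-open window of length period, and get_lidar2img composes padded camera intrinsics with the lidar-to-camera extrinsics. A hedged re-implementation; treat the exact 4x4 padding of cam2img as an assumption:

import math
import torch


def limit_period(val, offset=0.5, period=math.pi):
    # Wrap into [-offset * period, (1 - offset) * period).
    return val - torch.floor(val / period + offset) * period


def get_lidar2img(cam2img: torch.Tensor, lidar2cam: torch.Tensor):
    mat = torch.eye(4, dtype=cam2img.dtype)
    mat[: cam2img.shape[0], : cam2img.shape[1]] = cam2img  # pad intrinsics
    return mat @ lidar2cam


print(limit_period(torch.tensor([3.5 * math.pi])))  # tensor([-1.5708])
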
FILE: bip3d/structures/ops/box_np_ops.py
function camera_to_lidar (line 14) | def camera_to_lidar(points, r_rect, velo2cam):
function box_camera_to_lidar (line 37) | def box_camera_to_lidar(data, r_rect, velo2cam):
function corners_nd (line 63) | def corners_nd(dims, origin=0.5):
function center_to_corner_box2d (line 96) | def center_to_corner_box2d(centers, dims, angles=None, origin=0.5):
function depth_to_points (line 123) | def depth_to_points(depth, trunc_pixel):
function depth_to_lidar_points (line 147) | def depth_to_lidar_points(depth, trunc_pixel, P2, r_rect, velo2cam):
function center_to_corner_box3d (line 171) | def center_to_corner_box3d(centers,
function box2d_to_corner_jit (line 204) | def box2d_to_corner_jit(boxes):
function corner_to_standup_nd_jit (line 235) | def corner_to_standup_nd_jit(boxes_corner):
function corner_to_surfaces_3d_jit (line 256) | def corner_to_surfaces_3d_jit(corners):
function rotation_points_single_angle (line 279) | def rotation_points_single_angle(points, angle, axis=0):
function box3d_to_bbox (line 311) | def box3d_to_bbox(box3d, P2):
function corner_to_surfaces_3d (line 333) | def corner_to_surfaces_3d(corners):
function points_in_rbbox (line 355) | def points_in_rbbox(points, rbbox, z_axis=2, origin=(0.5, 0.5, 0)):
function minmax_to_corner_2d (line 384) | def minmax_to_corner_2d(minmax_box):
function create_anchors_3d_range (line 399) | def create_anchors_3d_range(feature_size,
function center_to_minmax_2d (line 458) | def center_to_minmax_2d(centers, dims, origin=0.5):
function rbbox2d_to_near_bbox (line 477) | def rbbox2d_to_near_bbox(rbboxes):
function iou_jit (line 497) | def iou_jit(boxes, query_boxes, mode='iou', eps=0.0):
function projection_matrix_to_CRT_kitti (line 538) | def projection_matrix_to_CRT_kitti(proj):
function remove_outside_points (line 565) | def remove_outside_points(points, rect, Trv2c, P2, image_shape):
function get_frustum (line 596) | def get_frustum(bbox_image, C, near_clip=0.001, far_clip=100):
function surface_equ_3d (line 629) | def surface_equ_3d(polygon_surfaces):
function _points_in_convex_polygon_3d_jit (line 654) | def _points_in_convex_polygon_3d_jit(points, polygon_surfaces, normal_ve...
function points_in_convex_polygon_3d_jit (line 690) | def points_in_convex_polygon_3d_jit(points,
function points_in_convex_polygon_jit (line 720) | def points_in_convex_polygon_jit(points, polygon, clockwise=False):
function boxes3d_to_corners3d_lidar (line 765) | def boxes3d_to_corners3d_lidar(boxes3d, bottom_center=True):
FILE: bip3d/structures/ops/iou3d_calculator.py
class BboxOverlapsNearest3D (line 10) | class BboxOverlapsNearest3D(object):
method __init__ (line 21) | def __init__(self, coordinate='lidar'):
method __call__ (line 25) | def __call__(self, bboxes1, bboxes2, mode='iou', is_aligned=False):
method __repr__ (line 50) | def __repr__(self):
class BboxOverlaps3D (line 58) | class BboxOverlaps3D(object):
method __init__ (line 66) | def __init__(self, coordinate):
method __call__ (line 70) | def __call__(self, bboxes1, bboxes2, mode='iou'):
method __repr__ (line 92) | def __repr__(self):
function bbox_overlaps_nearest_3d (line 99) | def bbox_overlaps_nearest_3d(bboxes1,
function bbox_overlaps_3d (line 150) | def bbox_overlaps_3d(bboxes1, bboxes2, mode='iou', coordinate='camera'):
class AxisAlignedBboxOverlaps3D (line 182) | class AxisAlignedBboxOverlaps3D(object):
method __call__ (line 185) | def __call__(self, bboxes1, bboxes2, mode='iou', is_aligned=False):
method __repr__ (line 206) | def __repr__(self):
function axis_aligned_bbox_overlaps_3d (line 212) | def axis_aligned_bbox_overlaps_3d(bboxes1,
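
axis_aligned_bbox_overlaps_3d reduces to a simple volume ratio in the aligned case, with boxes given as (x1, y1, z1, x2, y2, z2). A sketch of that case only; the real helper also handles pairwise broadcasting and additional modes:

import torch


def aligned_iou_3d(b1: torch.Tensor, b2: torch.Tensor, eps: float = 1e-6):
    lt = torch.max(b1[:, :3], b2[:, :3])      # intersection min corner
    rb = torch.min(b1[:, 3:], b2[:, 3:])      # intersection max corner
    inter = (rb - lt).clamp(min=0).prod(dim=1)
    vol1 = (b1[:, 3:] - b1[:, :3]).prod(dim=1)
    vol2 = (b2[:, 3:] - b2[:, :3]).prod(dim=1)
    return inter / (vol1 + vol2 - inter + eps)


a = torch.tensor([[0., 0., 0., 2., 2., 2.]])
b = torch.tensor([[1., 1., 1., 3., 3., 3.]])
print(aligned_iou_3d(a, b))  # tensor([0.0667]) = 1 / 15
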
FILE: bip3d/structures/ops/transforms.py
function bbox3d_mapping_back (line 5) | def bbox3d_mapping_back(bboxes, scale_factor, flip_horizontal, flip_vert...
function bbox3d2roi (line 27) | def bbox3d2roi(bbox_list):
function bbox3d2result (line 51) | def bbox3d2result(bboxes, scores, labels, attrs=None):
FILE: bip3d/structures/points/__init__.py
function get_points_type (line 10) | def get_points_type(points_type: str) -> type:
FILE: bip3d/structures/points/base_points.py
class BasePoints (line 14) | class BasePoints:
method __init__ (line 34) | def __init__(self,
method coord (line 58) | def coord(self) -> Tensor:
method coord (line 63) | def coord(self, tensor: Union[Tensor, np.ndarray]) -> None:
method height (line 79) | def height(self) -> Union[Tensor, None]:
method height (line 89) | def height(self, tensor: Union[Tensor, np.ndarray]) -> None:
method color (line 115) | def color(self) -> Union[Tensor, None]:
method color (line 125) | def color(self, tensor: Union[Tensor, np.ndarray]) -> None:
method shape (line 154) | def shape(self) -> torch.Size:
method shuffle (line 158) | def shuffle(self) -> Tensor:
method rotate (line 168) | def rotate(self,
method flip (line 206) | def flip(self, bev_direction: str = 'horizontal') -> None:
method translate (line 215) | def translate(self, trans_vector: Union[Tensor, np.ndarray]) -> None:
method in_range_3d (line 236) | def in_range_3d(
method bev (line 263) | def bev(self) -> Tensor:
method in_range_bev (line 267) | def in_range_bev(
method convert_to (line 287) | def convert_to(self,
method scale (line 308) | def scale(self, scale_factor: float) -> None:
method __getitem__ (line 316) | def __getitem__(
method __len__ (line 395) | def __len__(self) -> int:
method __repr__ (line 399) | def __repr__(self) -> str:
method cat (line 404) | def cat(cls, points_list: Sequence['BasePoints']) -> 'BasePoints':
method numpy (line 425) | def numpy(self) -> np.ndarray:
method to (line 429) | def to(self, device: Union[str, torch.device], *args,
method cpu (line 444) | def cpu(self) -> 'BasePoints':
method cuda (line 455) | def cuda(self, *args, **kwargs) -> 'BasePoints':
method clone (line 466) | def clone(self) -> 'BasePoints':
method detach (line 477) | def detach(self) -> 'BasePoints':
method device (line 489) | def device(self) -> torch.device:
method __iter__ (line 493) | def __iter__(self) -> Iterator[Tensor]:
method new_point (line 501) | def new_point(
FILE: bip3d/structures/points/cam_points.py
class CameraPoints (line 10) | class CameraPoints(BasePoints):
method __init__ (line 30) | def __init__(self,
method flip (line 39) | def flip(self, bev_direction: str = 'horizontal') -> None:
method bev (line 53) | def bev(self) -> Tensor:
method convert_to (line 57) | def convert_to(self,
FILE: bip3d/structures/points/depth_points.py
class DepthPoints (line 10) | class DepthPoints(BasePoints):
method __init__ (line 30) | def __init__(self,
method flip (line 39) | def flip(self, bev_direction: str = 'horizontal') -> None:
method convert_to (line 52) | def convert_to(self,
FILE: bip3d/structures/points/lidar_points.py
class LiDARPoints (line 10) | class LiDARPoints(BasePoints):
method __init__ (line 30) | def __init__(self,
method flip (line 39) | def flip(self, bev_direction: str = 'horizontal') -> None:
method convert_to (line 52) | def convert_to(self,
FILE: bip3d/utils/array_converter.py
function array_converter (line 12) | def array_converter(to_torch: bool = True,
class ArrayConverter (line 204) | class ArrayConverter:
method __init__ (line 215) | def __init__(self,
method set_template (line 220) | def set_template(self, array: TemplateArrayType) -> None:
method convert (line 261) | def convert(
method recover (line 326) | def recover(
FILE: bip3d/utils/dist_utils.py
function reduce_mean (line 4) | def reduce_mean(tensor):
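
reduce_mean averages a tensor across distributed workers via all_reduce (its docstring in the preview below says as much). A sketch with a single-process guard added for safety; the guard is my addition, and the repo's version may assume an initialized process group:

import torch
import torch.distributed as dist


def reduce_mean(tensor: torch.Tensor) -> torch.Tensor:
    if not (dist.is_available() and dist.is_initialized()):
        return tensor  # single-process fallback (added here, assumed)
    tensor = tensor.clone()
    dist.all_reduce(tensor.div_(dist.get_world_size()), op=dist.ReduceOp.SUM)
    return tensor


print(reduce_mean(torch.tensor(3.0)))  # unchanged without a process group
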
FILE: bip3d/utils/line_mesh.py
function align_vector_to_another (line 24) | def align_vector_to_another(a=np.array([0, 0, 1]), b=np.array([1, 0, 0])):
function normalized (line 35) | def normalized(a, axis=-1, order=2):
class LineMesh (line 42) | class LineMesh(object):
method __init__ (line 44) | def __init__(self, points, lines=None, colors=[0, 1, 0], radius=0.15):
method lines_from_ordered_points (line 70) | def lines_from_ordered_points(points):
method create_line_mesh (line 74) | def create_line_mesh(self):
method add_line (line 110) | def add_line(self, vis):
method remove_line (line 115) | def remove_line(self, vis):
function main (line 123) | def main():
FILE: bip3d/utils/typing_config.py
class Det3DDataElement (line 11) | class Det3DDataElement(BaseDataElement):
method gt_instances_3d (line 14) | def gt_instances_3d(self) -> InstanceData:
method gt_instances_3d (line 18) | def gt_instances_3d(self, value: InstanceData) -> None:
method gt_instances_3d (line 22) | def gt_instances_3d(self) -> None:
method pred_instances_3d (line 26) | def pred_instances_3d(self) -> InstanceData:
method pred_instances_3d (line 30) | def pred_instances_3d(self, value: InstanceData) -> None:
method pred_instances_3d (line 34) | def pred_instances_3d(self) -> None:
class PointData (line 43) | class PointData(BaseDataElement):
method __setattr__ (line 77) | def __setattr__(self, name: str, value: Sized) -> None:
method __getitem__ (line 98) | def __getitem__(self, item: IndexType) -> 'PointData':
method __len__ (line 185) | def __len__(self) -> int:
FILE: tools/anchor_bbox3d_kmeans.py
function sample (line 10) | def sample(ids, n):
function kmeans (line 25) | def kmeans(
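
The kmeans entry point clusters ground-truth boxes into a fixed anchor set used to initialize the decoder queries (the file imports sklearn's KMeans per the preview below). A hedged sketch of that pipeline with random stand-in data; the box dimensionality and cluster count are illustrative:

import numpy as np
from sklearn.cluster import KMeans

gt_boxes = np.random.rand(1000, 9)   # stand-in for dataset 9-DoF boxes
kmeans = KMeans(n_clusters=32, n_init=10, random_state=0).fit(gt_boxes)
anchors = kmeans.cluster_centers_    # (32, 9) anchor boxes
print(anchors.shape)
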
FILE: tools/benchmark.py
function parse_args (line 16) | def parse_args():
function inference_benchmark (line 72) | def inference_benchmark(args, cfg, distributed, logger):
function dataloader_benchmark (line 85) | def dataloader_benchmark(args, cfg, distributed, logger):
function dataset_benchmark (line 97) | def dataset_benchmark(args, cfg, distributed, logger):
function main (line 108) | def main():
FILE: tools/ckpt_rename.py
function get_renamed_ckpt (line 4) | def get_renamed_ckpt(file, output="./"):
FILE: tools/test.py
function wait_before_import_config (line 5) | def wait_before_import_config():
function wait_after_import_config (line 9) | def wait_after_import_config():
function parse_args (line 31) | def parse_args():
function trigger_visualization_hook (line 93) | def trigger_visualization_hook(cfg, args):
function main (line 122) | def main():
FILE: tools/train.py
function wait_before_import_config (line 5) | def wait_before_import_config():
function wait_after_import_config (line 9) | def wait_after_import_config():
function parse_args (line 33) | def parse_args():
function main (line 80) | def main():
Condensed preview — 87 files, each showing path, character count, and a content snippet.
[
{
"path": ".gitignore",
"chars": 1623,
"preview": "# Byte-compiled / optimized / DLL files\n__pycache__/\n*.py[cod]\n*$py.class\n\n# Distribution / packaging\n.Python\nbuild/\ndev"
},
{
"path": "LICENSE",
"chars": 1073,
"preview": "MIT License\n\nCopyright (c) 2024 Horizon Robotics\n\nPermission is hereby granted, free of charge, to any person obtaining "
},
{
"path": "README.md",
"chars": 10222,
"preview": "# BIP3D: Bridging 2D Images and 3D Perception for Embodied Intelligence\n\n<div align=\"center\" class=\"authors\">\n <a hre"
},
{
"path": "bip3d/__init__.py",
"chars": 24,
"preview": "from .registry import *\n"
},
{
"path": "bip3d/converter/extract_occupancy_ann.py",
"chars": 1739,
"preview": "import os\nimport shutil\nfrom argparse import ArgumentParser\n\nfrom tqdm import tqdm\n\n\ndef extract_occupancy(dataset, src,"
},
{
"path": "bip3d/converter/generate_image_3rscan.py",
"chars": 925,
"preview": "import os\nimport zipfile\nfrom argparse import ArgumentParser\nfrom functools import partial\n\nimport mmengine\n\n\ndef proces"
},
{
"path": "bip3d/converter/generate_image_matterport3d.py",
"chars": 1528,
"preview": "import os\nimport zipfile\nfrom argparse import ArgumentParser\nfrom functools import partial\n\nimport mmengine\n\n\ndef proces"
},
{
"path": "bip3d/converter/generate_image_scannet.py",
"chars": 8513,
"preview": "# Modified from https://github.com/ScanNet/ScanNet/blob/master/SensReader/python/SensorData.py # noqa\nimport os\nimport s"
},
{
"path": "bip3d/datasets/__init__.py",
"chars": 178,
"preview": "from .embodiedscan_det_grounding_dataset import EmbodiedScanDetGroundingDataset\nfrom .transforms import * # noqa: F401,"
},
{
"path": "bip3d/datasets/embodiedscan_det_grounding_dataset.py",
"chars": 25713,
"preview": "import math\nimport pickle\nimport copy\nimport tqdm\nimport os\nimport warnings\nfrom typing import Callable, List, Optional,"
},
{
"path": "bip3d/datasets/transforms/__init__.py",
"chars": 378,
"preview": "from .augmentation import (\n GlobalRotScaleTrans,\n RandomFlip3D,\n ResizeCropFlipImage,\n)\nfrom .formatting impor"
},
{
"path": "bip3d/datasets/transforms/augmentation.py",
"chars": 22841,
"preview": "from typing import List, Union\n\nimport cv2\nimport numpy as np\nfrom mmcv.transforms import BaseTransform\nfrom mmdet.datas"
},
{
"path": "bip3d/datasets/transforms/formatting.py",
"chars": 12002,
"preview": "from typing import List, Sequence, Union\n\nimport mmengine\nimport numpy as np\nimport torch\nfrom mmcv.transforms import Ba"
},
{
"path": "bip3d/datasets/transforms/loading.py",
"chars": 20913,
"preview": "from typing import Optional\n\nimport mmcv\nimport mmengine\nimport numpy as np\nfrom mmcv.transforms import BaseTransform\nfr"
},
{
"path": "bip3d/datasets/transforms/multiview.py",
"chars": 4929,
"preview": "import copy\nimport random\n\nimport numpy as np\nimport torch\nfrom mmcv.transforms import BaseTransform, Compose\n\nfrom bip3"
},
{
"path": "bip3d/datasets/transforms/transform.py",
"chars": 10876,
"preview": "from typing import List, Optional, Tuple, Union\nimport cv2\nimport copy\n\nimport numpy as np\nfrom numpy import random\nimpo"
},
{
"path": "bip3d/datasets/utils.py",
"chars": 769,
"preview": "import numpy as np\nfrom scipy.spatial.transform import Rotation\n\n\ndef sample(total_n, sample_n, fix_interval=True):\n "
},
{
"path": "bip3d/eval/__init__.py",
"chars": 104,
"preview": "from .metrics import GroundingMetric, IndoorDetMetric\n\n__all__ = ['IndoorDetMetric', 'GroundingMetric']\n"
},
{
"path": "bip3d/eval/indoor_eval.py",
"chars": 17396,
"preview": "# Copyright (c) OpenRobotLab. All rights reserved.\nimport numpy as np\nimport torch\nfrom mmengine.logging import print_lo"
},
{
"path": "bip3d/eval/metrics/__init__.py",
"chars": 184,
"preview": "from .det_metric import IndoorDetMetric\nfrom .grounding_metric import GroundingMetric\n# from .occupancy_metric import Oc"
},
{
"path": "bip3d/eval/metrics/det_metric.py",
"chars": 13422,
"preview": "# Copyright (c) OpenRobotLab. All rights reserved.\nimport logging\nfrom collections import OrderedDict\nfrom typing import"
},
{
"path": "bip3d/eval/metrics/grounding_metric.py",
"chars": 8964,
"preview": "# Copyright (c) OpenRobotLab. All rights reserved.\nimport os\nfrom typing import Dict, List, Optional, Sequence\nimport pi"
},
{
"path": "bip3d/grid_mask.py",
"chars": 4246,
"preview": "import torch\nimport torch.nn as nn\nimport numpy as np\nfrom PIL import Image\n\n\nclass Grid(object):\n def __init__(\n "
},
{
"path": "bip3d/models/__init__.py",
"chars": 262,
"preview": "from .structure import *\nfrom .feature_enhancer import *\nfrom .spatial_enhancer import *\nfrom .bbox3d_decoder import *\nf"
},
{
"path": "bip3d/models/base_target.py",
"chars": 1554,
"preview": "from torch import nn\nfrom abc import ABC, abstractmethod\n\n\n__all__ = [\"BaseTargetWithDenoising\"]\n\n\nclass BaseTargetWithD"
},
{
"path": "bip3d/models/bbox3d_decoder.py",
"chars": 36422,
"preview": "import math\nfrom typing import List, Optional, Tuple, Union\n\nimport numpy as np\nimport torch\nfrom torch import nn\nfrom t"
},
{
"path": "bip3d/models/bert.py",
"chars": 9681,
"preview": "# Copyright (c) OpenMMLab. All rights reserved.\nfrom collections import OrderedDict\nfrom typing import Sequence\n\nimport "
},
{
"path": "bip3d/models/data_preprocessors/__init__.py",
"chars": 109,
"preview": "from .custom_data_preprocessor import CustomDet3DDataPreprocessor\n\n__all__ = [\"CustomDet3DDataPreprocessor\"]\n"
},
{
"path": "bip3d/models/data_preprocessors/custom_data_preprocessor.py",
"chars": 19709,
"preview": "import math\nfrom numbers import Number\nfrom typing import Dict, List, Optional, Sequence, Tuple, Union\n\nimport numpy as "
},
{
"path": "bip3d/models/data_preprocessors/utils.py",
"chars": 2846,
"preview": "# Copyright (c) OpenMMLab. All rights reserved.\nfrom typing import List, Union\n\nimport torch\nimport torch.nn.functional "
},
{
"path": "bip3d/models/deformable_aggregation.py",
"chars": 11310,
"preview": "from typing import List, Optional, Tuple\n\nimport numpy as np\nimport torch\nimport torch.nn as nn\nfrom torch.cuda.amp.auto"
},
{
"path": "bip3d/models/feature_enhancer.py",
"chars": 5416,
"preview": "import torch\nfrom torch import nn\n\nfrom mmengine.model import BaseModel\nfrom mmdet.models.layers.transformer.utils impor"
},
{
"path": "bip3d/models/instance_bank.py",
"chars": 9346,
"preview": "import torch\nfrom torch import nn\nimport torch.nn.functional as F\nimport numpy as np\n\nfrom bip3d.registry import MODELS\n"
},
{
"path": "bip3d/models/spatial_enhancer.py",
"chars": 5025,
"preview": "import torch\nfrom torch import nn\n\nfrom mmengine.model import BaseModel\nfrom mmdet.models.layers.transformer.utils impor"
},
{
"path": "bip3d/models/structure.py",
"chars": 6260,
"preview": "from torch import nn\n\nfrom mmdet.models.detectors import BaseDetector\nfrom mmdet.models.detectors.deformable_detr import"
},
{
"path": "bip3d/models/target.py",
"chars": 12291,
"preview": "import torch\nfrom torch import nn\nimport numpy as np\nimport torch.nn.functional as F\nfrom scipy.optimize import linear_s"
},
{
"path": "bip3d/models/utils.py",
"chars": 7862,
"preview": "import torch\nfrom torch import nn\nfrom pytorch3d.transforms import euler_angles_to_matrix\n\n\ndef deformable_format(\n f"
},
{
"path": "bip3d/ops/__init__.py",
"chars": 97,
"preview": "from .deformable_aggregation import (\n deformable_aggregation_func,\n feature_maps_format\n)\n"
},
{
"path": "bip3d/ops/deformable_aggregation.py",
"chars": 7378,
"preview": "import torch\nfrom torch.autograd.function import Function, once_differentiable\nfrom torch.cuda.amp import custom_fwd, cu"
},
{
"path": "bip3d/ops/gcc.sh",
"chars": 617,
"preview": "export CC=/horizon-bucket/robot_lab/users/xuewu.lin/devtoolset-9/root/usr/bin/gcc\nexport CXX=/horizon-bucket/robot_lab/u"
},
{
"path": "bip3d/ops/setup.py",
"chars": 2009,
"preview": "import os\n\nimport torch\nfrom setuptools import setup\nfrom torch.utils.cpp_extension import (\n BuildExtension,\n Cpp"
},
{
"path": "bip3d/ops/src/deformable_aggregation.cpp",
"chars": 4083,
"preview": "#include <torch/extension.h>\n#include <c10/cuda/CUDAGuard.h>\n\nvoid deformable_aggregation(\n float* output,\n const floa"
},
{
"path": "bip3d/ops/src/deformable_aggregation_cuda.cu",
"chars": 10754,
"preview": "#include <ATen/ATen.h>\n#include <ATen/cuda/CUDAContext.h>\n#include <cuda.h>\n#include <cuda_runtime.h>\n\n#include <THC/THC"
},
{
"path": "bip3d/ops/src/deformable_aggregation_with_depth.cpp",
"chars": 4359,
"preview": "#include <torch/extension.h>\n#include <c10/cuda/CUDAGuard.h>\n\nvoid deformable_aggregation_with_depth(\n float* output,\n "
},
{
"path": "bip3d/ops/src/deformable_aggregation_with_depth_cuda.cu",
"chars": 16470,
"preview": "#include <ATen/ATen.h>\n#include <ATen/cuda/CUDAContext.h>\n#include <cuda.h>\n#include <cuda_runtime.h>\n\n#include <THC/THC"
},
{
"path": "bip3d/registry.py",
"chars": 1045,
"preview": "from mmengine import DATASETS as MMENGINE_DATASETS\nfrom mmengine import METRICS as MMENGINE_METRICS\nfrom mmengine import"
},
{
"path": "bip3d/structures/__init__.py",
"chars": 731,
"preview": "# Copyright (c) OpenRobotLab. All rights reserved.\nfrom .bbox_3d import (BaseInstance3DBoxes, Box3DMode, Coord3DMode,\n "
},
{
"path": "bip3d/structures/bbox_3d/__init__.py",
"chars": 878,
"preview": "# Copyright (c) OpenRobotLab. All rights reserved.\nfrom .base_box3d import BaseInstance3DBoxes\nfrom .box_3d_mode import "
},
{
"path": "bip3d/structures/bbox_3d/base_box3d.py",
"chars": 20615,
"preview": "# Copyright (c) OpenMMLab. All rights reserved.\nfrom abc import abstractmethod\nfrom typing import Iterator, Optional, Se"
},
{
"path": "bip3d/structures/bbox_3d/box_3d_mode.py",
"chars": 9630,
"preview": "# Copyright (c) OpenRobotLab. All rights reserved.\nfrom enum import IntEnum, unique\nfrom typing import Optional, Sequenc"
},
{
"path": "bip3d/structures/bbox_3d/coord_3d_mode.py",
"chars": 11027,
"preview": "# Copyright (c) OpenRobotLab. All rights reserved.\nfrom enum import IntEnum, unique\nfrom typing import Optional, Sequenc"
},
{
"path": "bip3d/structures/bbox_3d/euler_box3d.py",
"chars": 11166,
"preview": "# Copyright (c) OpenRobotLab. All rights reserved.\nimport numpy as np\nimport torch\nfrom pytorch3d.ops import box3d_overl"
},
{
"path": "bip3d/structures/bbox_3d/euler_depth_box3d.py",
"chars": 6365,
"preview": "# Copyright (c) OpenRobotLab. All rights reserved.\nimport numpy as np\nimport torch\nfrom mmcv.ops import points_in_boxes_"
},
{
"path": "bip3d/structures/bbox_3d/utils.py",
"chars": 17178,
"preview": "# Copyright (c) OpenRobotLab. All rights reserved.\nfrom logging import warning\nfrom typing import Tuple, Union\n\nimport n"
},
{
"path": "bip3d/structures/ops/__init__.py",
"chars": 2210,
"preview": "# Copyright (c) OpenRobotLab. All rights reserved.\n# yapf:disable\nfrom .box_np_ops import (box2d_to_corner_jit, box3d_to"
},
{
"path": "bip3d/structures/ops/box_np_ops.py",
"chars": 31639,
"preview": "# Copyright (c) OpenRobotLab. All rights reserved.\n# TODO: clean the functions in this file and move the APIs into box b"
},
{
"path": "bip3d/structures/ops/iou3d_calculator.py",
"chars": 12895,
"preview": "# Copyright (c) OpenRobotLab. All rights reserved.\nimport torch\nfrom mmdet.structures.bbox import bbox_overlaps\n\nfrom bi"
},
{
"path": "bip3d/structures/ops/transforms.py",
"chars": 2445,
"preview": "# Copyright (c) OpenRobotLab. All rights reserved.\nimport torch\n\n\ndef bbox3d_mapping_back(bboxes, scale_factor, flip_hor"
},
{
"path": "bip3d/structures/points/__init__.py",
"chars": 995,
"preview": "# Copyright (c) OpenMMLab. All rights reserved.\nfrom .base_points import BasePoints\nfrom .cam_points import CameraPoints"
},
{
"path": "bip3d/structures/points/base_points.py",
"chars": 20370,
"preview": "# Copyright (c) OpenMMLab. All rights reserved.\nimport warnings\nfrom abc import abstractmethod\nfrom typing import Iterat"
},
{
"path": "bip3d/structures/points/cam_points.py",
"chars": 3210,
"preview": "# Copyright (c) OpenMMLab. All rights reserved.\nfrom typing import Optional, Sequence, Union\n\nimport numpy as np\nfrom to"
},
{
"path": "bip3d/structures/points/depth_points.py",
"chars": 3071,
"preview": "# Copyright (c) OpenMMLab. All rights reserved.\nfrom typing import Optional, Sequence, Union\n\nimport numpy as np\nfrom to"
},
{
"path": "bip3d/structures/points/lidar_points.py",
"chars": 3071,
"preview": "# Copyright (c) OpenMMLab. All rights reserved.\nfrom typing import Optional, Sequence, Union\n\nimport numpy as np\nfrom to"
},
{
"path": "bip3d/utils/__init__.py",
"chars": 238,
"preview": "from .array_converter import ArrayConverter, array_converter\nfrom .typing_config import ConfigType, OptConfigType, OptMu"
},
{
"path": "bip3d/utils/array_converter.py",
"chars": 14263,
"preview": "# Copyright (c) OpenMMLab. All rights reserved.\nimport functools\nfrom inspect import getfullargspec\nfrom typing import C"
},
{
"path": "bip3d/utils/default_color_map.py",
"chars": 8687,
"preview": "DEFAULT_COLOR_MAP = {\n 'floor': [255, 193, 193],\n 'wall': [137, 54, 74],\n 'chair': [153, 69, 1],\n 'cabinet':"
},
{
"path": "bip3d/utils/dist_utils.py",
"chars": 322,
"preview": "import torch.distributed as dist\n\n\ndef reduce_mean(tensor):\n \"\"\"\"Obtain the mean of tensor on different GPUs.\"\"\"\n "
},
{
"path": "bip3d/utils/line_mesh.py",
"chars": 5690,
"preview": "\"\"\"Adapted from https://github.com/isl-\norg/Open3D/pull/738#issuecomment-564785941.\n\nModule which creates mesh lines fro"
},
{
"path": "bip3d/utils/typing_config.py",
"chars": 8263,
"preview": "from collections.abc import Sized\nfrom typing import Dict, List, Optional, Tuple, Union\n\nimport numpy as np\nimport torch"
},
{
"path": "configs/__init__.py",
"chars": 0,
"preview": ""
},
{
"path": "configs/bip3d_det.py",
"chars": 12662,
"preview": "_base_ = [\"./default_runtime.py\"]\n\nimport os\nfrom bip3d.datasets.embodiedscan_det_grounding_dataset import (\n class_n"
},
{
"path": "configs/bip3d_det_grounding.py",
"chars": 2069,
"preview": "_base_ = [\"./bip3d_grounding.py\"]\nfrom mmengine import Config\nimport os\ndet_config = Config.fromfile(\"configs/bip3d_det."
},
{
"path": "configs/bip3d_det_rgb.py",
"chars": 194,
"preview": "_base_ = [\"./bip3d_det.py\"]\n\nmodel = dict(\n backbone_3d=None,\n neck_3d=None,\n spatial_enhancer=dict(with_featur"
},
{
"path": "configs/bip3d_grounding.py",
"chars": 14521,
"preview": "_base_ = [\"./default_runtime.py\"]\n\nimport os\nfrom bip3d.datasets.embodiedscan_det_grounding_dataset import (\n class_n"
},
{
"path": "configs/bip3d_grounding_rgb.py",
"chars": 238,
"preview": "_base_ = [\"./bip3d_grounding.py\"]\n\nmodel = dict(\n backbone_3d=None,\n neck_3d=None,\n spatial_enhancer=dict(with_"
},
{
"path": "configs/default_runtime.py",
"chars": 750,
"preview": "default_scope = \"bip3d\"\n\ndefault_hooks = dict(\n timer=dict(type=\"IterTimerHook\"),\n logger=dict(\n type=\"Logg"
},
{
"path": "docs/quick_start.md",
"chars": 3526,
"preview": "# Quick Start\n\n### Set up python environment\n```bash\nvirtualenv mm_bip3d --python=python3.8\nsource mm_bip3d/bin/activate"
},
{
"path": "engine.sh",
"chars": 457,
"preview": "export PYTHONPATH=./:$PYTHONPATH\nCONFIG=configs/bip3d_det.py\nif ! [[ -z $1 ]]; then\n echo $1\n CONFIG=$1\nfi\nnvcc -V"
},
{
"path": "requirements.txt",
"chars": 323,
"preview": "einops==0.8.0\nhuggingface-hub==0.26.5\nmmcv==2.1.0\nmmdet==3.3.0\nmmengine==0.10.4\nninja==1.11.1.3\nnumpy==1.26.4\nopencv-pyt"
},
{
"path": "test.sh",
"chars": 448,
"preview": "export PYTHONPATH=./:$PYTHONPATH\nCONFIG=$1\nCKPT=$2\n\nnvcc -V\nwhich nvcc\n\ngpus=(${CUDA_VISIBLE_DEVICES//,/ })\ngpu_num=${#g"
},
{
"path": "tools/anchor_bbox3d_kmeans.py",
"chars": 2016,
"preview": "import torch\nimport pickle\nimport tqdm\nfrom sklearn.cluster import KMeans\nimport numpy as np\n\nfrom bip3d.structures.bbox"
},
{
"path": "tools/benchmark.py",
"chars": 4269,
"preview": "# Copyright (c) OpenMMLab. All rights reserved.\nimport argparse\nimport os\n\nfrom mmengine import MMLogger\nfrom mmengine.c"
},
{
"path": "tools/ckpt_rename.py",
"chars": 3834,
"preview": "import os\nimport torch\n\ndef get_renamed_ckpt(file, output=\"./\"):\n ckpt_rename = dict()\n ckpt = torch.load(file)\n "
},
{
"path": "tools/dist_test.sh",
"chars": 273,
"preview": "#!/usr/bin/env bash\n\nCONFIG=$1\nCHECKPOINT=$2\nGPUS=$3\nPORT=${PORT:-11000}\n\nPYTHONPATH=\"$(dirname $0)/..\":$PYTHONPATH \\\npy"
},
{
"path": "tools/dist_train.sh",
"chars": 248,
"preview": "#!/usr/bin/env bash\n\nCONFIG=$1\nGPUS=$2\nPORT=${PORT:-12050}\n\nPYTHONPATH=\"$(dirname $0)/..\":$PYTHONPATH \\\npython3 -m torch"
},
{
"path": "tools/test.py",
"chars": 6468,
"preview": "# Copyright (c) OpenMMLab. All rights reserved.\nimport time\nimport os\n\ndef wait_before_import_config():\n t = int(os.e"
},
{
"path": "tools/train.py",
"chars": 5624,
"preview": "# Copyright (c) OpenMMLab. All rights reserved.\nimport time\nimport os\n\ndef wait_before_import_config():\n t = int(os.e"
}
]