Repository: NVlabs/PoseCNN-PyTorch
Branch: main
Commit: f7d28f2abd38
Files: 121
Total size: 1.2 MB
Directory structure:
gitextract_vctsyxbk/
├── .gitignore
├── .gitmodules
├── LICENSE.md
├── README.md
├── build.sh
├── experiments/
│ ├── cfgs/
│ │ ├── dex_ycb.yml
│ │ ├── ycb_object.yml
│ │ ├── ycb_object_detection.yml
│ │ ├── ycb_object_self_supervision.yml
│ │ └── ycb_video.yml
│ └── scripts/
│ ├── demo.sh
│ ├── dex_ycb_test_s0.sh
│ ├── dex_ycb_test_s1.sh
│ ├── dex_ycb_test_s2.sh
│ ├── dex_ycb_test_s3.sh
│ ├── dex_ycb_train_s0.sh
│ ├── dex_ycb_train_s1.sh
│ ├── dex_ycb_train_s2.sh
│ ├── dex_ycb_train_s3.sh
│ ├── ros_ycb_object_test.sh
│ ├── ros_ycb_object_test_detection.sh
│ ├── ycb_object_test.sh
│ ├── ycb_object_train.sh
│ ├── ycb_object_train_detection.sh
│ ├── ycb_object_train_self_supervision.sh
│ ├── ycb_video_test.sh
│ └── ycb_video_train.sh
├── lib/
│ ├── datasets/
│ │ ├── __init__.py
│ │ ├── background.py
│ │ ├── dex_ycb.py
│ │ ├── factory.py
│ │ ├── imdb.py
│ │ ├── ycb_object.py
│ │ ├── ycb_self_supervision.py
│ │ └── ycb_video.py
│ ├── fcn/
│ │ ├── __init__.py
│ │ ├── config.py
│ │ ├── render_utils.py
│ │ ├── test_common.py
│ │ ├── test_dataset.py
│ │ ├── test_imageset.py
│ │ └── train.py
│ ├── layers/
│ │ ├── ROIAlign_cuda.cu
│ │ ├── __init__.py
│ │ ├── backproject_kernel.cu
│ │ ├── hard_label.py
│ │ ├── hard_label_kernel.cu
│ │ ├── hough_voting.py
│ │ ├── hough_voting_kernel.cu
│ │ ├── point_matching_loss.py
│ │ ├── point_matching_loss_kernel.cu
│ │ ├── pose_target_layer.py
│ │ ├── posecnn_layers.cpp
│ │ ├── roi_align.py
│ │ ├── roi_pooling.py
│ │ ├── roi_pooling_kernel.cu
│ │ ├── roi_target_layer.py
│ │ ├── sdf_matching_loss.py
│ │ ├── sdf_matching_loss_kernel.cu
│ │ └── setup.py
│ ├── networks/
│ │ ├── PoseCNN.py
│ │ └── __init__.py
│ ├── sdf/
│ │ ├── __init__.py
│ │ ├── _init_paths.py
│ │ ├── multi_sdf_optimizer.py
│ │ ├── sdf_optimizer.py
│ │ ├── sdf_utils.py
│ │ └── test_sdf_optimizer.py
│ └── utils/
│ ├── __init__.py
│ ├── bbox.pyx
│ ├── bbox_transform.py
│ ├── blob.py
│ ├── nms.py
│ ├── pose_error.py
│ ├── se3.py
│ ├── segmentation_evaluation.py
│ └── setup.py
├── requirement.txt
├── ros/
│ ├── _init_paths.py
│ ├── collect_images_realsense.py
│ ├── posecnn.rviz
│ └── test_images.py
├── tools/
│ ├── _init_paths.py
│ ├── test_images.py
│ ├── test_net.py
│ └── train_net.py
└── ycb_render/
├── CMakeLists.txt
├── __init__.py
├── cpp/
│ ├── query_devices.cpp
│ ├── test_device.cpp
│ └── ycb_renderer.cpp
├── get_available_devices.py
├── glad/
│ ├── EGL/
│ │ └── eglplatform.h
│ ├── KHR/
│ │ └── khrplatform.h
│ ├── egl.c
│ ├── gl.c
│ ├── glad/
│ │ ├── egl.h
│ │ ├── gl.h
│ │ └── glx.h
│ ├── glx_dyn.c
│ └── linmath.h
├── glutils/
│ ├── __init__.py
│ ├── _trackball.py
│ ├── glcontext.py
│ ├── glrenderer.py
│ ├── meshutil.py
│ ├── trackball.py
│ └── utils.py
├── setup.py
├── shaders/
│ ├── frag.shader
│ ├── frag_blinnphong.shader
│ ├── frag_mat.shader
│ ├── frag_simple.shader
│ ├── frag_textureless.shader
│ ├── vert.shader
│ ├── vert_blinnphong.shader
│ ├── vert_mat.shader
│ ├── vert_simple.shader
│ └── vert_textureless.shader
├── visualize_sim.py
└── ycb_renderer.py
================================================
FILE CONTENTS
================================================
================================================
FILE: .gitignore
================================================
*.mex*
*.pyc
*.tgz
*.so
*.o
output*
lib/synthesize/build/*
lib/utils/bbox.c
data/
data_self/
docker/
ngc/
.idea/
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
# C extensions
*.so
# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib64/
parts/
sdist/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST
# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
.hypothesis/
.pytest_cache/
# Translations
*.mo
*.pot
# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal
# Flask stuff:
instance/
.webassets-cache
# Scrapy stuff:
.scrapy
# Sphinx documentation
docs/_build/
# PyBuilder
target/
# Jupyter Notebook
.ipynb_checkpoints
# IPython
profile_default/
ipython_config.py
# pyenv
.python-version
# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock
# celery beat schedule file
celerybeat-schedule
# SageMath parsed files
*.sage.py
# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/
# Spyder project settings
.spyderproject
.spyproject
# Rope project settings
.ropeproject
# mkdocs documentation
/site
# mypy
.mypy_cache/
.dmypy.json
dmypy.json
# Pyre type checker
.pyre/
.idea/
.
*.png
results/
================================================
FILE: .gitmodules
================================================
[submodule "ycb_render/pybind11"]
path = ycb_render/pybind11
url = https://github.com/pybind/pybind11.git
================================================
FILE: LICENSE.md
================================================
# NVIDIA Source Code License for PoseCNN-PyTorch: A PyTorch Implementation of the PoseCNN Framework for 6D Object Pose Estimation
## 1. Definitions
“Licensor” means any person or entity that distributes its Work.
“Software” means the original work of authorship made available under this License.
“Work” means the Software and any additions to or derivative works of the Software that are made available under this License.
“Nvidia Processors” means any central processing unit (CPU), graphics processing unit (GPU), field-programmable gate array (FPGA), application-specific integrated circuit (ASIC) or any combination thereof designed, made, sold, or provided by Nvidia or its affiliates.
The terms “reproduce,” “reproduction,” “derivative works,” and “distribution” have the meaning as provided under U.S. copyright law; provided, however, that for the purposes of this License, derivative works shall not include works that remain separable from, or merely link (or bind by name) to the interfaces of, the Work.
Works, including the Software, are “made available” under this License by including in or with the Work either (a) a copyright notice referencing the applicability of this License to the Work, or (b) a copy of this License.
## 2. License Grants
### 2.1 Copyright Grant.
Subject to the terms and conditions of this License, each Licensor grants to you a perpetual, worldwide, non-exclusive, royalty-free, copyright license to reproduce, prepare derivative works of, publicly display, publicly perform, sublicense and distribute its Work and any resulting derivative works in any form.
## 3. Limitations
### 3.1 Redistribution.
You may reproduce or distribute the Work only if (a) you do so under this License, (b) you include a complete copy of this License with your distribution, and (c) you retain without modification any copyright, patent, trademark, or attribution notices that are present in the Work.
### 3.2 Derivative Works.
You may specify that additional or different terms apply to the use, reproduction, and distribution of your derivative works of the Work (“Your Terms”) only if (a) Your Terms provide that the use limitation in Section 3.3 applies to your derivative works, and (b) you identify the specific derivative works that are subject to Your Terms. Notwithstanding Your Terms, this License (including the redistribution requirements in Section 3.1) will continue to apply to the Work itself.
### 3.3 Use Limitation.
The Work and any derivative works thereof only may be used or intended for use non-commercially. The Work or derivative works thereof may be used or intended for use by Nvidia or its affiliates commercially or non-commercially. As used herein, “non-commercially” means for research or evaluation purposes only.
### 3.4 Patent Claims.
If you bring or threaten to bring a patent claim against any Licensor (including any claim, cross-claim or counterclaim in a lawsuit) to enforce any patents that you allege are infringed by any Work, then your rights under this License from such Licensor (including the grants in Sections 2.1 and 2.2) will terminate immediately.
### 3.5 Trademarks.
This License does not grant any rights to use any Licensor’s or its affiliates’ names, logos, or trademarks, except as necessary to reproduce the notices described in this License.
### 3.6 Termination.
If you violate any term of this License, then your rights under this License (including the grants in Sections 2.1 and 2.2) will terminate immediately.
## 4. Disclaimer of Warranty.
THE WORK IS PROVIDED “AS IS” WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WARRANTIES OR CONDITIONS OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, TITLE OR NON-INFRINGEMENT. YOU BEAR THE RISK OF UNDERTAKING ANY ACTIVITIES UNDER THIS LICENSE.
## 5. Limitation of Liability.
EXCEPT AS PROHIBITED BY APPLICABLE LAW, IN NO EVENT AND UNDER NO LEGAL THEORY, WHETHER IN TORT (INCLUDING NEGLIGENCE), CONTRACT, OR OTHERWISE SHALL ANY LICENSOR BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING OUT OF OR RELATED TO THIS LICENSE, THE USE OR INABILITY TO USE THE WORK (INCLUDING BUT NOT LIMITED TO LOSS OF GOODWILL, BUSINESS INTERRUPTION, LOST PROFITS OR DATA, COMPUTER FAILURE OR MALFUNCTION, OR ANY OTHER COMMERCIAL DAMAGES OR LOSSES), EVEN IF THE LICENSOR HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
================================================
FILE: README.md
================================================
# PoseCNN-PyTorch: A PyTorch Implementation of the PoseCNN Framework for 6D Object Pose Estimation
### Introduction
This project implements PoseCNN in PyTorch.
PoseCNN is an end-to-end Convolutional Neural Network for 6D object pose estimation. PoseCNN estimates the 3D translation of an object by localizing its center in the image and predicting its distance from the camera. The 3D rotation of the object is estimated by regressing to a quaternion representation. [arXiv](https://arxiv.org/abs/1711.00199), [Project](https://rse-lab.cs.washington.edu/projects/posecnn/)
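As an illustration of the translation recovery just described, here is a minimal sketch with hypothetical variable names, assuming standard pinhole intrinsics fx, fy, px, py: the localized 2D center and the predicted depth determine the 3D translation by inverting the projection.
```python
import numpy as np

def center_depth_to_translation(cx, cy, Tz, fx, fy, px, py):
    """Backproject a localized 2D object center (cx, cy) and predicted depth Tz
    to a 3D translation, inverting the pinhole projection
    cx = fx * Tx / Tz + px and cy = fy * Ty / Tz + py."""
    Tx = (cx - px) * Tz / fx
    Ty = (cy - py) * Tz / fy
    return np.array([Tx, Ty, Tz], dtype=np.float32)
```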
Rotation regression in PoseCNN does not handle symmetric objects well; check [PoseRBPF](https://github.com/NVlabs/PoseRBPF) for a better solution for symmetric objects.
The code also supports pose refinement by matching the segmented 3D point cloud of an object against its SDF.
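This refinement amounts to minimizing the absolute signed-distance values of the segmented points under the current pose. Below is a simplified nearest-neighbor illustration of the idea only; the actual implementation lives in lib/layers/sdf_matching_loss.py and its CUDA kernel, and uses interpolation plus the translation/rotation regularizers configured in the YAML files.
```python
import torch

def sdf_matching_loss(points, R, t, sdf_grid, origin, delta):
    """Sum of absolute SDF values at the transformed points.

    points:   (N, 3) segmented point cloud in the camera frame
    R, t:     current pose estimate (3x3 rotation, 3-vector translation)
    sdf_grid: (X, Y, Z) signed distance values on a regular grid
    origin:   grid origin in object coordinates; delta: cell size
    """
    # move points into the object coordinate frame: R^T (p - t)
    p_obj = (points - t) @ R
    # nearest-cell lookup into the SDF grid (the real code interpolates)
    idx = ((p_obj - origin) / delta).long()
    idx = torch.stack([idx[:, i].clamp(0, sdf_grid.shape[i] - 1)
                       for i in range(3)], dim=1)
    d = sdf_grid[idx[:, 0], idx[:, 1], idx[:, 2]]
    return d.abs().sum()
```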

### License
PoseCNN-PyTorch is released under the NVIDIA Source Code License (refer to the LICENSE file for details).
### Citation
If you find this package useful in your research, please consider citing:
@inproceedings{xiang2018posecnn,
Author = {Yu Xiang and Tanner Schmidt and Venkatraman Narayanan and Dieter Fox},
Title = {{PoseCNN}: A Convolutional Neural Network for {6D} Object Pose Estimation in Cluttered Scenes},
booktitle = {Robotics: Science and Systems (RSS)},
Year = {2018}
}
### Required environment
- Ubuntu 16.04 or above
- PyTorch 0.4.1 or above
- CUDA 9.1 or above
### Installation
Use Python 3. If ROS is needed, compile with Python 2.
1. Install [PyTorch](https://pytorch.org/)
2. Install Eigen from the Github source code [here](https://github.com/eigenteam/eigen-git-mirror)
3. Install Sophus from the Github source code [here](https://github.com/yuxng/Sophus)
4. Install python packages
```Shell
pip install -r requirement.txt
```
5. Initialize the submodules in ycb_render
```Shell
git submodule update --init --recursive
```
6. Compile the new layers under $ROOT/lib/layers that we introduce in PoseCNN
```Shell
cd $ROOT/lib/layers
sudo python setup.py install
```
7. Compile the Cython components
```Shell
cd $ROOT/lib/utils
python setup.py build_ext --inplace
```
8. Compile ycb_render in $ROOT/ycb_render
```Shell
cd $ROOT/ycb_render
sudo python setup.py develop
```
### Download
- 3D models of the YCB objects we used [here](https://drive.google.com/file/d/1PTNmhd-eSq0fwSPv0nvQN8h_scR1v-UJ/view?usp=sharing) (3G). Save under $ROOT/data or use a symbolic link.
- Our pre-trained checkpoints [here](https://drive.google.com/file/d/1-ECAkkTRfa1jJ9YBTzf04wxCGw6-m5d4/view?usp=sharing) (4G). Save under $ROOT/data or use a symbolic link.
- Our real-world images with pose annotations for 20 YCB objects collected via robot interaction [here](https://drive.google.com/file/d/1cQH_dnDzyrI0MWNx8st4lht_q0F6cUrE/view?usp=sharing) (53G). Check our ICRA 2020 [paper](https://arxiv.org/abs/1909.10159) for details.
### Running the demo
1. Download 3D models and our pre-trained checkpoints first.
2. Run the following script:
```Shell
./experiments/scripts/demo.sh
```

### Training your own models with synthetic data for YCB objects
1. Download background images and save them to $ROOT/data, or use symbolic links.
- Our own images [here](https://drive.google.com/file/d/1Q5VTKHEEejT2lAKwefG00eWcrnNnpieC/view?usp=sharing) (7G)
- COCO 2014 images [here](https://cocodataset.org/#download)
- Or use your own background images
2. Download the pretrained VGG16 weights [here](https://drive.google.com/file/d/1tTd64s1zNnjONlXvTFDZAf4E68Pupc_S/view?usp=sharing) (528M) and put the weight file in $ROOT/data/checkpoints. If our pre-trained checkpoints are already downloaded, the VGG16 weights should already be there.
3. Train and test on 20 YCB objects with synthetic data. Modify the configuration file to train on a subset of these objects.
```Shell
cd $ROOT
# multi-gpu training, use 1 GPU or 2 GPUs since batch size is set to 2
./experiments/scripts/ycb_object_train.sh
# testing on synthetic data, $GPU_ID can be 0, 1, etc.
./experiments/scripts/ycb_object_test.sh $GPU_ID
```
### Training and testing on the YCB-Video dataset
1. Download the YCB-Video dataset from [here](https://rse-lab.cs.washington.edu/projects/posecnn/).
2. Create a symlink for the YCB-Video dataset
```Shell
cd $ROOT/data/YCB_Video
ln -s $ycb_data data
```
3. Training and testing on the YCB-Video dataset
```Shell
cd $ROOT
# multi-gpu training, use 1 GPU or 2 GPUs since batch size is set to 2
./experiments/scripts/ycb_video_train.sh
# testing, $GPU_ID can be 0, 1, etc.
./experiments/scripts/ycb_video_test.sh $GPU_ID
```
### Training and testing on the DexYCB dataset
1. Download the DexYCB dataset from [here](https://dex-ycb.github.io/).
2. Create a symlink for the DexYCB dataset
```Shell
cd $ROOT/data/DEX_YCB
ln -s $dex_ycb_data data
```
3. Training and testing on the DexYCB dataset
```Shell
cd $ROOT
# multi-gpu training for different splits, use 1 GPU or 2 GPUs since batch size is set to 2
./experiments/scripts/dex_ycb_train_s0.sh
./experiments/scripts/dex_ycb_train_s1.sh
./experiments/scripts/dex_ycb_train_s2.sh
./experiments/scripts/dex_ycb_train_s3.sh
# testing, $GPU_ID can be 0, 1, etc.
# our trained models are in checkpoints.zip
./experiments/scripts/dex_ycb_test_s0.sh $GPU_ID $EPOCH
./experiments/scripts/dex_ycb_test_s1.sh $GPU_ID $EPOCH
./experiments/scripts/dex_ycb_test_s2.sh $GPU_ID $EPOCH
./experiments/scripts/dex_ycb_test_s3.sh $GPU_ID $EPOCH
```
### Running with ROS on a RealSense camera for real-world pose estimation
- Python 2 is needed for ROS.
- Make sure our pretrained checkpoints are downloaded.
```Shell
# start realsense
roslaunch realsense2_camera rs_aligned_depth.launch tf_prefix:=measured/camera
# start rviz
rosrun rviz rviz -d ./ros/posecnn.rviz
# run posecnn for detection only (20 objects), $GPU_ID can be 0, 1, etc.
./experiments/scripts/ros_ycb_object_test_detection.sh $GPU_ID
# run full posecnn (20 objects), $GPU_ID can be 0, 1, etc.
./experiments/scripts/ros_ycb_object_test.sh $GPU_ID
```
Our example:

================================================
FILE: build.sh
================================================
cd lib/layers/;
python3 setup.py build develop;
cd ../utils;
python3 setup.py build_ext --inplace;
cd ../../ycb_render;
python3 setup.py develop
================================================
FILE: experiments/cfgs/dex_ycb.yml
================================================
EXP_DIR: dex_ycb
INPUT: COLOR
TRAIN:
TRAINABLE: True
WEIGHT_DECAY: 0.0001
LEARNING_RATE: 0.001
MILESTONES: !!python/tuple [12]
MOMENTUM: 0.9
BETA: 0.999
GAMMA: 0.1
SCALES_BASE: !!python/tuple [1.0]
IMS_PER_BATCH: 2
MAX_ITERS_PER_EPOCH: 20000
NUM_UNITS: 64
HARD_LABEL_THRESHOLD: 0.9
HARD_LABEL_SAMPLING: 0.0
HARD_ANGLE: 5.0
HOUGH_LABEL_THRESHOLD: 100
HOUGH_VOTING_THRESHOLD: 10
HOUGH_SKIP_PIXELS: 10
FG_THRESH: 0.5
FG_THRESH_POSE: 0.5
CLASSES: !!python/tuple [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 20, 21] # no large clamp
SNAPSHOT_INFIX: dex_ycb
SNAPSHOT_EPOCHS: 1
SNAPSHOT_PREFIX: vgg16
USE_FLIPPED: False
CHROMATIC: True
ADD_NOISE: True
VISUALIZE: False
VERTEX_REG: True
POSE_REG: True
FREEZE_LAYERS: False
VERTEX_W: 10.0
VERTEX_W_INSIDE: 10.0
# synthetic data
SYNTHESIZE: False
SYN_RATIO: 5
SYN_BACKGROUND_SPECIFIC: False
SYN_BACKGROUND_SUBTRACT_MEAN: True
SYN_SAMPLE_OBJECT: True
SYN_SAMPLE_POSE: False
SYN_MIN_OBJECT: 5
SYN_MAX_OBJECT: 8
SYN_TNEAR: 0.5
SYN_TFAR: 1.6
SYN_BOUND: 0.15
SYN_STD_ROTATION: 15
SYN_STD_TRANSLATION: 0.05
TEST:
SINGLE_FRAME: True
HOUGH_LABEL_THRESHOLD: 200
HOUGH_VOTING_THRESHOLD: 10
NUM_SDF_ITERATIONS_TRACKING: 50
SDF_TRANSLATION_REG: 1000.0
SDF_ROTATION_REG: 10.0
IMS_PER_BATCH: 1
HOUGH_SKIP_PIXELS: 10
DET_THRESHOLD: 0.1
SCALES_BASE: !!python/tuple [1.0]
VISUALIZE: False
SYNTHESIZE: False
POSE_REFINE: True
ROS_CAMERA: camera
CLASSES: !!python/tuple [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 20, 21] # no large clamp
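Note: the `!!python/tuple` entries above are a Python-specific PyYAML tag; `yaml.SafeLoader` rejects them, so these configs must be read with `yaml.FullLoader` (as the dataset code does for its meta files) or through the project's config loader in lib/fcn/config.py. A minimal sketch:
```python
import yaml

# !!python/tuple is not part of plain YAML: SafeLoader refuses it,
# while FullLoader constructs it as a Python tuple.
with open('experiments/cfgs/dex_ycb.yml') as f:
    cfg_dict = yaml.load(f, Loader=yaml.FullLoader)

print(cfg_dict['TRAIN']['CLASSES'])  # tuple of trained class indices (no large clamp)
```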
================================================
FILE: experiments/cfgs/ycb_object.yml
================================================
EXP_DIR: ycb_object
INPUT: COLOR
TRAIN:
TRAINABLE: True
WEIGHT_DECAY: 0.0001
LEARNING_RATE: 0.001
MILESTONES: !!python/tuple [3]
MOMENTUM: 0.9
BETA: 0.999
GAMMA: 0.1
SCALES_BASE: !!python/tuple [1.0]
IMS_PER_BATCH: 2
NUM_UNITS: 64
HARD_LABEL_THRESHOLD: 0.9
HARD_LABEL_SAMPLING: 0.0
HARD_ANGLE: 5.0
HOUGH_LABEL_THRESHOLD: 100
HOUGH_VOTING_THRESHOLD: 10
HOUGH_SKIP_PIXELS: 10
FG_THRESH: 0.5
FG_THRESH_POSE: 0.5
CLASSES: !!python/tuple [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 20, 21] # no large clamp
SYMMETRY: !!python/tuple [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1]
SNAPSHOT_INFIX: ycb_object
SNAPSHOT_EPOCHS: 1
SNAPSHOT_PREFIX: vgg16
USE_FLIPPED: False
CHROMATIC: True
ADD_NOISE: True
VISUALIZE: False
VERTEX_REG: True
POSE_REG: True
SLIM: False
# synthetic data
SYNTHESIZE: True
SYNNUM: 40000
SYN_RATIO: 5
SYN_BACKGROUND_SPECIFIC: True
SYN_BACKGROUND_SUBTRACT_MEAN: True
SYN_SAMPLE_OBJECT: True
SYN_SAMPLE_POSE: False
SYN_MIN_OBJECT: 5
SYN_MAX_OBJECT: 8
SYN_TNEAR: 0.5
SYN_TFAR: 1.6
SYN_BOUND: 0.3
SYN_STD_ROTATION: 15
SYN_STD_TRANSLATION: 0.05
TEST:
SINGLE_FRAME: True
HOUGH_LABEL_THRESHOLD: 400
HOUGH_VOTING_THRESHOLD: 10
NUM_SDF_ITERATIONS_TRACKING: 50
SDF_TRANSLATION_REG: 1000.0
SDF_ROTATION_REG: 10.0
IMS_PER_BATCH: 1
HOUGH_SKIP_PIXELS: 10
DET_THRESHOLD: 0.2
SCALES_BASE: !!python/tuple [1.0]
VISUALIZE: True
SYNTHESIZE: True
POSE_REFINE: True
ROS_CAMERA: D435
CLASSES: !!python/tuple [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 20, 21] # no large clamp
SYMMETRY: !!python/tuple [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1]
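The SYMMETRY flags mark rotationally ambiguous objects and select between PoseCNN's two rotation losses: PLoss with fixed point correspondences for asymmetric objects, and SLoss with nearest-point matching for symmetric ones. A simplified PyTorch sketch of that idea with hypothetical names; the trained loss is the CUDA kernel in lib/layers/point_matching_loss_kernel.cu:
```python
import torch

def shape_match_loss(points, R_pred, R_gt, symmetric):
    """Rotation loss on model points; symmetry handled by nearest-point matching.

    points: (N, 3) 3D model points; R_pred, R_gt: (3, 3) rotation matrices.
    """
    p_pred = points @ R_pred.T   # model points under the estimated rotation
    p_gt = points @ R_gt.T       # model points under the ground-truth rotation
    if symmetric:
        # SLoss: distance to the closest ground-truth point (pose ambiguity)
        d = torch.cdist(p_pred, p_gt).min(dim=1).values
    else:
        # PLoss: one-to-one correspondence between identical model points
        d = (p_pred - p_gt).norm(dim=1)
    return 0.5 * (d ** 2).mean()
```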
================================================
FILE: experiments/cfgs/ycb_object_detection.yml
================================================
EXP_DIR: ycb_object
INPUT: COLOR
TRAIN:
TRAINABLE: True
WEIGHT_DECAY: 0.0001
LEARNING_RATE: 0.001
MILESTONES: !!python/tuple [3]
MOMENTUM: 0.9
BETA: 0.999
GAMMA: 0.1
SCALES_BASE: !!python/tuple [1.0]
IMS_PER_BATCH: 2
NUM_UNITS: 64
HARD_LABEL_THRESHOLD: 0.9
HARD_LABEL_SAMPLING: 0.0
HARD_ANGLE: 5.0
HOUGH_LABEL_THRESHOLD: 100
HOUGH_VOTING_THRESHOLD: 10
HOUGH_SKIP_PIXELS: 10
FG_THRESH: 0.5
FG_THRESH_POSE: 0.5
CLASSES: !!python/tuple [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 20, 21] # no large clamp
SYMMETRY: !!python/tuple [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1]
SNAPSHOT_INFIX: ycb_object_detection
SNAPSHOT_EPOCHS: 1
SNAPSHOT_PREFIX: vgg16
USE_FLIPPED: False
CHROMATIC: True
ADD_NOISE: True
VISUALIZE: False
VERTEX_REG: True
POSE_REG: False # no rotation regression
SLIM: True
# synthetic data
SYNTHESIZE: True
SYNNUM: 40000
SYN_RATIO: 5
SYN_BACKGROUND_SPECIFIC: True
SYN_BACKGROUND_SUBTRACT_MEAN: True
SYN_SAMPLE_OBJECT: True
SYN_SAMPLE_POSE: False
SYN_MIN_OBJECT: 5
SYN_MAX_OBJECT: 8
SYN_TNEAR: 0.5
SYN_TFAR: 1.6
SYN_BOUND: 0.3
SYN_STD_ROTATION: 15
SYN_STD_TRANSLATION: 0.05
TEST:
SINGLE_FRAME: True
HOUGH_LABEL_THRESHOLD: 400
HOUGH_VOTING_THRESHOLD: 10
IMS_PER_BATCH: 1
HOUGH_SKIP_PIXELS: 10
DET_THRESHOLD: 0.2
SCALES_BASE: !!python/tuple [1.0]
VISUALIZE: False
SYNTHESIZE: True
POSE_REFINE: False
ROS_CAMERA: D435
CLASSES: !!python/tuple [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 20, 21] # no large clamp
SYMMETRY: !!python/tuple [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1]
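The HOUGH_* parameters above control the center-voting layer: each foreground pixel votes along its predicted direction toward the object center, pixels are thinned per HOUGH_SKIP_PIXELS, and accumulator peaks above HOUGH_VOTING_THRESHOLD become center hypotheses. A much-simplified NumPy sketch of the accumulation; the real layer is lib/layers/hough_voting_kernel.cu and additionally recovers depth and bounding boxes:
```python
import numpy as np

def hough_center_votes(mask, directions, skip_pixels=10):
    """Accumulate center votes from per-pixel unit directions for one class.

    mask:       (H, W) boolean foreground mask
    directions: (2, H, W) predicted unit vectors pointing at the center
    """
    H, W = mask.shape
    votes = np.zeros((H, W), dtype=np.int32)
    ys, xs = np.nonzero(mask)
    for y, x in zip(ys[::skip_pixels], xs[::skip_pixels]):
        dx, dy = directions[0, y, x], directions[1, y, x]
        # walk along the ray from (x, y) toward the center, voting per cell
        for step in range(1, max(H, W)):
            u, v = int(round(x + step * dx)), int(round(y + step * dy))
            if not (0 <= u < W and 0 <= v < H):
                break
            votes[v, u] += 1
    return votes  # peaks above the voting threshold give center hypotheses
```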
================================================
FILE: experiments/cfgs/ycb_object_self_supervision.yml
================================================
EXP_DIR: ycb_self_supervision
INPUT: COLOR
TRAIN:
TRAINABLE: True
WEIGHT_DECAY: 0.0001
LEARNING_RATE: 0.0001
MILESTONES: !!python/tuple [10000000]
MOMENTUM: 0.9
BETA: 0.999
GAMMA: 0.1
SCALES_BASE: !!python/tuple [1.0]
IMS_PER_BATCH: 2
NUM_UNITS: 64
HARD_LABEL_THRESHOLD: 0.9
HARD_LABEL_SAMPLING: 0.0
HARD_ANGLE: 5.0
HOUGH_LABEL_THRESHOLD: 100
HOUGH_VOTING_THRESHOLD: 10
HOUGH_SKIP_PIXELS: 10
FG_THRESH: 0.5
FG_THRESH_POSE: 0.5
CLASSES: !!python/tuple [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 20, 21] # no large clamp
SYMMETRY: !!python/tuple [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1]
SNAPSHOT_INFIX: ycb_object_self_supervision
SNAPSHOT_EPOCHS: 1
SNAPSHOT_PREFIX: vgg16
USE_FLIPPED: False
CHROMATIC: True
ADD_NOISE: True
VISUALIZE: False
VERTEX_REG: True
POSE_REG: True
SLIM: False
MAX_ITERS_PER_EPOCH: 20000
# synthetic data
SYNTHESIZE: True
SYNNUM: 40000
SYN_RATIO: 3
SYN_BACKGROUND_SPECIFIC: True
SYN_BACKGROUND_SUBTRACT_MEAN: True
SYN_SAMPLE_OBJECT: True
SYN_SAMPLE_POSE: False
SYN_MIN_OBJECT: 5
SYN_MAX_OBJECT: 8
SYN_TNEAR: 0.5
SYN_TFAR: 1.6
SYN_BOUND: 0.3
SYN_STD_ROTATION: 15
SYN_STD_TRANSLATION: 0.05
TEST:
SINGLE_FRAME: True
HOUGH_LABEL_THRESHOLD: 100
HOUGH_VOTING_THRESHOLD: 10
NUM_SDF_ITERATIONS_TRACKING: 50
SDF_TRANSLATION_REG: 1000.0
SDF_ROTATION_REG: 10.0
IMS_PER_BATCH: 1
HOUGH_SKIP_PIXELS: 10
DET_THRESHOLD: 0.2
SCALES_BASE: !!python/tuple [1.0]
VISUALIZE: True
SYNTHESIZE: False
POSE_REFINE: True
ROS_CAMERA: D435
CLASSES: !!python/tuple [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 20, 21]
SYMMETRY: !!python/tuple [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1]
================================================
FILE: experiments/cfgs/ycb_video.yml
================================================
EXP_DIR: ycb_video
INPUT: COLOR
TRAIN:
TRAINABLE: True
WEIGHT_DECAY: 0.0001
LEARNING_RATE: 0.001
MILESTONES: !!python/tuple [12]
MOMENTUM: 0.9
BETA: 0.999
GAMMA: 0.1
SCALES_BASE: !!python/tuple [1.0]
IMS_PER_BATCH: 2
MAX_ITERS_PER_EPOCH: 20000
NUM_UNITS: 64
HARD_LABEL_THRESHOLD: 0.9
HARD_LABEL_SAMPLING: 0.0
HARD_ANGLE: 5.0
HOUGH_LABEL_THRESHOLD: 100
HOUGH_VOTING_THRESHOLD: 10
HOUGH_SKIP_PIXELS: 10
FG_THRESH: 0.5
FG_THRESH_POSE: 0.5
CLASSES: !!python/tuple [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21]
SNAPSHOT_INFIX: ycb_video
SNAPSHOT_EPOCHS: 1
SNAPSHOT_PREFIX: vgg16
USE_FLIPPED: False
CHROMATIC: True
ADD_NOISE: True
VISUALIZE: False
VERTEX_REG: True
POSE_REG: True
FREEZE_LAYERS: False
VERTEX_W: 10.0
VERTEX_W_INSIDE: 10.0
# synthetic data
SYNTHESIZE: False
SYN_RATIO: 5
SYN_BACKGROUND_SPECIFIC: False
SYN_BACKGROUND_SUBTRACT_MEAN: True
SYN_SAMPLE_OBJECT: True
SYN_SAMPLE_POSE: False
SYN_MIN_OBJECT: 5
SYN_MAX_OBJECT: 8
SYN_TNEAR: 0.5
SYN_TFAR: 1.6
SYN_BOUND: 0.15
SYN_STD_ROTATION: 15
SYN_STD_TRANSLATION: 0.05
TEST:
SINGLE_FRAME: True
HOUGH_LABEL_THRESHOLD: 200
HOUGH_VOTING_THRESHOLD: 10
NUM_SDF_ITERATIONS_TRACKING: 50
SDF_TRANSLATION_REG: 100.0
SDF_ROTATION_REG: 1.0
IMS_PER_BATCH: 1
HOUGH_SKIP_PIXELS: 10
DET_THRESHOLD: 0.1
SCALES_BASE: !!python/tuple [1.0]
VISUALIZE: False
SYNTHESIZE: False
POSE_REFINE: True
ROS_CAMERA: camera
CLASSES: !!python/tuple [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21]
# CLASSES: !!python/tuple [0, 1, 2, 3, 4]
================================================
FILE: experiments/scripts/demo.sh
================================================
#!/bin/bash
set -x
set -e
export PYTHONUNBUFFERED="True"
export CUDA_VISIBLE_DEVICES=0
time ./tools/test_images.py --gpu 0 \
--imgdir data/demo/ \
--meta data/demo/meta.yml \
--color *color.png \
--network posecnn \
--pretrained data/checkpoints/ycb_object/vgg16_ycb_object_self_supervision_epoch_8.checkpoint.pth \
--dataset ycb_object_test \
--cfg experiments/cfgs/ycb_object.yml
================================================
FILE: experiments/scripts/dex_ycb_test_s0.sh
================================================
#!/bin/bash
set -x
set -e
export CUDA_VISIBLE_DEVICES=$1
time ./tools/test_net.py --gpu $1 \
--network posecnn \
--pretrained output/dex_ycb/dex_ycb_s0_train/vgg16_dex_ycb_epoch_$2.checkpoint.pth \
--dataset dex_ycb_s0_test \
--cfg experiments/cfgs/dex_ycb.yml
================================================
FILE: experiments/scripts/dex_ycb_test_s1.sh
================================================
#!/bin/bash
set -x
set -e
export CUDA_VISIBLE_DEVICES=$1
time ./tools/test_net.py --gpu $1 \
--network posecnn \
--pretrained output/dex_ycb/dex_ycb_s1_train/vgg16_dex_ycb_epoch_$2.checkpoint.pth \
--dataset dex_ycb_s1_test \
--cfg experiments/cfgs/dex_ycb.yml
================================================
FILE: experiments/scripts/dex_ycb_test_s2.sh
================================================
#!/bin/bash
set -x
set -e
export CUDA_VISIBLE_DEVICES=$1
time ./tools/test_net.py --gpu $1 \
--network posecnn \
--pretrained output/dex_ycb/dex_ycb_s2_train/vgg16_dex_ycb_epoch_$2.checkpoint.pth \
--dataset dex_ycb_s2_test \
--cfg experiments/cfgs/dex_ycb.yml
================================================
FILE: experiments/scripts/dex_ycb_test_s3.sh
================================================
#!/bin/bash
set -x
set -e
export CUDA_VISIBLE_DEVICES=$1
time ./tools/test_net.py --gpu $1 \
--network posecnn \
--pretrained output/dex_ycb/dex_ycb_s3_train/vgg16_dex_ycb_epoch_$2.checkpoint.pth \
--dataset dex_ycb_s3_test \
--cfg experiments/cfgs/dex_ycb.yml
================================================
FILE: experiments/scripts/dex_ycb_train_s0.sh
================================================
#!/bin/bash
set -x
set -e
time ./tools/train_net.py \
--network posecnn \
--pretrained data/checkpoints/vgg16-397923af.pth \
--dataset dex_ycb_s0_train \
--cfg experiments/cfgs/dex_ycb.yml \
--solver sgd \
--epochs 16
================================================
FILE: experiments/scripts/dex_ycb_train_s1.sh
================================================
#!/bin/bash
set -x
set -e
time ./tools/train_net.py \
--network posecnn \
--pretrained data/checkpoints/vgg16-397923af.pth \
--dataset dex_ycb_s1_train \
--cfg experiments/cfgs/dex_ycb.yml \
--solver sgd \
--epochs 16
================================================
FILE: experiments/scripts/dex_ycb_train_s2.sh
================================================
#!/bin/bash
set -x
set -e
time ./tools/train_net.py \
--network posecnn \
--pretrained data/checkpoints/vgg16-397923af.pth \
--dataset dex_ycb_s2_train \
--cfg experiments/cfgs/dex_ycb.yml \
--solver sgd \
--epochs 16
================================================
FILE: experiments/scripts/dex_ycb_train_s3.sh
================================================
#!/bin/bash
set -x
set -e
time ./tools/train_net.py \
--network posecnn \
--pretrained data/checkpoints/vgg16-397923af.pth \
--dataset dex_ycb_s3_train \
--cfg experiments/cfgs/dex_ycb.yml \
--solver sgd \
--epochs 16
================================================
FILE: experiments/scripts/ros_ycb_object_test.sh
================================================
#!/bin/bash
set -x
set -e
export PYTHONUNBUFFERED="True"
export CUDA_VISIBLE_DEVICES=$1
time ./ros/test_images.py --gpu $1 \
--instance 0 \
--network posecnn \
--pretrained data/checkpoints/ycb_object/vgg16_ycb_object_self_supervision_epoch_8.checkpoint.pth \
--dataset ycb_object_test \
--cfg experiments/cfgs/ycb_object.yml
================================================
FILE: experiments/scripts/ros_ycb_object_test_detection.sh
================================================
#!/bin/bash
set -x
set -e
export PYTHONUNBUFFERED="True"
export CUDA_VISIBLE_DEVICES=$1
time ./ros/test_images.py --gpu $1 \
--instance 0 \
--network posecnn \
--pretrained data/checkpoints/ycb_object/vgg16_ycb_object_detection_self_supervision_epoch_8.checkpoint.pth \
--dataset ycb_object_test \
--cfg experiments/cfgs/ycb_object_detection.yml
================================================
FILE: experiments/scripts/ycb_object_test.sh
================================================
#!/bin/bash
set -x
set -e
export PYTHONUNBUFFERED="True"
export CUDA_VISIBLE_DEVICES=$1
time ./tools/test_net.py --gpu $1 \
--network posecnn \
--pretrained output/ycb_object/ycb_object_train/vgg16_ycb_object_epoch_$2.checkpoint.pth \
--dataset ycb_object_test \
--cfg experiments/cfgs/ycb_object.yml
================================================
FILE: experiments/scripts/ycb_object_train.sh
================================================
#!/bin/bash
set -x
set -e
./tools/train_net.py \
--network posecnn \
--pretrained data/checkpoints/vgg16-397923af.pth \
--dataset ycb_object_train \
--cfg experiments/cfgs/ycb_object.yml \
--solver sgd \
--epochs 16
================================================
FILE: experiments/scripts/ycb_object_train_detection.sh
================================================
#!/bin/bash
set -x
set -e
./tools/train_net.py \
--network posecnn \
--pretrained data/checkpoints/vgg16-397923af.pth \
--dataset ycb_object_train \
--cfg experiments/cfgs/ycb_object_detection.yml \
--solver sgd \
--epochs 16
================================================
FILE: experiments/scripts/ycb_object_train_self_supervision.sh
================================================
#!/bin/bash
set -x
set -e
./tools/train_net.py \
--network posecnn \
--pretrained output/ycb_object/ycb_object_train/vgg16_ycb_object_epoch_16.checkpoint.pth \
--dataset ycb_self_supervision_all \
--cfg experiments/cfgs/ycb_object_self_supervision.yml \
--solver sgd \
--epochs 8
================================================
FILE: experiments/scripts/ycb_video_test.sh
================================================
#!/bin/bash
set -x
set -e
export PYTHONUNBUFFERED="True"
export CUDA_VISIBLE_DEVICES=$1
time ./tools/test_net.py --gpu $1 \
--network posecnn \
--pretrained output/ycb_video/ycb_video_train/vgg16_ycb_video_epoch_$2.checkpoint.pth \
--dataset ycb_video_keyframe \
--cfg experiments/cfgs/ycb_video.yml
================================================
FILE: experiments/scripts/ycb_video_train.sh
================================================
#!/bin/bash
set -x
set -e
time ./tools/train_net.py \
--network posecnn \
--pretrained data/checkpoints/vgg16-397923af.pth \
--dataset ycb_video_train \
--cfg experiments/cfgs/ycb_video.yml \
--solver sgd \
--epochs 16
================================================
FILE: lib/datasets/__init__.py
================================================
# Copyright (c) 2020 NVIDIA Corporation. All rights reserved.
# This work is licensed under the NVIDIA Source Code License - Non-commercial. Full
# text can be found in LICENSE.md
from .imdb import imdb
from .ycb_video import YCBVideo
from .ycb_self_supervision import YCBSelfSupervision
from .ycb_object import YCBObject
from .dex_ycb import DexYCBDataset
from .background import BackgroundDataset
import os.path as osp
ROOT_DIR = osp.join(osp.dirname(__file__), '..', '..')
================================================
FILE: lib/datasets/background.py
================================================
# Copyright (c) 2020 NVIDIA Corporation. All rights reserved.
# This work is licensed under the NVIDIA Source Code License - Non-commercial. Full
# text can be found in LICENSE.md
import torch
import torchvision
import torch.utils.data as data
import os, math
import os.path as osp
from os.path import *
import numpy as np
import numpy.random as npr
import cv2
import datasets
from fcn.config import cfg
from utils.blob import chromatic_transform, add_noise, add_noise_depth
class BackgroundDataset(data.Dataset, datasets.imdb):
def __init__(self, name):
self._name = name
self.files_color = []
self.files_depth = []
if name == 'coco':
background_dir = os.path.join(self.cache_path, '../coco/train2014/train2014')
for filename in os.listdir(background_dir):
self.files_color.append(os.path.join(background_dir, filename))
self.files_color.sort()
elif name == 'texture':
background_dir = os.path.join(self.cache_path, '../textures')
for filename in os.listdir(background_dir):
self.files_color.append(os.path.join(background_dir, filename))
self.files_color.sort()
elif name == 'nvidia':
allencenter = os.path.join(self.cache_path, '../backgrounds/nvidia')
subdirs = os.listdir(allencenter)
for i in range(len(subdirs)):
subdir = subdirs[i]
files = os.listdir(os.path.join(allencenter, subdir))
for j in range(len(files)):
filename = os.path.join(allencenter, subdir, files[j])
self.files_color.append(filename)
self.files_color.sort()
elif name == 'table':
allencenter = os.path.join(self.cache_path, '../backgrounds/table')
subdirs = os.listdir(allencenter)
for i in range(len(subdirs)):
subdir = subdirs[i]
files = os.listdir(os.path.join(allencenter, subdir))
for j in range(len(files)):
filename = os.path.join(allencenter, subdir, files[j])
self.files_color.append(filename)
self.files_color.sort()
elif name == 'isaac':
allencenter = os.path.join(self.cache_path, '../backgrounds/isaac')
subdirs = os.listdir(allencenter)
for i in range(len(subdirs)):
subdir = subdirs[i]
files = os.listdir(os.path.join(allencenter, subdir))
for j in range(len(files)):
filename = os.path.join(allencenter, subdir, files[j])
self.files_color.append(filename)
self.files_color.sort()
elif name == 'rgbd':
comotion = os.path.join(self.cache_path, '../backgrounds/rgbd')
subdirs = os.listdir(comotion)
for i in range(len(subdirs)):
subdir = subdirs[i]
files = os.listdir(os.path.join(comotion, subdir))
for j in range(len(files)):
filename = os.path.join(comotion, subdir, files[j])
if 'depth.png' in filename:
self.files_depth.append(filename)
else:
self.files_color.append(filename)
self.files_color.sort()
self.files_depth.sort()
self._intrinsic_matrix = np.array([[524.7917885754071, 0, 332.5213232846151],
[0, 489.3563960810721, 281.2339855172282],
[0, 0, 1]])
self.num = len(self.files_color)
self.subtract_mean = cfg.TRAIN.SYN_BACKGROUND_SUBTRACT_MEAN
if cfg.TRAIN.SYN_CROP:
self._height = cfg.TRAIN.SYN_CROP_SIZE
self._width = cfg.TRAIN.SYN_CROP_SIZE
else:
self._height = cfg.TRAIN.SYN_HEIGHT
self._width = cfg.TRAIN.SYN_WIDTH
self._pixel_mean = cfg.PIXEL_MEANS
print('{} background images'.format(self.num))
def __len__(self):
return self.num
def __getitem__(self, idx):
filename_color = self.files_color[idx]
if self.name == 'rgbd':
filename_depth = self.files_depth[idx]
else:
filename_depth = None
return self.load(filename_color, filename_depth)
def load(self, filename_color, filename_depth):
if filename_depth is None:
background_depth = np.zeros((3, self._height, self._width), dtype=np.float32)
mask_depth = np.zeros((self._height, self._width), dtype=np.float32)
if filename_depth is None and np.random.rand(1) < cfg.TRAIN.SYN_BACKGROUND_CONSTANT_PROB: # only for rgb cases
# constant background image
background_color = np.ones((self._height, self._width, 3), dtype=np.uint8)
color = np.random.randint(256, size=3)
background_color[:, :, 0] = color[0]
background_color[:, :, 1] = color[1]
background_color[:, :, 2] = color[2]
else:
background_color = cv2.imread(filename_color, cv2.IMREAD_UNCHANGED)
if filename_depth is not None:
background_depth = cv2.imread(filename_depth, cv2.IMREAD_UNCHANGED)
try:
# randomly crop a region as background
bw = background_color.shape[1]
bh = background_color.shape[0]
x1 = npr.randint(0, int(bw/3))
y1 = npr.randint(0, int(bh/3))
x2 = npr.randint(int(2*bw/3), bw)
y2 = npr.randint(int(2*bh/3), bh)
background_color = background_color[y1:y2, x1:x2]
background_color = cv2.resize(background_color, (self._width, self._height), interpolation=cv2.INTER_LINEAR)
if len(background_color.shape) != 3:
background_color = cv2.cvtColor(background_color, cv2.COLOR_GRAY2RGB)
if filename_depth is not None:
background_depth = background_depth[y1:y2, x1:x2]
background_depth = cv2.resize(background_depth, (self._width, self._height), interpolation=cv2.INTER_NEAREST)
background_depth = self.backproject(background_depth, self._intrinsic_matrix, 1000.0)
        except Exception:
background_color = np.zeros((self._height, self._width, 3), dtype=np.uint8)
print('bad background_color image', filename_color)
if filename_depth is not None:
background_depth = np.zeros((self._height, self._width, 3), dtype=np.float32)
print('bad depth background image')
if len(background_color.shape) != 3:
background_color = np.zeros((self._height, self._width, 3), dtype=np.uint8)
print('bad background_color image', filename_color)
if filename_depth is not None:
if len(background_depth.shape) != 3:
background_depth = np.zeros((self._height, self._width, 3), dtype=np.float32)
print('bad depth background image')
z_im = background_depth[:, :, 2]
mask_depth = z_im > 0.0
mask_depth = mask_depth.astype(np.float32)
if np.random.rand(1) > 0.1:
background_depth = add_noise_depth(background_depth)
background_depth = background_depth.transpose(2, 0, 1).astype(np.float32)
if np.random.rand(1) > 0.1:
background_color = chromatic_transform(background_color)
if np.random.rand(1) > 0.1:
background_color = add_noise(background_color)
background_color = background_color.astype(np.float32)
if self.subtract_mean:
background_color -= self._pixel_mean
background_color = background_color.transpose(2, 0, 1) / 255.0
sample = {'background_color': background_color,
'background_depth': background_depth,
'mask_depth': mask_depth}
return sample
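For reference, BackgroundDataset plugs into a standard PyTorch DataLoader. A minimal usage sketch, assuming the global cfg has first been populated from one of the experiment YAML files and the background images have been downloaded under the cache path:
```python
import torch.utils.data

from datasets.background import BackgroundDataset  # with lib/ on sys.path

# name selects the image source: 'coco', 'texture', 'nvidia', 'table',
# 'isaac', or 'rgbd' (the only one that also yields depth backgrounds)
background = BackgroundDataset('coco')
loader = torch.utils.data.DataLoader(background, batch_size=2,
                                     shuffle=True, num_workers=4)
sample = next(iter(loader))
print(sample['background_color'].shape)  # (2, 3, H, W) float tensor
```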
================================================
FILE: lib/datasets/dex_ycb.py
================================================
# Copyright (c) 2020 NVIDIA Corporation. All rights reserved.
# This work is licensed under the NVIDIA Source Code License - Non-commercial. Full
# text can be found in LICENSE.md
import os
import sys
import yaml
import numpy as np
import torch
import torch.utils.data as data
import math
import numpy.random as npr
import cv2
import copy
import glob
import scipy.io
import datasets
from fcn.config import cfg
from utils.blob import pad_im, chromatic_transform, add_noise
from transforms3d.quaternions import mat2quat, quat2mat
from utils.se3 import *
from utils.pose_error import *
from utils.cython_bbox import bbox_overlaps
_SUBJECTS = [
'20200709-subject-01',
'20200813-subject-02',
'20200820-subject-03',
'20200903-subject-04',
'20200908-subject-05',
'20200918-subject-06',
'20200928-subject-07',
'20201002-subject-08',
'20201015-subject-09',
'20201022-subject-10',
]
_SERIALS = [
'836212060125',
'839512060362',
'840412060917',
'841412060263',
'932122060857',
'932122060861',
'932122061900',
'932122062010',
]
_YCB_CLASSES = {
1: '002_master_chef_can',
2: '003_cracker_box',
3: '004_sugar_box',
4: '005_tomato_soup_can',
5: '006_mustard_bottle',
6: '007_tuna_fish_can',
7: '008_pudding_box',
8: '009_gelatin_box',
9: '010_potted_meat_can',
10: '011_banana',
11: '019_pitcher_base',
12: '021_bleach_cleanser',
13: '024_bowl',
14: '025_mug',
15: '035_power_drill',
16: '036_wood_block',
17: '037_scissors',
18: '040_large_marker',
19: '051_large_clamp',
20: '052_extra_large_clamp',
21: '061_foam_brick',
}
_MANO_JOINTS = [
'wrist',
'thumb_mcp',
'thumb_pip',
'thumb_dip',
'thumb_tip',
'index_mcp',
'index_pip',
'index_dip',
'index_tip',
'middle_mcp',
'middle_pip',
'middle_dip',
'middle_tip',
'ring_mcp',
'ring_pip',
'ring_dip',
'ring_tip',
'little_mcp',
'little_pip',
'little_dip',
'little_tip'
]
_MANO_JOINT_CONNECT = [
[0, 1], [ 1, 2], [ 2, 3], [ 3, 4],
[0, 5], [ 5, 6], [ 6, 7], [ 7, 8],
[0, 9], [ 9, 10], [10, 11], [11, 12],
[0, 13], [13, 14], [14, 15], [15, 16],
[0, 17], [17, 18], [18, 19], [19, 20],
]
_BOP_EVAL_SUBSAMPLING_FACTOR = 4
class DexYCBDataset(data.Dataset, datasets.imdb):
def __init__(self, setup, split):
self._setup = setup
self._split = split
self._color_format = "color_{:06d}.jpg"
self._depth_format = "aligned_depth_to_color_{:06d}.png"
self._label_format = "labels_{:06d}.npz"
self._height = 480
self._width = 640
# paths
self._name = 'dex_ycb_' + setup + '_' + split
self._image_set = split
self._dex_ycb_path = self._get_default_path()
path = os.path.join(self._dex_ycb_path, 'data')
self._data_dir = path
self._calib_dir = os.path.join(self._data_dir, "calibration")
self._model_dir = os.path.join(self._data_dir, "models")
self._obj_file = {
k: os.path.join(self._model_dir, v, "textured_simple.obj")
for k, v in _YCB_CLASSES.items()
}
# define all the classes
self._classes_all = ('__background__', '002_master_chef_can', '003_cracker_box', '004_sugar_box', '005_tomato_soup_can', '006_mustard_bottle', \
'007_tuna_fish_can', '008_pudding_box', '009_gelatin_box', '010_potted_meat_can', '011_banana', '019_pitcher_base', \
'021_bleach_cleanser', '024_bowl', '025_mug', '035_power_drill', '036_wood_block', '037_scissors', '040_large_marker', \
'051_large_clamp', '052_extra_large_clamp', '061_foam_brick')
self._num_classes_all = len(self._classes_all)
self._class_colors_all = [(255, 255, 255), (255, 0, 0), (0, 255, 0), (0, 0, 255), (255, 255, 0), (255, 0, 255), (0, 255, 255), \
(128, 0, 0), (0, 128, 0), (0, 0, 128), (128, 128, 0), (128, 0, 128), (0, 128, 128), \
(64, 0, 0), (0, 64, 0), (0, 0, 64), (64, 64, 0), (64, 0, 64), (0, 64, 64),
(192, 0, 0), (0, 192, 0), (0, 0, 192)]
self._symmetry_all = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1]).astype(np.float32)
self._extents_all = self._load_object_extents()
# select a subset of classes
self._classes = [self._classes_all[i] for i in cfg.TRAIN.CLASSES]
self._classes_test = [self._classes_all[i] for i in cfg.TEST.CLASSES]
self._num_classes = len(self._classes)
self._class_colors = [self._class_colors_all[i] for i in cfg.TRAIN.CLASSES]
self._symmetry = self._symmetry_all[cfg.TRAIN.CLASSES]
self._symmetry_test = self._symmetry_all[cfg.TEST.CLASSES]
self._extents = self._extents_all[cfg.TRAIN.CLASSES]
self._extents_test = self._extents_all[cfg.TEST.CLASSES]
self._pixel_mean = cfg.PIXEL_MEANS / 255.0
# train classes
self._points, self._points_all, self._point_blob = \
self._load_object_points(self._classes, self._extents, self._symmetry)
# test classes
self._points_test, self._points_all_test, self._point_blob_test = \
self._load_object_points(self._classes_test, self._extents_test, self._symmetry_test)
# 3D model paths
self.model_mesh_paths = ['{}/{}/textured_simple.obj'.format(self._model_dir, cls) for cls in self._classes_all[1:]]
self.model_sdf_paths = ['{}/{}/textured_simple_low_res.pth'.format(self._model_dir, cls) for cls in self._classes_all[1:]]
self.model_texture_paths = ['{}/{}/texture_map.png'.format(self._model_dir, cls) for cls in self._classes_all[1:]]
self.model_colors = [np.array(self._class_colors_all[i]) / 255.0 for i in range(1, len(self._classes_all))]
self.model_mesh_paths_target = ['{}/{}/textured_simple.obj'.format(self._model_dir, cls) for cls in self._classes[1:]]
self.model_sdf_paths_target = ['{}/{}/textured_simple.sdf'.format(self._model_dir, cls) for cls in self._classes[1:]]
self.model_texture_paths_target = ['{}/{}/texture_map.png'.format(self._model_dir, cls) for cls in self._classes[1:]]
self.model_colors_target = [np.array(self._class_colors_all[i]) / 255.0 for i in cfg.TRAIN.CLASSES[1:]]
# Seen subjects, camera views, grasped objects.
if self._setup == 's0':
if self._split == 'train':
subject_ind = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
serial_ind = [0, 1, 2, 3, 4, 5, 6, 7]
sequence_ind = [i for i in range(100) if i % 5 != 4]
if self._split == 'val':
subject_ind = [0, 1]
serial_ind = [0, 1, 2, 3, 4, 5, 6, 7]
sequence_ind = [i for i in range(100) if i % 5 == 4]
if self._split == 'test':
subject_ind = [2, 3, 4, 5, 6, 7, 8, 9]
serial_ind = [0, 1, 2, 3, 4, 5, 6, 7]
sequence_ind = [i for i in range(100) if i % 5 == 4]
# Unseen subjects.
if self._setup == 's1':
if self._split == 'train':
subject_ind = [0, 1, 2, 3, 4, 5, 9]
serial_ind = [0, 1, 2, 3, 4, 5, 6, 7]
sequence_ind = list(range(100))
if self._split == 'val':
subject_ind = [6]
serial_ind = [0, 1, 2, 3, 4, 5, 6, 7]
sequence_ind = list(range(100))
if self._split == 'test':
subject_ind = [7, 8]
serial_ind = [0, 1, 2, 3, 4, 5, 6, 7]
sequence_ind = list(range(100))
# Unseen camera views.
if self._setup == 's2':
if self._split == 'train':
subject_ind = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
serial_ind = [0, 1, 2, 3, 4, 5]
sequence_ind = list(range(100))
if self._split == 'val':
subject_ind = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
serial_ind = [6]
sequence_ind = list(range(100))
if self._split == 'test':
subject_ind = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
serial_ind = [7]
sequence_ind = list(range(100))
# Unseen grasped objects.
if self._setup == 's3':
if self._split == 'train':
subject_ind = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
serial_ind = [0, 1, 2, 3, 4, 5, 6, 7]
sequence_ind = [
i for i in range(100) if i // 5 not in (3, 7, 11, 15, 19)
]
if self._split == 'val':
subject_ind = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
serial_ind = [0, 1, 2, 3, 4, 5, 6, 7]
sequence_ind = [i for i in range(100) if i // 5 in (3, 19)]
if self._split == 'test':
subject_ind = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
serial_ind = [0, 1, 2, 3, 4, 5, 6, 7]
sequence_ind = [i for i in range(100) if i // 5 in (7, 11, 15)]
self._subjects = [_SUBJECTS[i] for i in subject_ind]
self._serials = [_SERIALS[i] for i in serial_ind]
self._intrinsics = []
for s in self._serials:
intr_file = os.path.join(self._calib_dir, "intrinsics", "{}_{}x{}.yml".format(s, self._width, self._height))
with open(intr_file, 'r') as f:
intr = yaml.load(f, Loader=yaml.FullLoader)
intr = intr['color']
self._intrinsics.append(intr)
# build mapping
self._sequences = []
self._mapping = []
self._ycb_ids = []
offset = 0
for n in self._subjects:
seq = sorted(os.listdir(os.path.join(self._data_dir, n)))
seq = [os.path.join(n, s) for s in seq]
assert len(seq) == 100
seq = [seq[i] for i in sequence_ind]
self._sequences += seq
for i, q in enumerate(seq):
meta_file = os.path.join(self._data_dir, q, "meta.yml")
with open(meta_file, 'r') as f:
meta = yaml.load(f, Loader=yaml.FullLoader)
c = np.arange(len(self._serials))
f = np.arange(meta['num_frames'])
f, c = np.meshgrid(f, c)
c = c.ravel()
f = f.ravel()
s = (offset + i) * np.ones_like(c)
m = np.vstack((s, c, f)).T
self._mapping.append(m)
self._ycb_ids.append(meta['ycb_ids'])
offset += len(seq)
self._mapping = np.vstack(self._mapping)
# sample a subset for training
if split == 'train':
self._mapping = self._mapping[::10]
# dataset size
self._size = len(self._mapping)
print('dataset %s with images %d' % (self._name, self._size))
if cfg.MODE == 'TRAIN' and self._size > cfg.TRAIN.MAX_ITERS_PER_EPOCH * cfg.TRAIN.IMS_PER_BATCH:
self._size = cfg.TRAIN.MAX_ITERS_PER_EPOCH * cfg.TRAIN.IMS_PER_BATCH
def __len__(self):
return self._size
def get_bop_id_from_idx(self, idx):
s, c, f = map(lambda x: x.item(), self._mapping[idx])
scene_id = s * len(self._serials) + c
im_id = f
return scene_id, im_id
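    # Illustrative example: with the 8 camera serials above, mapping entry
    # (s=2, c=3, f=40) gives scene_id = 2 * 8 + 3 = 19 and im_id = 40.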
def __getitem__(self, idx):
s, c, f = self._mapping[idx]
is_testing = f % _BOP_EVAL_SUBSAMPLING_FACTOR == 0
d = os.path.join(self._data_dir, self._sequences[s], self._serials[c])
roidb = {
'color_file': os.path.join(d, self._color_format.format(f)),
'depth_file': os.path.join(d, self._depth_format.format(f)),
'label_file': os.path.join(d, self._label_format.format(f)),
'intrinsics': self._intrinsics[c],
'ycb_ids': self._ycb_ids[s],
}
# Get the input image blob
random_scale_ind = npr.randint(0, high=len(cfg.TRAIN.SCALES_BASE))
im_blob, im_depth, im_scale, height, width = self._get_image_blob(roidb['color_file'], roidb['depth_file'], random_scale_ind)
# build the label blob
label_blob, mask, meta_data_blob, pose_blob, gt_boxes, vertex_targets, vertex_weights \
= self._get_label_blob(roidb, self._num_classes, im_scale, height, width)
is_syn = 0
im_info = np.array([im_blob.shape[1], im_blob.shape[2], im_scale, is_syn], dtype=np.float32)
scene_id, im_id = self.get_bop_id_from_idx(idx)
video_id = '%04d' % (scene_id)
image_id = '%06d' % (im_id)
sample = {'image_color': im_blob,
'im_depth': im_depth,
'label': label_blob,
'mask': mask,
'meta_data': meta_data_blob,
'poses': pose_blob,
'extents': self._extents,
'points': self._point_blob,
'symmetry': self._symmetry,
'gt_boxes': gt_boxes,
'im_info': im_info,
'video_id': video_id,
'image_id': image_id}
if cfg.TRAIN.VERTEX_REG:
sample['vertex_targets'] = vertex_targets
sample['vertex_weights'] = vertex_weights
if self._split == 'test':
sample['is_testing'] = is_testing
return sample
def _get_image_blob(self, color_file, depth_file, scale_ind):
# rgba
rgba = pad_im(cv2.imread(color_file, cv2.IMREAD_UNCHANGED), 16)
if rgba.shape[2] == 4:
im = np.copy(rgba[:,:,:3])
alpha = rgba[:,:,3]
I = np.where(alpha == 0)
im[I[0], I[1], :] = 0
else:
im = rgba
im_scale = cfg.TRAIN.SCALES_BASE[scale_ind]
if im_scale != 1.0:
im = cv2.resize(im, None, None, fx=im_scale, fy=im_scale, interpolation=cv2.INTER_LINEAR)
height = im.shape[0]
width = im.shape[1]
# chromatic transform
if cfg.TRAIN.CHROMATIC and cfg.MODE == 'TRAIN' and np.random.rand(1) > 0.1:
im = chromatic_transform(im)
if cfg.TRAIN.ADD_NOISE and cfg.MODE == 'TRAIN' and np.random.rand(1) > 0.1:
im = add_noise(im)
im_tensor = torch.from_numpy(im) / 255.0
im_tensor -= self._pixel_mean
image_blob = im_tensor.permute(2, 0, 1).float()
# depth image
im_depth = pad_im(cv2.imread(depth_file, cv2.IMREAD_UNCHANGED), 16)
if im_scale != 1.0:
im_depth = cv2.resize(im_depth, None, None, fx=im_scale, fy=im_scale, interpolation=cv2.INTER_NEAREST)
im_depth = im_depth.astype('float') / 1000.0
return image_blob, im_depth, im_scale, height, width
def _get_label_blob(self, roidb, num_classes, im_scale, height, width):
""" build the label blob """
# parse data
cls_indexes = roidb['ycb_ids']
classes = np.array(cfg.TRAIN.CLASSES)
fx = roidb['intrinsics']['fx']
fy = roidb['intrinsics']['fy']
px = roidb['intrinsics']['ppx']
py = roidb['intrinsics']['ppy']
intrinsic_matrix = np.eye(3, dtype=np.float32)
intrinsic_matrix[0, 0] = fx
intrinsic_matrix[1, 1] = fy
intrinsic_matrix[0, 2] = px
intrinsic_matrix[1, 2] = py
label = np.load(roidb['label_file'])
# read label image
im_label = label['seg']
if im_scale != 1.0:
im_label = cv2.resize(im_label, None, None, fx=im_scale, fy=im_scale, interpolation=cv2.INTER_NEAREST)
label_blob = np.zeros((num_classes, height, width), dtype=np.float32)
label_blob[0, :, :] = 1.0
for i in range(1, num_classes):
I = np.where(im_label == classes[i])
if len(I[0]) > 0:
label_blob[i, I[0], I[1]] = 1.0
label_blob[0, I[0], I[1]] = 0.0
# foreground mask
seg = torch.from_numpy((im_label != 0).astype(np.float32))
mask = seg.unsqueeze(0).repeat((3, 1, 1)).float()
# poses
poses = label['pose_y']
if len(poses.shape) == 2:
poses = np.reshape(poses, (1, 3, 4))
num = poses.shape[0]
assert num == len(cls_indexes), 'number of poses not equal to number of objects'
pose_blob = np.zeros((num_classes, 9), dtype=np.float32)
gt_boxes = np.zeros((num_classes, 5), dtype=np.float32)
center = np.zeros((num, 2), dtype=np.float32)
count = 0
for i in range(num):
cls = int(cls_indexes[i])
ind = np.where(classes == cls)[0]
if len(ind) > 0:
R = poses[i, :, :3]
T = poses[i, :, 3]
pose_blob[count, 0] = 1
pose_blob[count, 1] = ind
qt = mat2quat(R)
# egocentric to allocentric
qt_allocentric = egocentric2allocentric(qt, T)
if qt_allocentric[0] < 0:
qt_allocentric = -1 * qt_allocentric
pose_blob[count, 2:6] = qt_allocentric
pose_blob[count, 6:] = T
# compute center
center[i, 0] = fx * T[0] / T[2] + px
center[i, 1] = fy * T[1] / T[2] + py
# compute box
x3d = np.ones((4, self._points_all.shape[1]), dtype=np.float32)
x3d[0, :] = self._points_all[ind,:,0]
x3d[1, :] = self._points_all[ind,:,1]
x3d[2, :] = self._points_all[ind,:,2]
RT = np.zeros((3, 4), dtype=np.float32)
RT[:3, :3] = quat2mat(qt)
RT[:, 3] = T
x2d = np.matmul(intrinsic_matrix, np.matmul(RT, x3d))
x2d[0, :] = np.divide(x2d[0, :], x2d[2, :])
x2d[1, :] = np.divide(x2d[1, :], x2d[2, :])
gt_boxes[count, 0] = np.min(x2d[0, :]) * im_scale
gt_boxes[count, 1] = np.min(x2d[1, :]) * im_scale
gt_boxes[count, 2] = np.max(x2d[0, :]) * im_scale
gt_boxes[count, 3] = np.max(x2d[1, :]) * im_scale
gt_boxes[count, 4] = ind
count += 1
# construct the meta data
"""
format of the meta_data
intrinsic matrix: meta_data[0 ~ 8]
inverse intrinsic matrix: meta_data[9 ~ 17]
"""
K = np.matrix(intrinsic_matrix) * im_scale
K[2, 2] = 1
Kinv = np.linalg.pinv(K)
meta_data_blob = np.zeros(18, dtype=np.float32)
meta_data_blob[0:9] = K.flatten()
meta_data_blob[9:18] = Kinv.flatten()
# vertex regression target
if cfg.TRAIN.VERTEX_REG:
vertex_targets, vertex_weights = self._generate_vertex_targets(im_label,
cls_indexes, center, poses, classes, num_classes)
else:
vertex_targets = []
vertex_weights = []
return label_blob, mask, meta_data_blob, pose_blob, gt_boxes, vertex_targets, vertex_weights
# compute the voting label image in 2D
def _generate_vertex_targets(self, im_label, cls_indexes, center, poses, classes, num_classes):
width = im_label.shape[1]
height = im_label.shape[0]
vertex_targets = np.zeros((3 * num_classes, height, width), dtype=np.float32)
vertex_weights = np.zeros((3 * num_classes, height, width), dtype=np.float32)
c = np.zeros((2, 1), dtype=np.float32)
for i in range(1, num_classes):
y, x = np.where(im_label == classes[i])
I = np.where(im_label == classes[i])
ind = np.where(cls_indexes == classes[i])[0]
if len(x) > 0 and len(ind) > 0:
c[0] = center[ind, 0]
c[1] = center[ind, 1]
if isinstance(poses, list):
z = poses[int(ind)][2]
else:
if len(poses.shape) == 3:
z = poses[ind, 2, 3]
else:
z = poses[ind, -1]
R = np.tile(c, (1, len(x))) - np.vstack((x, y))
# compute the norm
N = np.linalg.norm(R, axis=0) + 1e-10
# normalization
R = np.divide(R, np.tile(N, (2,1)))
# assignment
vertex_targets[3*i+0, y, x] = R[0,:]
vertex_targets[3*i+1, y, x] = R[1,:]
vertex_targets[3*i+2, y, x] = math.log(z)
vertex_weights[3*i+0, y, x] = cfg.TRAIN.VERTEX_W_INSIDE
vertex_weights[3*i+1, y, x] = cfg.TRAIN.VERTEX_W_INSIDE
vertex_weights[3*i+2, y, x] = cfg.TRAIN.VERTEX_W_INSIDE
return vertex_targets, vertex_weights
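    # Decoding note (illustrative): for a foreground pixel p = (x, y) of class i,
    # the targets store the unit direction from p to the object center c,
    # vertex_targets[3*i : 3*i+2] = (c - p) / ||c - p||, and the log depth
    # vertex_targets[3*i+2] = log(z); Hough voting recovers z via exp() at test time.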
def _get_default_path(self):
"""
        Return the default path where the DexYCB dataset is expected to be installed.
"""
return os.path.join(datasets.ROOT_DIR, 'data', 'DEX_YCB')
def _load_object_extents(self):
extents = np.zeros((self._num_classes_all, 3), dtype=np.float32)
for i in range(1, self._num_classes_all):
point_file = os.path.join(self._model_dir, self._classes_all[i], 'points.xyz')
print(point_file)
assert os.path.exists(point_file), 'Path does not exist: {}'.format(point_file)
points = np.loadtxt(point_file)
extents[i, :] = 2 * np.max(np.absolute(points), axis=0)
return extents
def _load_object_points(self, classes, extents, symmetry):
points = [[] for _ in range(len(classes))]
num = np.inf
num_classes = len(classes)
for i in range(1, num_classes):
point_file = os.path.join(self._model_dir, classes[i], 'points.xyz')
print(point_file)
assert os.path.exists(point_file), 'Path does not exist: {}'.format(point_file)
points[i] = np.loadtxt(point_file)
if points[i].shape[0] < num:
num = points[i].shape[0]
points_all = np.zeros((num_classes, num, 3), dtype=np.float32)
for i in range(1, num_classes):
points_all[i, :, :] = points[i][:num, :]
# rescale the points
point_blob = points_all.copy()
for i in range(1, num_classes):
# compute the rescaling factor for the points
weight = 10.0 / np.amax(extents[i, :])
if weight < 10:
weight = 10
if symmetry[i] > 0:
point_blob[i, :, :] = 4 * weight * point_blob[i, :, :]
else:
point_blob[i, :, :] = weight * point_blob[i, :, :]
return points, points_all, point_blob
def write_dop_results(self, output_dir):
# only write the result file
filename = os.path.join(output_dir, 'posecnn_' + self.name + '.csv')
f = open(filename, 'w')
f.write('scene_id,im_id,obj_id,score,R,t,time\n')
if cfg.TEST.POSE_REFINE:
filename_refined = os.path.join(output_dir, 'posecnn_' + self.name + '_refined.csv')
f1 = open(filename_refined, 'w')
f1.write('scene_id,im_id,obj_id,score,R,t,time\n')
# list the per-image .mat result files
filename = os.path.join(output_dir, '*.mat')
files = sorted(glob.glob(filename))
# for each image
for i in range(len(files)):
filename = os.path.basename(files[i])
# parse filename
pos = filename.find('_')
scene_id = int(filename[:pos])
im_id = int(filename[pos+1:-4])
# load result
print(files[i])
result = scipy.io.loadmat(files[i])
if len(result['rois']) == 0:
continue
rois = result['rois']
num = rois.shape[0]
for j in range(num):
obj_id = cfg.TRAIN.CLASSES[int(rois[j, 1])]
if obj_id == 0:
continue
score = rois[j, -1]
run_time = -1
# pose from network
R = quat2mat(result['poses'][j, :4].flatten())
t = result['poses'][j, 4:] * 1000
line = '{scene_id},{im_id},{obj_id},{score},{R},{t},{time}\n'.format(
scene_id=scene_id,
im_id=im_id,
obj_id=obj_id,
score=score,
R=' '.join(map(str, R.flatten().tolist())),
t=' '.join(map(str, t.flatten().tolist())),
time=run_time)
f.write(line)
if cfg.TEST.POSE_REFINE:
R = quat2mat(result['poses_refined'][j, :4].flatten())
t = result['poses_refined'][j, 4:] * 1000
line = '{scene_id},{im_id},{obj_id},{score},{R},{t},{time}\n'.format(
scene_id=scene_id,
im_id=im_id,
obj_id=obj_id,
score=score,
R=' '.join(map(str, R.flatten().tolist())),
t=' '.join(map(str, t.flatten().tolist())),
time=run_time)
f1.write(line)
# close file
f.close()
if cfg.TEST.POSE_REFINE:
f1.close()
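# --- Editor's note: a hedged illustration (not part of the original source) of
# the result naming and CSV row shape used by write_dop_results; the filename
# pattern '%04d_%06d.mat' appears in evaluation() below, and the header matches
# the BOP-challenge CSV layout.
def _demo_result_row():
    filename = '0001_000123.mat'              # hypothetical result file
    pos = filename.find('_')
    scene_id = int(filename[:pos])            # -> 1
    im_id = int(filename[pos + 1:-4])         # -> 123
    R = ' '.join(map(str, [1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0]))
    t = ' '.join(map(str, [10.0, 20.0, 500.0]))  # translation in millimeters
    return '{},{},{},{},{},{},{}'.format(scene_id, im_id, 5, 0.9, R, t, -1)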
# compute the 2D bounding box by projecting the 3D model points
def compute_box(self, cls, intrinsic_matrix, RT):
classes = np.array(cfg.TRAIN.CLASSES)
ind = np.where(classes == cls)[0]
x3d = np.ones((4, self._points_all.shape[1]), dtype=np.float32)
x3d[0, :] = self._points_all[ind,:,0]
x3d[1, :] = self._points_all[ind,:,1]
x3d[2, :] = self._points_all[ind,:,2]
x2d = np.matmul(intrinsic_matrix, np.matmul(RT, x3d))
x2d[0, :] = np.divide(x2d[0, :], x2d[2, :])
x2d[1, :] = np.divide(x2d[1, :], x2d[2, :])
x1 = np.min(x2d[0, :])
y1 = np.min(x2d[1, :])
x2 = np.max(x2d[0, :])
y2 = np.max(x2d[1, :])
return [x1, y1, x2, y2]
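# --- Editor's note: a minimal worked example (not part of the original source)
# of the projection inside compute_box, for one hypothetical model point:
def _demo_project_point():
    import numpy as np
    K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
    RT = np.hstack((np.eye(3), np.array([[0.0], [0.0], [1.0]])))  # 1 m in front
    X = np.array([0.05, -0.02, 0.0, 1.0])     # homogeneous model point
    x = K @ (RT @ X)
    return x[0] / x[2], x[1] / x[2]           # -> (345.0, 230.0) pixels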
def evaluation(self, output_dir):
self.write_dop_results(output_dir)
filename = os.path.join(output_dir, 'results_posecnn.mat')
if os.path.exists(filename):
results_all = scipy.io.loadmat(filename)
print('load results from file')
print(filename)
distances_sys = results_all['distances_sys']
distances_non = results_all['distances_non']
errors_rotation = results_all['errors_rotation']
errors_translation = results_all['errors_translation']
results_seq_id = results_all['results_seq_id'].flatten()
results_frame_id = results_all['results_frame_id'].flatten()
results_object_id = results_all['results_object_id'].flatten()
results_cls_id = results_all['results_cls_id'].flatten()
else:
# compute and save the results
num_max = 200000
num_results = 2
distances_sys = np.zeros((num_max, num_results), dtype=np.float32)
distances_non = np.zeros((num_max, num_results), dtype=np.float32)
errors_rotation = np.zeros((num_max, num_results), dtype=np.float32)
errors_translation = np.zeros((num_max, num_results), dtype=np.float32)
results_seq_id = np.zeros((num_max, ), dtype=np.float32)
results_frame_id = np.zeros((num_max, ), dtype=np.float32)
results_object_id = np.zeros((num_max, ), dtype=np.float32)
results_cls_id = np.zeros((num_max, ), dtype=np.float32)
# for each image
count = -1
for i in range(len(self._mapping)):
s, c, f = self._mapping[i]
is_testing = f % _BOP_EVAL_SUBSAMPLING_FACTOR == 0
if not is_testing:
continue
# intrinsics
intrinsics = self._intrinsics[c]
intrinsic_matrix = np.eye(3, dtype=np.float32)
intrinsic_matrix[0, 0] = intrinsics['fx']
intrinsic_matrix[1, 1] = intrinsics['fy']
intrinsic_matrix[0, 2] = intrinsics['ppx']
intrinsic_matrix[1, 2] = intrinsics['ppy']
# get the BOP scene id and image id for this frame
scene_id, im_id = self.get_bop_id_from_idx(i)
# load result
filename = os.path.join(output_dir, '%04d_%06d.mat' % (scene_id, im_id))
print(filename)
result = scipy.io.loadmat(filename)
# load gt
d = os.path.join(self._data_dir, self._sequences[s], self._serials[c])
label_file = os.path.join(d, self._label_format.format(f))
label = np.load(label_file)
cls_indexes = np.array(self._ycb_ids[s]).flatten()
# poses
poses = label['pose_y']
if len(poses.shape) == 2:
poses = np.reshape(poses, (1, 3, 4))
num = poses.shape[0]
assert num == len(cls_indexes), 'number of poses not equal to number of objects'
# instance label
im_label = label['seg']
instance_ids = np.unique(im_label)
if instance_ids[0] == 0:
instance_ids = instance_ids[1:]
if instance_ids[-1] == 255:
instance_ids = instance_ids[:-1]
# for each ground-truth instance
for j in range(len(instance_ids)):
cls = instance_ids[j]
# find the number of pixels of the object
pixels = np.sum(im_label == cls)
if pixels < 200:
continue
count += 1
# find the pose
object_index = np.where(cls_indexes == cls)[0][0]
RT_gt = poses[object_index, :, :]
box_gt = self.compute_box(cls, intrinsic_matrix, RT_gt)
results_seq_id[count] = scene_id
results_frame_id[count] = im_id
results_object_id[count] = object_index
results_cls_id[count] = cls
# network result
roi_index = []
if len(result['rois']) > 0:
for k in range(result['rois'].shape[0]):
ind = int(result['rois'][k, 1])
if cls == cfg.TRAIN.CLASSES[ind]:
roi_index.append(k)
# select the roi
if len(roi_index) > 1:
# overlaps: (rois x gt_boxes)
roi_blob = result['rois'][roi_index, :]
roi_blob = roi_blob[:, (0, 2, 3, 4, 5, 1)]
gt_box_blob = np.zeros((1, 5), dtype=np.float32)
gt_box_blob[0, 1:] = box_gt
overlaps = bbox_overlaps(
np.ascontiguousarray(roi_blob[:, :5], dtype=np.float64),
np.ascontiguousarray(gt_box_blob, dtype=np.float64)).flatten()
assignment = overlaps.argmax()
roi_index = [roi_index[assignment]]
if len(roi_index) > 0:
RT = np.zeros((3, 4), dtype=np.float32)
ind = int(result['rois'][roi_index, 1])
points = self._points[ind]
# pose from network
RT[:3, :3] = quat2mat(result['poses'][roi_index, :4].flatten())
RT[:, 3] = result['poses'][roi_index, 4:]
distances_sys[count, 0] = adi(RT[:3, :3], RT[:, 3], RT_gt[:3, :3], RT_gt[:, 3], points)
distances_non[count, 0] = add(RT[:3, :3], RT[:, 3], RT_gt[:3, :3], RT_gt[:, 3], points)
errors_rotation[count, 0] = re(RT[:3, :3], RT_gt[:3, :3])
errors_translation[count, 0] = te(RT[:, 3], RT_gt[:, 3])
# pose after depth refinement
if cfg.TEST.POSE_REFINE:
RT[:3, :3] = quat2mat(result['poses_refined'][roi_index, :4].flatten())
RT[:, 3] = result['poses_refined'][roi_index, 4:]
distances_sys[count, 1] = adi(RT[:3, :3], RT[:, 3], RT_gt[:3, :3], RT_gt[:, 3], points)
distances_non[count, 1] = add(RT[:3, :3], RT[:, 3], RT_gt[:3, :3], RT_gt[:, 3], points)
errors_rotation[count, 1] = re(RT[:3, :3], RT_gt[:3, :3])
errors_translation[count, 1] = te(RT[:, 3], RT_gt[:, 3])
else:
distances_sys[count, 1] = np.inf
distances_non[count, 1] = np.inf
errors_rotation[count, 1] = np.inf
errors_translation[count, 1] = np.inf
else:
distances_sys[count, :] = np.inf
distances_non[count, :] = np.inf
errors_rotation[count, :] = np.inf
errors_translation[count, :] = np.inf
distances_sys = distances_sys[:count+1, :]
distances_non = distances_non[:count+1, :]
errors_rotation = errors_rotation[:count+1, :]
errors_translation = errors_translation[:count+1, :]
results_seq_id = results_seq_id[:count+1]
results_frame_id = results_frame_id[:count+1]
results_object_id = results_object_id[:count+1]
results_cls_id = results_cls_id[:count+1]
results_all = {'distances_sys': distances_sys,
'distances_non': distances_non,
'errors_rotation': errors_rotation,
'errors_translation': errors_translation,
'results_seq_id': results_seq_id,
'results_frame_id': results_frame_id,
'results_object_id': results_object_id,
'results_cls_id': results_cls_id }
filename = os.path.join(output_dir, 'results_posecnn.mat')
scipy.io.savemat(filename, results_all)
# print the results
# for each class
import matplotlib.pyplot as plt
max_distance = 0.1
index_plot = [0, 1]
color = ['r', 'b']
leng = ['PoseCNN', 'PoseCNN refined']
num = len(leng)
ADD = np.zeros((self._num_classes_all, num), dtype=np.float32)
ADDS = np.zeros((self._num_classes_all, num), dtype=np.float32)
TS = np.zeros((self._num_classes_all, num), dtype=np.float32)
classes = list(copy.copy(self._classes_all))
classes[0] = 'all'
for k in range(self._num_classes_all):
fig = plt.figure(figsize=(16.0, 10.0))
if k == 0:
index = range(len(results_cls_id))
else:
index = np.where(results_cls_id == k)[0]
if len(index) == 0:
continue
print('%s: %d objects' % (classes[k], len(index)))
# distance symmetry
ax = fig.add_subplot(2, 3, 1)
lengs = []
for i in index_plot:
D = distances_sys[index, i]
ind = np.where(D > max_distance)[0]
D[ind] = np.inf
d = np.sort(D)
n = len(d)
accuracy = np.cumsum(np.ones((n, ), np.float32)) / n
plt.plot(d, accuracy, color[i], linewidth=2)
ADDS[k, i] = VOCap(d, accuracy)
lengs.append('%s (%.2f)' % (leng[i], ADDS[k, i] * 100))
print('%s, %s: %d objects missed' % (classes[k], leng[i], np.sum(np.isinf(D))))
ax.legend(lengs)
plt.xlabel('Average distance threshold in meter (symmetry)')
plt.ylabel('accuracy')
ax.set_title(classes[k])
# distance non-symmetry
ax = fig.add_subplot(2, 3, 2)
lengs = []
for i in index_plot:
D = distances_non[index, i]
ind = np.where(D > max_distance)[0]
D[ind] = np.inf
d = np.sort(D)
n = len(d)
accuracy = np.cumsum(np.ones((n, ), np.float32)) / n
plt.plot(d, accuracy, color[i], linewidth=2)
ADD[k, i] = VOCap(d, accuracy)
lengs.append('%s (%.2f)' % (leng[i], ADD[k, i] * 100))
print('%s, %s: %d objects missed' % (classes[k], leng[i], np.sum(np.isinf(D))))
ax.legend(lengs)
plt.xlabel('Average distance threshold in meter (non-symmetry)')
plt.ylabel('accuracy')
ax.set_title(classes[k])
# translation
ax = fig.add_subplot(2, 3, 3)
lengs = []
for i in index_plot:
D = errors_translation[index, i]
ind = np.where(D > max_distance)[0]
D[ind] = np.inf
d = np.sort(D)
n = len(d)
accuracy = np.cumsum(np.ones((n, ), np.float32)) / n
plt.plot(d, accuracy, color[i], linewidth=2)
TS[k, i] = VOCap(d, accuracy)
lengs.append('%s (%.2f)' % (leng[i], TS[k, i] * 100))
print('%s, %s: %d objects missed' % (classes[k], leng[i], np.sum(np.isinf(D))))
ax.legend(lengs)
plt.xlabel('Translation threshold in meter')
plt.ylabel('accuracy')
ax.set_title(classes[k])
# rotation histogram
subplot_id = 4
for i in index_plot:
ax = fig.add_subplot(2, 3, subplot_id)
D = errors_rotation[index, i]
ind = np.where(np.isfinite(D))[0]
D = D[ind]
ax.hist(D, bins=range(0, 190, 10), range=(0, 180))
plt.xlabel('Rotation angle error')
plt.ylabel('count')
ax.set_title(leng[i])
subplot_id += 1
# mng = plt.get_current_fig_manager()
# mng.full_screen_toggle()
filename = output_dir + '/' + classes[k] + '.png'
# plt.show()
plt.savefig(filename)
# print ADD
print('==================ADD======================')
for k in range(len(classes)):
print('%s: %f' % (classes[k], ADD[k, 0]))
for k in range(len(classes)-1):
print('%f' % (ADD[k+1, 0]))
print('%f' % (ADD[0, 0]))
print(cfg.TRAIN.SNAPSHOT_INFIX)
print('===========================================')
# print ADD-S
print('==================ADD-S====================')
for k in range(len(classes)):
print('%s: %f' % (classes[k], ADDS[k, 0]))
for k in range(len(classes)-1):
print('%f' % (ADDS[k+1, 0]))
print('%f' % (ADDS[0, 0]))
print(cfg.TRAIN.SNAPSHOT_INFIX)
print('===========================================')
# print ADD
print('==================ADD refined======================')
for k in range(len(classes)):
print('%s: %f' % (classes[k], ADD[k, 1]))
for k in range(len(classes)-1):
print('%f' % (ADD[k+1, 1]))
print('%f' % (ADD[0, 1]))
print(cfg.TRAIN.SNAPSHOT_INFIX)
print('===========================================')
# print ADD-S
print('==================ADD-S refined====================')
for k in range(len(classes)):
print('%s: %f' % (classes[k], ADDS[k, 1]))
for k in range(len(classes)-1):
print('%f' % (ADDS[k+1, 1]))
print('%f' % (ADDS[0, 1]))
print(cfg.TRAIN.SNAPSHOT_INFIX)
print('===========================================')
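# --- Editor's note: a hedged sketch (not part of the original source) of the
# ADD / ADD-S distances consumed by evaluation() above, written from their
# standard definitions; the repo's own implementations live in
# lib/utils/pose_error.py and may differ in details.
def _demo_add_adds(R_est, t_est, R_gt, t_gt, pts):
    import numpy as np
    from scipy.spatial import cKDTree
    p_est = pts @ R_est.T + t_est             # model points under the estimate
    p_gt = pts @ R_gt.T + t_gt                # model points under ground truth
    add = np.mean(np.linalg.norm(p_est - p_gt, axis=1))   # matched pairs
    adds = np.mean(cKDTree(p_est).query(p_gt)[0])         # closest points (symmetric)
    return add, adds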
================================================
FILE: lib/datasets/factory.py
================================================
# Copyright (c) 2020 NVIDIA Corporation. All rights reserved.
# This work is licensed under the NVIDIA Source Code License - Non-commercial. Full
# text can be found in LICENSE.md
"""Factory method for easily getting imdbs by name."""
__sets = {}
import datasets.ycb_video
import datasets.ycb_object
import datasets.ycb_self_supervision
import datasets.dex_ycb
import datasets.background
import numpy as np
# ycb video dataset
for split in ['train', 'val', 'keyframe', 'trainval', 'debug']:
name = 'ycb_video_{}'.format(split)
print(name)
__sets[name] = (lambda split=split:
datasets.YCBVideo(split))
# ycb object dataset
for split in ['train', 'test']:
name = 'ycb_object_{}'.format(split)
print(name)
__sets[name] = (lambda split=split:
datasets.YCBObject(split))
# ycb self supervision dataset
for split in ['train_1', 'train_2', 'train_3', 'train_4', 'train_5', 'test', 'all', 'train_block_median', 'train_block_median_azure', 'train_block_median_demo', 'train_block_median_azure_demo', 'train_table',
'debug', 'train_block', 'train_block_azure', 'train_block_big_sim', 'train_block_median_sim', 'train_block_small_sim']:
name = 'ycb_self_supervision_{}'.format(split)
print(name)
__sets[name] = (lambda split=split:
datasets.YCBSelfSupervision(split))
# background dataset
for split in ['coco', 'rgbd', 'nvidia', 'table', 'isaac', 'texture']:
name = 'background_{}'.format(split)
print(name)
__sets[name] = (lambda split=split:
datasets.BackgroundDataset(split))
# DEX YCB dataset
for setup in ('s0', 's1', 's2', 's3'):
for split in ('train', 'val', 'test'):
name = 'dex_ycb_{}_{}'.format(setup, split)
__sets[name] = (lambda setup=setup, split=split: datasets.DexYCBDataset(setup, split))
def get_dataset(name):
"""Get an imdb (image database) by name."""
if name not in __sets:
raise KeyError('Unknown dataset: {}'.format(name))
return __sets[name]()
def list_datasets():
"""List all registered imdbs."""
return __sets.keys()
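# --- Editor's note (not part of the original source): the registration lambdas
# above use the `split=split` default-argument idiom on purpose. Without it,
# every closure would see the loop variable's final value:
def _demo_late_binding():
    wrong = {s: (lambda: s) for s in ['train', 'test']}
    right = {s: (lambda s=s: s) for s in ['train', 'test']}
    assert wrong['train']() == 'test'   # late binding: all entries share one `s`
    assert right['train']() == 'train'  # default argument pins each value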
================================================
FILE: lib/datasets/imdb.py
================================================
# Copyright (c) 2020 NVIDIA Corporation. All rights reserved.
# This work is licensed under the NVIDIA Source Code License - Non-commercial. Full
# text can be found in LICENSE.md
import os
import os.path as osp
import numpy as np
import datasets
import math
import glob
from fcn.config import cfg
class imdb(object):
"""Image database."""
def __init__(self):
self._name = ''
self._num_classes = 0
self._classes = []
self._class_colors = []
@property
def name(self):
return self._name
@property
def num_classes(self):
return len(self._classes)
@property
def classes(self):
return self._classes
@property
def class_colors(self):
return self._class_colors
@property
def cache_path(self):
cache_path = osp.abspath(osp.join(datasets.ROOT_DIR, 'data', 'cache'))
if not os.path.exists(cache_path):
os.makedirs(cache_path)
return cache_path
# backproject pixels into 3D points in camera's coordinate system
def backproject(self, depth_cv, intrinsic_matrix, factor):
depth = depth_cv.astype(np.float32, copy=True) / factor
index = np.where(~np.isfinite(depth))
depth[index[0], index[1]] = 0
# get intrinsic matrix
K = intrinsic_matrix
Kinv = np.linalg.inv(K)
# compute the 3D points
width = depth.shape[1]
height = depth.shape[0]
# construct the 2D points matrix
x, y = np.meshgrid(np.arange(width), np.arange(height))
ones = np.ones((height, width), dtype=np.float32)
x2d = np.stack((x, y, ones), axis=2).reshape(width*height, 3)
# backprojection
R = np.dot(Kinv, x2d.transpose())
# compute the 3D points
X = np.multiply(np.tile(depth.reshape(1, width*height), (3, 1)), R)
return np.array(X).transpose().reshape((height, width, 3))
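# --- Editor's note: a minimal worked example (not part of the original source)
# of the backprojection above for a single pixel: X = depth * K^-1 [u, v, 1]^T.
def _demo_backproject_pixel():
    import numpy as np
    K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
    u, v, depth = 345.0, 230.0, 1.0
    return depth * np.linalg.inv(K) @ np.array([u, v, 1.0])  # -> (0.05, -0.02, 1.0)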
def _build_uniform_poses(self):
self.eulers = []
interval = cfg.TRAIN.UNIFORM_POSE_INTERVAL
for yaw in range(-180, 180, interval):
for pitch in range(-90, 90, interval):
for roll in range(-180, 180, interval):
self.eulers.append([yaw, pitch, roll])
# sample indexes
num_poses = len(self.eulers)
num_classes = len(self._classes_all) - 1 # no background
self.pose_indexes = np.zeros((num_classes, ), dtype=np.int32)
self.pose_lists = []
for i in range(num_classes):
self.pose_lists.append(np.random.permutation(np.arange(num_poses)))
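# --- Editor's note (not part of the original source): the grid above enumerates
# yaw x pitch x roll at a fixed interval; e.g. a hypothetical interval of 30
# degrees gives (360/30) * (180/30) * (360/30) = 864 Euler triples.
def _demo_pose_grid_size(interval=30):
    return (360 // interval) * (180 // interval) * (360 // interval)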
def evaluation(self, output_dir):
print('evaluation function not implemented for dataset %s' % self._name)
================================================
FILE: lib/datasets/ycb_object.py
================================================
# Copyright (c) 2020 NVIDIA Corporation. All rights reserved.
# This work is licensed under the NVIDIA Source Code License - Non-commercial. Full
# text can be found in LICENSE.md
import torch
import torch.utils.data as data
import os, math
import sys
import os.path as osp
from os.path import *
import numpy as np
import numpy.random as npr
import cv2
try:
import cPickle # Use cPickle on Python 2.7
except ImportError:
import pickle as cPickle
import scipy.io
import glob
import datasets
from fcn.config import cfg
from utils.blob import pad_im, chromatic_transform, add_noise, add_noise_cuda, add_noise_depth_cuda
from transforms3d.quaternions import mat2quat, quat2mat
from transforms3d.euler import euler2quat
from utils.se3 import *
from scipy.optimize import minimize
import matplotlib.pyplot as plt
class YCBObject(data.Dataset, datasets.imdb):
def __init__(self, image_set, ycb_object_path = None):
self._name = 'ycb_object_' + image_set
self._image_set = image_set
self._ycb_object_path = self._get_default_path() if ycb_object_path is None \
else ycb_object_path
self._data_path = os.path.join(self._ycb_object_path, 'data')
self._model_path = os.path.join(datasets.ROOT_DIR, 'data', 'models')
self.root_path = self._ycb_object_path
# define all the classes
self._classes_all = ('__background__', '002_master_chef_can', '003_cracker_box', '004_sugar_box', '005_tomato_soup_can', '006_mustard_bottle', \
'007_tuna_fish_can', '008_pudding_box', '009_gelatin_box', '010_potted_meat_can', '011_banana', '019_pitcher_base', \
'021_bleach_cleanser', '024_bowl', '025_mug', '035_power_drill', '036_wood_block', '037_scissors', '040_large_marker', \
'051_large_clamp', '052_extra_large_clamp', '061_foam_brick', 'holiday_cup1', 'holiday_cup2', 'sanning_mug', \
'001_chips_can', 'block_red_big', 'block_green_big', 'block_blue_big', 'block_yellow_big', \
'block_red_small', 'block_green_small', 'block_blue_small', 'block_yellow_small', \
'block_red_median', 'block_green_median', 'block_blue_median', 'block_yellow_median',
'fusion_duplo_dude', 'cabinet_handle')
self._num_classes_all = len(self._classes_all)
self._class_colors_all = [(255, 255, 255), (255, 0, 0), (0, 255, 0), (0, 0, 255), (255, 255, 0), (255, 0, 255), (0, 255, 255), \
(0, 0, 128), (0, 128, 0), (128, 0, 0), (128, 128, 0), (128, 0, 128), (0, 128, 128), \
(0, 64, 0), (64, 0, 0), (0, 0, 64), (64, 64, 0), (64, 0, 64), (0, 64, 64), \
(192, 0, 0), (0, 192, 0), (0, 0, 192), (192, 192, 0), (192, 0, 192), (0, 192, 192), (32, 0, 0), \
(150, 0, 0), (0, 150, 0), (0, 0, 150), (150, 150, 0), (75, 0, 0), (0, 75, 0), (0, 0, 75), (75, 75, 0), \
(200, 0, 0), (0, 200, 0), (0, 0, 200), (200, 200, 0), (16, 16, 0), (16, 16, 16)]
self._extents_all = self._load_object_extents()
self._width = cfg.TRAIN.SYN_WIDTH
self._height = cfg.TRAIN.SYN_HEIGHT
self._intrinsic_matrix = np.array([[524.7917885754071, 0, 332.5213232846151],
[0, 489.3563960810721, 281.2339855172282],
[0, 0, 1]])
# select a subset of classes
self._classes = [self._classes_all[i] for i in cfg.TRAIN.CLASSES]
self._classes_test = [self._classes_all[i] for i in cfg.TEST.CLASSES]
self._num_classes = len(self._classes)
self._class_colors = [self._class_colors_all[i] for i in cfg.TRAIN.CLASSES]
self._class_colors_test = [self._class_colors_all[i] for i in cfg.TEST.CLASSES]
self._symmetry = np.array(cfg.TRAIN.SYMMETRY).astype(np.float32)
self._symmetry_test = np.array(cfg.TEST.SYMMETRY).astype(np.float32)
self._extents = self._extents_all[cfg.TRAIN.CLASSES]
self._extents_test = self._extents_all[cfg.TEST.CLASSES]
# train classes
self._points, self._points_all, self._point_blob = \
self._load_object_points(self._classes, self._extents, self._symmetry)
# test classes
self._points_test, self._points_all_test, self._point_blob_test = \
self._load_object_points(self._classes_test, self._extents_test, self._symmetry_test)
self._pixel_mean = torch.tensor(cfg.PIXEL_MEANS / 255.0).cuda().float()
self._classes_other = []
for i in range(self._num_classes_all):
if i not in cfg.TRAIN.CLASSES:
# do not add the other, near-identical clamp as a distractor
if i == 19 and 20 in cfg.TRAIN.CLASSES:
continue
if i == 20 and 19 in cfg.TRAIN.CLASSES:
continue
self._classes_other.append(i)
self._num_classes_other = len(self._classes_other)
# 3D model paths
self.model_sdf_paths = ['{}/{}/textured_simple_low_res.pth'.format(self._model_path, cls) for cls in self._classes_all[1:]]
self.model_colors = [np.array(self._class_colors_all[i]) / 255.0 for i in range(1, len(self._classes_all))]
self.model_mesh_paths = []
for cls in self._classes_all[1:]:
filename = '{}/{}/textured_simple.ply'.format(self._model_path, cls)
if osp.exists(filename):
self.model_mesh_paths.append(filename)
continue
filename = '{}/{}/textured_simple.obj'.format(self._model_path, cls)
if osp.exists(filename):
self.model_mesh_paths.append(filename)
continue
self.model_texture_paths = []
for cls in self._classes_all[1:]:
filename = '{}/{}/texture_map.png'.format(self._model_path, cls)
if osp.exists(filename):
self.model_texture_paths.append(filename)
else:
self.model_texture_paths.append('')
# target meshes
self.model_colors_target = [np.array(self._class_colors_all[i]) / 255.0 for i in cfg.TRAIN.CLASSES[1:]]
self.model_mesh_paths_target = []
for cls in self._classes[1:]:
filename = '{}/{}/textured_simple.obj'.format(self._model_path, cls)
if osp.exists(filename):
self.model_mesh_paths_target.append(filename)
continue
filename = '{}/{}/textured_simple.ply'.format(self._model_path, cls)
if osp.exists(filename):
self.model_mesh_paths_target.append(filename)
self.model_texture_paths_target = []
for cls in self._classes[1:]:
filename = '{}/{}/texture_map.png'.format(self._model_path, cls)
if osp.exists(filename):
self.model_texture_paths_target.append(filename)
else:
self.model_texture_paths_target.append('')
self._class_to_ind = dict(zip(self._classes, range(self._num_classes)))
self._size = cfg.TRAIN.SYNNUM
self._build_uniform_poses()
# sample pose indexes for the real YCB object images
num_poses = 600
num_classes = len(self._classes_all) - 1 # no background
self.pose_indexes_real = np.zeros((num_classes, ), dtype=np.int32)
self.pose_lists_real = []
self.pose_images = []
for i in range(num_classes):
self.pose_lists_real.append(np.random.permutation(np.arange(num_poses)))
dirname = osp.join(self._data_path, self._classes_all[i+1], '*.jpg')
files = glob.glob(dirname)
self.pose_images.append(files)
# construct fake inputs
label_blob = np.zeros((1, self._num_classes, self._height, self._width), dtype=np.float32)
pose_blob = np.zeros((1, self._num_classes, 9), dtype=np.float32)
gt_boxes = np.zeros((1, self._num_classes, 5), dtype=np.float32)
# construct the meta data
K = self._intrinsic_matrix
Kinv = np.linalg.pinv(K)
meta_data_blob = np.zeros((1, 18), dtype=np.float32)
meta_data_blob[0, 0:9] = K.flatten()
meta_data_blob[0, 9:18] = Kinv.flatten()
self.input_labels = torch.from_numpy(label_blob).cuda()
self.input_meta_data = torch.from_numpy(meta_data_blob).cuda()
self.input_extents = torch.from_numpy(self._extents).cuda()
self.input_gt_boxes = torch.from_numpy(gt_boxes).cuda()
self.input_poses = torch.from_numpy(pose_blob).cuda()
self.input_points = torch.from_numpy(self._point_blob).cuda()
self.input_symmetry = torch.from_numpy(self._symmetry).cuda()
def _render_item(self):
height = cfg.TRAIN.SYN_HEIGHT
width = cfg.TRAIN.SYN_WIDTH
fx = self._intrinsic_matrix[0, 0]
fy = self._intrinsic_matrix[1, 1]
px = self._intrinsic_matrix[0, 2]
py = self._intrinsic_matrix[1, 2]
zfar = 6.0
znear = 0.01
# sample target objects
if cfg.TRAIN.SYN_SAMPLE_OBJECT:
maxnum = np.minimum(self.num_classes-1, cfg.TRAIN.SYN_MAX_OBJECT)
num = np.random.randint(cfg.TRAIN.SYN_MIN_OBJECT, maxnum+1)
perm = np.random.permutation(np.arange(self.num_classes-1))
indexes_target = perm[:num] + 1
else:
num = self.num_classes - 1
indexes_target = np.arange(num) + 1
num_target = num
cls_indexes = [cfg.TRAIN.CLASSES[i]-1 for i in indexes_target]
# sample other objects as distractors
if cfg.TRAIN.SYN_SAMPLE_DISTRACTOR:
num_other = min(5, self._num_classes_other)
num_selected = np.random.randint(0, num_other+1)
perm = np.random.permutation(np.arange(self._num_classes_other))
indexes = perm[:num_selected]
for i in range(num_selected):
cls_indexes.append(self._classes_other[indexes[i]]-1)
else:
num_selected = 0
# sample poses
num = num_target + num_selected
poses_all = []
for i in range(num):
qt = np.zeros((7, ), dtype=np.float32)
# rotation
cls = int(cls_indexes[i])
if self.pose_indexes[cls] >= len(self.pose_lists[cls]):
self.pose_indexes[cls] = 0
self.pose_lists[cls] = np.random.permutation(np.arange(len(self.eulers)))
yaw = self.eulers[self.pose_lists[cls][self.pose_indexes[cls]]][0] + 15 * np.random.randn()
pitch = self.eulers[self.pose_lists[cls][self.pose_indexes[cls]]][1] + 15 * np.random.randn()
pitch = np.clip(pitch, -90, 90)
roll = self.eulers[self.pose_lists[cls][self.pose_indexes[cls]]][2] + 15 * np.random.randn()
qt[3:] = euler2quat(yaw * math.pi / 180.0, pitch * math.pi / 180.0, roll * math.pi / 180.0, 'syxz')
self.pose_indexes[cls] += 1
# translation
bound = cfg.TRAIN.SYN_BOUND
if i == 0 or i >= num_target or np.random.rand(1) > 0.5:
qt[0] = np.random.uniform(-bound, bound)
qt[1] = np.random.uniform(-bound, bound)
qt[2] = np.random.uniform(cfg.TRAIN.SYN_TNEAR, cfg.TRAIN.SYN_TFAR)
else:
# sample an object nearby
object_id = np.random.randint(0, i, size=1)[0]
extent = 2 * np.mean(self._extents_all[cls+1, :])
flag = np.random.randint(0, 2)
if flag == 0:
flag = -1
qt[0] = poses_all[object_id][0] + flag * extent * np.random.uniform(1.0, 1.5)
if np.absolute(qt[0]) > bound:
qt[0] = poses_all[object_id][0] - flag * extent * np.random.uniform(1.0, 1.5)
if np.absolute(qt[0]) > bound:
qt[0] = np.random.uniform(-bound, bound)
flag = np.random.randint(0, 2)
if flag == 0:
flag = -1
qt[1] = poses_all[object_id][1] + flag * extent * np.random.uniform(1.0, 1.5)
if np.absolute(qt[1]) > bound:
qt[1] = poses_all[object_id][1] - flag * extent * np.random.uniform(1.0, 1.5)
if np.absolute(qt[1]) > bound:
qt[1] = np.random.uniform(-bound, bound)
qt[2] = poses_all[object_id][2] - extent * np.random.uniform(2.0, 4.0)
if qt[2] < cfg.TRAIN.SYN_TNEAR:
qt[2] = poses_all[object_id][2] + extent * np.random.uniform(2.0, 4.0)
poses_all.append(qt)
cfg.renderer.set_poses(poses_all)
# sample lighting
cfg.renderer.set_light_pos(np.random.uniform(-0.5, 0.5, 3))
intensity = np.random.uniform(0.8, 2)
light_color = intensity * np.random.uniform(0.9, 1.1, 3)
cfg.renderer.set_light_color(light_color)
# rendering
cfg.renderer.set_projection_matrix(width, height, fx, fy, px, py, znear, zfar)
image_tensor = torch.cuda.FloatTensor(height, width, 4).detach()
seg_tensor = torch.cuda.FloatTensor(height, width, 4).detach()
pc_tensor = torch.cuda.FloatTensor(height, width, 4).detach()
cfg.renderer.render(cls_indexes, image_tensor, seg_tensor, pc2_tensor=pc_tensor)
image_tensor = image_tensor.flip(0)
seg_tensor = seg_tensor.flip(0)
pc_tensor = pc_tensor.flip(0)
# foreground mask
seg = seg_tensor[:,:,2] + 256*seg_tensor[:,:,1] + 256*256*seg_tensor[:,:,0]
mask = (seg != 0).unsqueeze(0).repeat((3, 1, 1)).float()
# RGB to BGR order
im = image_tensor.cpu().numpy()
im = np.clip(im, 0, 1)
im = im[:, :, (2, 1, 0)] * 255
im = im.astype(np.uint8)
# XYZ coordinates in camera frame
im_depth = pc_tensor.cpu().numpy()
im_depth = im_depth[:, :, :3]
im_depth_return = im_depth[:, :, 2].copy()
im_label = seg_tensor.cpu().numpy()
im_label = im_label[:, :, (2, 1, 0)] * 255
im_label = np.round(im_label).astype(np.uint8)
im_label = np.clip(im_label, 0, 255)
im_label, im_label_all = self.process_label_image(im_label)
centers = np.zeros((num, 2), dtype=np.float32)
rcenters = cfg.renderer.get_centers()
for i in range(num):
centers[i, 0] = rcenters[i][1] * width
centers[i, 1] = rcenters[i][0] * height
centers = centers[:num_target, :]
'''
import matplotlib.pyplot as plt
fig = plt.figure()
ax = fig.add_subplot(3, 2, 1)
plt.imshow(im[:, :, (2, 1, 0)])
for i in range(num_target):
plt.plot(centers[i, 0], centers[i, 1], 'yo')
ax = fig.add_subplot(3, 2, 2)
plt.imshow(im_label)
ax = fig.add_subplot(3, 2, 3)
plt.imshow(im_depth[:, :, 0])
ax = fig.add_subplot(3, 2, 4)
plt.imshow(im_depth[:, :, 1])
ax = fig.add_subplot(3, 2, 5)
plt.imshow(im_depth[:, :, 2])
plt.show()
#'''
# chromatic transform
if cfg.TRAIN.CHROMATIC and cfg.MODE == 'TRAIN' and np.random.rand(1) > 0.1:
im = chromatic_transform(im)
im_cuda = torch.from_numpy(im).cuda().float() / 255.0
if cfg.TRAIN.ADD_NOISE and cfg.MODE == 'TRAIN' and np.random.rand(1) > 0.1:
im_cuda = add_noise_cuda(im_cuda)
im_cuda -= self._pixel_mean
im_cuda = im_cuda.permute(2, 0, 1)
if cfg.INPUT == 'DEPTH' or cfg.INPUT == 'RGBD':
# depth mask
z_im = im_depth[:, :, 2]
mask_depth = z_im > 0.0
mask_depth = mask_depth.astype('float')
mask_depth_cuda = torch.from_numpy(mask_depth).cuda().float()
mask_depth_cuda.unsqueeze_(0)
im_cuda_depth = torch.from_numpy(im_depth).cuda().float()
if cfg.TRAIN.ADD_NOISE and cfg.MODE == 'TRAIN' and np.random.rand(1) > 0.1:
im_cuda_depth = add_noise_depth_cuda(im_cuda_depth)
im_cuda_depth = im_cuda_depth.permute(2, 0, 1)
else:
im_cuda_depth = im_cuda.clone()
mask_depth_cuda = torch.cuda.FloatTensor(1, height, width).fill_(0)
# label blob
classes = np.array(range(self.num_classes))
label_blob = np.zeros((self.num_classes, self._height, self._width), dtype=np.float32)
label_blob[0, :, :] = 1.0
for i in range(1, self.num_classes):
I = np.where(im_label == classes[i])
if len(I[0]) > 0:
label_blob[i, I[0], I[1]] = 1.0
label_blob[0, I[0], I[1]] = 0.0
# poses and boxes
pose_blob = np.zeros((self.num_classes, 9), dtype=np.float32)
gt_boxes = np.zeros((self.num_classes, 5), dtype=np.float32)
for i in range(num_target):
cls = int(indexes_target[i])
pose_blob[i, 0] = 1
pose_blob[i, 1] = cls
T = poses_all[i][:3]
qt = poses_all[i][3:]
# egocentric to allocentric
qt_allocentric = egocentric2allocentric(qt, T)
if qt_allocentric[0] < 0:
qt_allocentric = -1 * qt_allocentric
pose_blob[i, 2:6] = qt_allocentric
pose_blob[i, 6:] = T
# compute box
x3d = np.ones((4, self._points_all.shape[1]), dtype=np.float32)
x3d[0, :] = self._points_all[cls,:,0]
x3d[1, :] = self._points_all[cls,:,1]
x3d[2, :] = self._points_all[cls,:,2]
RT = np.zeros((3, 4), dtype=np.float32)
RT[:3, :3] = quat2mat(qt)
RT[:, 3] = T
x2d = np.matmul(self._intrinsic_matrix, np.matmul(RT, x3d))
x2d[0, :] = np.divide(x2d[0, :], x2d[2, :])
x2d[1, :] = np.divide(x2d[1, :], x2d[2, :])
gt_boxes[i, 0] = np.min(x2d[0, :])
gt_boxes[i, 1] = np.min(x2d[1, :])
gt_boxes[i, 2] = np.max(x2d[0, :])
gt_boxes[i, 3] = np.max(x2d[1, :])
gt_boxes[i, 4] = cls
# construct the meta data
"""
format of the meta_data
intrinsic matrix: meta_data[0 ~ 8]
inverse intrinsic matrix: meta_data[9 ~ 17]
"""
K = self._intrinsic_matrix
K[2, 2] = 1
Kinv = np.linalg.pinv(K)
meta_data_blob = np.zeros(18, dtype=np.float32)
meta_data_blob[0:9] = K.flatten()
meta_data_blob[9:18] = Kinv.flatten()
# vertex regression target
if cfg.TRAIN.VERTEX_REG:
vertex_targets, vertex_weights = self._generate_vertex_targets(im_label, indexes_target, centers, poses_all, classes, self.num_classes)
elif cfg.TRAIN.VERTEX_REG_DELTA and (cfg.INPUT == 'DEPTH' or cfg.INPUT == 'RGBD'):  # parentheses added: delta targets require a depth-bearing input
vertex_targets, vertex_weights = self._generate_vertex_deltas(im_label, indexes_target, centers, poses_all,
classes, self.num_classes, im_depth)
else:
vertex_targets = []
vertex_weights = []
im_info = np.array([im_cuda.shape[1], im_cuda.shape[2], cfg.TRAIN.SCALES_BASE[0], 1], dtype=np.float32)  # (height, width) from the CHW tensor; im is HWC, so im.shape[2] would be the channel count
sample = {'image_color': im_cuda,
'image_depth': im_cuda_depth,
'im_depth': im_depth_return,
'label': label_blob,
'mask': mask,
'mask_depth': mask_depth_cuda,
'meta_data': meta_data_blob,
'poses': pose_blob,
'extents': self._extents,
'points': self._point_blob,
'symmetry': self._symmetry,
'gt_boxes': gt_boxes,
'im_info': im_info}
if cfg.TRAIN.VERTEX_REG or cfg.TRAIN.VERTEX_REG_DELTA:
sample['vertex_targets'] = vertex_targets
sample['vertex_weights'] = vertex_weights
return sample
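# --- Editor's note: a hedged sketch (not part of the original source) of the
# egocentric-to-allocentric idea used in _render_item. A common construction
# factors R_ego = R_view(t) * R_allo, where R_view rotates the camera z-axis
# onto t/||t||; the repo's egocentric2allocentric in lib/utils/se3.py may use a
# different convention.
def _demo_allocentric(q_ego, t):
    import numpy as np
    from transforms3d.quaternions import mat2quat, quat2mat
    z = np.array([0.0, 0.0, 1.0])
    d = np.asarray(t, dtype=np.float64)
    d = d / np.linalg.norm(d)
    v = np.cross(z, d)
    c = float(np.dot(z, d))
    V = np.array([[0, -v[2], v[1]], [v[2], 0, -v[0]], [-v[1], v[0], 0]])
    R_view = np.eye(3) + V + V @ V / (1.0 + c)   # rotates z onto d (needs c != -1)
    return mat2quat(R_view.T @ quat2mat(q_ego))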
def __getitem__(self, index):
return self._render_item()
def __len__(self):
return self._size
# compute the voting label image in 2D
def _generate_vertex_targets(self, im_label, cls_indexes, center, poses, classes, num_classes):
width = im_label.shape[1]
height = im_label.shape[0]
vertex_targets = np.zeros((3 * num_classes, height, width), dtype=np.float32)
vertex_weights = np.zeros((3 * num_classes, height, width), dtype=np.float32)
c = np.zeros((2, 1), dtype=np.float32)
for i in range(1, num_classes):
y, x = np.where(im_label == classes[i])
ind = np.where(cls_indexes == classes[i])[0]
if len(x) > 0 and len(ind) > 0:
c[0] = center[ind, 0]
c[1] = center[ind, 1]
z = poses[int(ind)][2]
R = np.tile(c, (1, len(x))) - np.vstack((x, y))
# compute the norm
N = np.linalg.norm(R, axis=0) + 1e-10
# normalization
R = np.divide(R, np.tile(N, (2,1)))
# assignment
vertex_targets[3*i+0, y, x] = R[0,:]
vertex_targets[3*i+1, y, x] = R[1,:]
vertex_targets[3*i+2, y, x] = math.log(z)
vertex_weights[3*i+0, y, x] = cfg.TRAIN.VERTEX_W_INSIDE
vertex_weights[3*i+1, y, x] = cfg.TRAIN.VERTEX_W_INSIDE
vertex_weights[3*i+2, y, x] = cfg.TRAIN.VERTEX_W_INSIDE
return vertex_targets, vertex_weights
def _generate_vertex_deltas(self, im_label, cls_indexes, center, poses, classes, num_classes, im_depth):
x_image = im_depth[:, :, 0]
y_image = im_depth[:, :, 1]
z_image = im_depth[:, :, 2]
width = im_label.shape[1]
height = im_label.shape[0]
vertex_targets = np.zeros((3 * num_classes, height, width), dtype=np.float32)
vertex_weights = np.zeros((3 * num_classes, height, width), dtype=np.float32)
c = np.zeros((2, 1), dtype=np.float32)
for i in range(1, num_classes):
valid_mask = (z_image != 0.0)
label_mask = (im_label == classes[i])
fin_mask = valid_mask * label_mask
y, x = np.where(fin_mask)
ind = np.where(cls_indexes == classes[i])[0]
if len(x) > 0 and len(ind) > 0:
extents_here = self._extents[i, :]
largest_dim = np.sqrt(np.sum(extents_here * extents_here))
half_diameter = largest_dim / 2.0
c[0] = center[ind, 0]
c[1] = center[ind, 1]
if isinstance(poses, list):
x_center_coord = poses[int(ind)][0]
y_center_coord = poses[int(ind)][1]
z_center_coord = poses[int(ind)][2]
else:
if len(poses.shape) == 3:
x_center_coord = poses[0, 3, ind]
y_center_coord = poses[1, 3, ind]
z_center_coord = poses[2, 3, ind]
else:
x_center_coord = poses[ind, -3]
y_center_coord = poses[ind, -2]
z_center_coord = poses[ind, -1]
targets_x = (x_image[y, x] - x_center_coord) / half_diameter
targets_y = (y_image[y, x] - y_center_coord) / half_diameter
targets_z = (z_image[y, x] - z_center_coord) / half_diameter
vertex_targets[3 * i + 0, y, x] = targets_x
vertex_targets[3 * i + 1, y, x] = targets_y
vertex_targets[3 * i + 2, y, x] = targets_z
vertex_weights[3 * i + 0, y, x] = cfg.TRAIN.VERTEX_W_INSIDE
vertex_weights[3 * i + 1, y, x] = cfg.TRAIN.VERTEX_W_INSIDE
vertex_weights[3 * i + 2, y, x] = cfg.TRAIN.VERTEX_W_INSIDE
return vertex_targets, vertex_weights
def _get_default_path(self):
"""
Return the default path where ycb_object is expected to be installed.
"""
return os.path.join(datasets.ROOT_DIR, 'data', 'YCB_Object')
def _load_object_extents(self):
extents = np.zeros((self._num_classes_all, 3), dtype=np.float32)
for i in range(1, self._num_classes_all):
point_file = os.path.join(self._model_path, self._classes_all[i], 'points.xyz')
print(point_file)
assert os.path.exists(point_file), 'Path does not exist: {}'.format(point_file)
points = np.loadtxt(point_file)
extents[i, :] = 2 * np.max(np.absolute(points), axis=0)
return extents
def _load_object_points(self, classes, extents, symmetry):
points = [[] for _ in range(len(classes))]
num = np.inf
num_classes = len(classes)
for i in range(1, num_classes):
point_file = os.path.join(self._model_path, classes[i], 'points.xyz')
print(point_file)
assert os.path.exists(point_file), 'Path does not exist: {}'.format(point_file)
points[i] = np.loadtxt(point_file)
if points[i].shape[0] < num:
num = points[i].shape[0]
points_all = np.zeros((num_classes, num, 3), dtype=np.float32)
for i in range(1, num_classes):
points_all[i, :, :] = points[i][:num, :]
# rescale the points
point_blob = points_all.copy()
for i in range(1, num_classes):
# compute the rescaling factor for the points
weight = 10.0 / np.amax(extents[i, :])
if weight < 10:
weight = 10
if symmetry[i] > 0:
point_blob[i, :, :] = 4 * weight * point_blob[i, :, :]
else:
point_blob[i, :, :] = weight * point_blob[i, :, :]
return points, points_all, point_blob
def labels_to_image(self, labels):
height = labels.shape[0]
width = labels.shape[1]
im_label = np.zeros((height, width, 3), dtype=np.uint8)
for i in range(self.num_classes):
I = np.where(labels == i)
im_label[I[0], I[1], :] = self._class_colors[i]
return im_label
def process_label_image(self, label_image):
"""
convert a rendered color label image into per-pixel class indexes
"""
height = label_image.shape[0]
width = label_image.shape[1]
labels = np.zeros((height, width), dtype=np.int32)
labels_all = np.zeros((height, width), dtype=np.int32)
# label image is in BGR order
index = label_image[:,:,2] + 256*label_image[:,:,1] + 256*256*label_image[:,:,0]
for i in range(1, len(self._class_colors_all)):
color = self._class_colors_all[i]
ind = color[0] + 256*color[1] + 256*256*color[2]
I = np.where(index == ind)
labels_all[I[0], I[1]] = i
ind = np.where(np.array(cfg.TRAIN.CLASSES) == i)[0]
if len(ind) > 0:
labels[I[0], I[1]] = ind
return labels, labels_all
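# --- Editor's note: a minimal round trip (not part of the original source) of
# the 24-bit color key used by process_label_image; the palette color here is
# one of the entries from self._class_colors_all.
def _demo_color_index():
    color = (0, 128, 0)                       # RGB palette color
    ind = color[0] + 256 * color[1] + 256 * 256 * color[2]   # -> 32768
    b, g, r = 0, 128, 0                       # the same color as a BGR pixel
    assert r + 256 * g + 256 * 256 * b == ind
    return ind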
================================================
FILE: lib/datasets/ycb_self_supervision.py
================================================
# Copyright (c) 2020 NVIDIA Corporation. All rights reserved.
# This work is licensed under the NVIDIA Source Code License - Non-commercial. Full
# text can be found in LICENSE.md
import torch
import torch.utils.data as data
import os, math
import sys
import os.path as osp
from os.path import *
import numpy as np
import numpy.random as npr
import cv2
import scipy.io
import copy
import glob
try:
import cPickle # Use cPickle on Python 2.7
except ImportError:
import pickle as cPickle
import datasets
from fcn.config import cfg
from utils.blob import pad_im, chromatic_transform, add_noise, add_noise_cuda, add_noise_depth_cuda
from transforms3d.quaternions import mat2quat, quat2mat
from transforms3d.euler import euler2quat
from utils.se3 import *
from utils.pose_error import *
from utils.cython_bbox import bbox_overlaps
from utils.segmentation_evaluation import multilabel_metrics
def VOCap(rec, prec):
index = np.where(np.isfinite(rec))[0]
rec = rec[index]
prec = prec[index]
if len(rec) == 0 or len(prec) == 0:
ap = 0
else:
mrec = np.insert(rec, 0, 0)
mrec = np.append(mrec, 0.1)
mpre = np.insert(prec, 0, 0)
mpre = np.append(mpre, prec[-1])
for i in range(1, len(mpre)):
mpre[i] = max(mpre[i], mpre[i-1])
i = np.where(mrec[1:] != mrec[:-1])[0] + 1
ap = np.sum(np.multiply(mrec[i] - mrec[i-1], mpre[i])) * 10
return ap
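# --- Editor's note: a hedged usage sketch (not part of the original source) of
# VOCap: it integrates the accuracy-vs-distance curve up to 0.1 m, and the
# final *10 normalizes that area to [0, 1]. Distances here are made up.
def _demo_vocap():
    import numpy as np
    d = np.sort(np.array([0.005, 0.01, 0.02, 0.15]))  # per-object distances (m)
    d[d > 0.1] = np.inf                               # inf marks misses
    accuracy = np.cumsum(np.ones(len(d), np.float32)) / len(d)
    return VOCap(d, accuracy)                         # area under the curve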
class YCBSelfSupervision(data.Dataset, datasets.imdb):
def __init__(self, image_set, ycb_self_supervision_path = None):
self._name = 'ycb_self_supervision_' + image_set
self._image_set = image_set
self._ycb_self_supervision_path = self._get_default_path() if ycb_self_supervision_path is None \
else ycb_self_supervision_path
self._data_path = os.path.join(self._ycb_self_supervision_path, 'data')
self._model_path = os.path.join(datasets.ROOT_DIR, 'data', 'models')
# define all the classes
self._classes_all = ('__background__', '002_master_chef_can', '003_cracker_box', '004_sugar_box', '005_tomato_soup_can', '006_mustard_bottle', \
'007_tuna_fish_can', '008_pudding_box', '009_gelatin_box', '010_potted_meat_can', '011_banana', '019_pitcher_base', \
'021_bleach_cleanser', '024_bowl', '025_mug', '035_power_drill', '036_wood_block', '037_scissors', '040_large_marker', \
'051_large_clamp', '052_extra_large_clamp', '061_foam_brick', 'holiday_cup1', 'holiday_cup2', 'sanning_mug', \
'001_chips_can', 'block_red_big', 'block_green_big', 'block_blue_big', 'block_yellow_big', \
'block_red_small', 'block_green_small', 'block_blue_small', 'block_yellow_small', \
'block_red_median', 'block_green_median', 'block_blue_median', 'block_yellow_median', 'fusion_duplo_dude', 'cabinet_handle')
self._num_classes_all = len(self._classes_all)
self._class_colors_all = [(255, 255, 255), (255, 0, 0), (0, 255, 0), (0, 0, 255), (255, 255, 0), (255, 0, 255), (0, 255, 255), \
(0, 0, 128), (0, 128, 0), (128, 0, 0), (128, 128, 0), (128, 0, 128), (0, 128, 128), \
(0, 64, 0), (64, 0, 0), (0, 0, 64), (64, 64, 0), (64, 0, 64), (0, 64, 64), \
(192, 0, 0), (0, 192, 0), (0, 0, 192), (192, 192, 0), (192, 0, 192), (0, 192, 192), (32, 0, 0), \
(150, 0, 0), (0, 150, 0), (0, 0, 150), (150, 150, 0), (75, 0, 0), (0, 75, 0), (0, 0, 75), (75, 75, 0), \
(200, 0, 0), (0, 200, 0), (0, 0, 200), (200, 200, 0), (16, 16, 0), (16, 16, 16)]
self._extents_all = self._load_object_extents()
self._width = cfg.TRAIN.SYN_WIDTH
self._height = cfg.TRAIN.SYN_HEIGHT
self._intrinsic_matrix = np.array([[616.3653, 0., 310.25882],
[ 0., 616.20294, 236.59981],
[ 0., 0., 1. ]])
if self._width == 1280:
self._intrinsic_matrix = np.array([[599.48681641, 0., 639.84338379],
[ 0., 599.24389648, 366.09042358],
[ 0., 0., 1. ]])
# select a subset of classes
self._classes = [self._classes_all[i] for i in cfg.TRAIN.CLASSES]
self._classes_test = [self._classes_all[i] for i in cfg.TEST.CLASSES]
self._num_classes = len(self._classes)
self._class_colors = [self._class_colors_all[i] for i in cfg.TRAIN.CLASSES]
self._class_colors_test = [self._class_colors_all[i] for i in cfg.TEST.CLASSES]
self._symmetry = np.array(cfg.TRAIN.SYMMETRY).astype(np.float32)
self._symmetry_test = np.array(cfg.TEST.SYMMETRY).astype(np.float32)
self._extents = self._extents_all[cfg.TRAIN.CLASSES]
self._extents_test = self._extents_all[cfg.TEST.CLASSES]
self._points, self._points_all, self._point_blob = self._load_object_points(self._classes, self._extents, self._symmetry)
self._points_test, self._points_all_test, self._point_blob_test = \
self._load_object_points(self._classes_test, self._extents_test, self._symmetry_test)
self._pixel_mean = torch.tensor(cfg.PIXEL_MEANS / 255.0).cuda().float()
self._classes_other = []
for i in range(self._num_classes_all):
if i not in cfg.TRAIN.CLASSES:
# do not add the other, near-identical clamp as a distractor
if i == 19 and 20 in cfg.TRAIN.CLASSES:
continue
if i == 20 and 19 in cfg.TRAIN.CLASSES:
continue
self._classes_other.append(i)
self._num_classes_other = len(self._classes_other)
# 3D model paths
self.model_sdf_paths = ['{}/{}/textured_simple_low_res.pth'.format(self._model_path, cls) for cls in self._classes_all[1:]]
self.model_colors = [np.array(self._class_colors_all[i]) / 255.0 for i in range(1, len(self._classes_all))]
self.model_mesh_paths = []
for cls in self._classes_all[1:]:
filename = '{}/{}/textured_simple.ply'.format(self._model_path, cls)
if osp.exists(filename):
self.model_mesh_paths.append(filename)
continue
filename = '{}/{}/textured_simple.obj'.format(self._model_path, cls)
if osp.exists(filename):
self.model_mesh_paths.append(filename)
self.model_texture_paths = []
for cls in self._classes_all[1:]:
filename = '{}/{}/texture_map.png'.format(self._model_path, cls)
if osp.exists(filename):
self.model_texture_paths.append(filename)
else:
self.model_texture_paths.append('')
# target meshes
self.model_colors_target = [np.array(self._class_colors_all[i]) / 255.0 for i in cfg.TRAIN.CLASSES[1:]]
self.model_mesh_paths_target = []
for cls in self._classes[1:]:
filename = '{}/{}/textured_simple.ply'.format(self._model_path, cls)
if osp.exists(filename):
self.model_mesh_paths_target.append(filename)
continue
filename = '{}/{}/textured_simple.obj'.format(self._model_path, cls)
if osp.exists(filename):
self.model_mesh_paths_target.append(filename)
self.model_texture_paths_target = []
for cls in self._classes[1:]:
filename = '{}/{}/texture_map.png'.format(self._model_path, cls)
if osp.exists(filename):
self.model_texture_paths_target.append(filename)
else:
self.model_texture_paths_target.append('')
self._class_to_ind = dict(zip(self._classes, range(self._num_classes)))
self._image_ext = '.png'
self._image_index = self._load_image_set_index(image_set)
if (cfg.MODE == 'TRAIN' and cfg.TRAIN.SYNTHESIZE) or (cfg.MODE == 'TEST' and cfg.TEST.SYNTHESIZE):
self._size = len(self._image_index) * (cfg.TRAIN.SYN_RATIO+1)
else:
self._size = len(self._image_index)
if self._size > cfg.TRAIN.MAX_ITERS_PER_EPOCH * cfg.TRAIN.IMS_PER_BATCH:
self._size = cfg.TRAIN.MAX_ITERS_PER_EPOCH * cfg.TRAIN.IMS_PER_BATCH
self._roidb = self.gt_roidb()
if cfg.MODE == 'TRAIN' or cfg.TEST.VISUALIZE:
self._perm = np.random.permutation(np.arange(len(self._roidb)))
else:
self._perm = np.arange(len(self._roidb))
self._cur = 0
self._build_uniform_poses()
self.lb_shift = -0.2
self.ub_shift = 0.2
self.lb_scale = 0.8
self.ub_scale = 2.0
assert os.path.exists(self._ycb_self_supervision_path), \
'ycb_self_supervision path does not exist: {}'.format(self._ycb_self_supervision_path)
assert os.path.exists(self._data_path), \
'Data path does not exist: {}'.format(self._data_path)
# construct fake inputs
label_blob = np.zeros((1, self._num_classes, self._height, self._width), dtype=np.float32)
pose_blob = np.zeros((1, self._num_classes, 9), dtype=np.float32)
gt_boxes = np.zeros((1, self._num_classes, 5), dtype=np.float32)
# construct the meta data
K = self._intrinsic_matrix
Kinv = np.linalg.pinv(K)
meta_data_blob = np.zeros((1, 18), dtype=np.float32)
meta_data_blob[0, 0:9] = K.flatten()
meta_data_blob[0, 9:18] = Kinv.flatten()
self.input_labels = torch.from_numpy(label_blob).cuda()
self.input_meta_data = torch.from_numpy(meta_data_blob).cuda()
self.input_extents = torch.from_numpy(self._extents).cuda()
self.input_gt_boxes = torch.from_numpy(gt_boxes).cuda()
self.input_poses = torch.from_numpy(pose_blob).cuda()
self.input_points = torch.from_numpy(self._point_blob).cuda()
self.input_symmetry = torch.from_numpy(self._symmetry).cuda()
def _render_item(self):
height = cfg.TRAIN.SYN_HEIGHT
width = cfg.TRAIN.SYN_WIDTH
fx = self._intrinsic_matrix[0, 0]
fy = self._intrinsic_matrix[1, 1]
px = self._intrinsic_matrix[0, 2]
py = self._intrinsic_matrix[1, 2]
zfar = 6.0
znear = 0.01
# sample target objects
if cfg.TRAIN.SYN_SAMPLE_OBJECT:
maxnum = np.minimum(self.num_classes-1, cfg.TRAIN.SYN_MAX_OBJECT)
num = np.random.randint(cfg.TRAIN.SYN_MIN_OBJECT, maxnum+1)
perm = np.random.permutation(np.arange(self.num_classes-1))
indexes_target = perm[:num] + 1
else:
num = self.num_classes - 1
indexes_target = np.arange(num) + 1
num_target = num
cls_indexes = [cfg.TRAIN.CLASSES[i]-1 for i in indexes_target]
# sample other objects as distractors
if cfg.TRAIN.SYN_SAMPLE_DISTRACTOR:
num_other = min(5, self._num_classes_other)
num_selected = np.random.randint(0, num_other+1)
perm = np.random.permutation(np.arange(self._num_classes_other))
indexes = perm[:num_selected]
for i in range(num_selected):
cls_indexes.append(self._classes_other[indexes[i]]-1)
else:
num_selected = 0
# sample poses
num = num_target + num_selected
poses_all = []
for i in range(num):
qt = np.zeros((7, ), dtype=np.float32)
# rotation
cls = int(cls_indexes[i])
if self.pose_indexes[cls] >= len(self.pose_lists[cls]):
self.pose_indexes[cls] = 0
self.pose_lists[cls] = np.random.permutation(np.arange(len(self.eulers)))
yaw = self.eulers[self.pose_lists[cls][self.pose_indexes[cls]]][0] + 15 * np.random.randn()
pitch = self.eulers[self.pose_lists[cls][self.pose_indexes[cls]]][1] + 15 * np.random.randn()
pitch = np.clip(pitch, -90, 90)
roll = self.eulers[self.pose_lists[cls][self.pose_indexes[cls]]][2] + 15 * np.random.randn()
qt[3:] = euler2quat(yaw * math.pi / 180.0, pitch * math.pi / 180.0, roll * math.pi / 180.0, 'syxz')
self.pose_indexes[cls] += 1
# translation
bound = cfg.TRAIN.SYN_BOUND
if i == 0 or i >= num_target or np.random.rand(1) > 0.5:
qt[0] = np.random.uniform(-bound, bound)
qt[1] = np.random.uniform(-bound, bound)
qt[2] = np.random.uniform(cfg.TRAIN.SYN_TNEAR, cfg.TRAIN.SYN_TFAR)
else:
# sample an object nearby
object_id = np.random.randint(0, i, size=1)[0]
extent = np.mean(self._extents_all[cls+1, :])
flag = np.random.randint(0, 2)
if flag == 0:
flag = -1
qt[0] = poses_all[object_id][0] + flag * extent * np.random.uniform(1.0, 1.5)
if np.absolute(qt[0]) > bound:
qt[0] = poses_all[object_id][0] - flag * extent * np.random.uniform(1.0, 1.5)
if np.absolute(qt[0]) > bound:
qt[0] = np.random.uniform(-bound, bound)
flag = np.random.randint(0, 2)
if flag == 0:
flag = -1
qt[1] = poses_all[object_id][1] + flag * extent * np.random.uniform(1.0, 1.5)
if np.absolute(qt[1]) > bound:
qt[1] = poses_all[object_id][1] - flag * extent * np.random.uniform(1.0, 1.5)
if np.absolute(qt[1]) > bound:
qt[1] = np.random.uniform(-bound, bound)
qt[2] = poses_all[object_id][2] - extent * np.random.uniform(2.0, 4.0)
if qt[2] < cfg.TRAIN.SYN_TNEAR:
qt[2] = poses_all[object_id][2] + extent * np.random.uniform(2.0, 4.0)
poses_all.append(qt)
cfg.renderer.set_poses(poses_all)
# sample lighting
cfg.renderer.set_light_pos(np.random.uniform(-0.5, 0.5, 3))
intensity = np.random.uniform(0.8, 2)
light_color = intensity * np.random.uniform(0.9, 1.1, 3)
cfg.renderer.set_light_color(light_color)
# rendering
cfg.renderer.set_projection_matrix(width, height, fx, fy, px, py, znear, zfar)
image_tensor = torch.cuda.FloatTensor(height, width, 4).detach()
seg_tensor = torch.cuda.FloatTensor(height, width, 4).detach()
pc_tensor = torch.cuda.FloatTensor(height, width, 4).detach()
cfg.renderer.render(cls_indexes, image_tensor, seg_tensor, pc2_tensor=pc_tensor)
image_tensor = image_tensor.flip(0)
seg_tensor = seg_tensor.flip(0)
pc_tensor = pc_tensor.flip(0)
# foreground mask
seg = seg_tensor[:,:,2] + 256*seg_tensor[:,:,1] + 256*256*seg_tensor[:,:,0]
mask = (seg != 0).unsqueeze(0).repeat((3, 1, 1)).float()
# RGB to BGR order
im = image_tensor.cpu().numpy()
im = np.clip(im, 0, 1)
im = im[:, :, (2, 1, 0)] * 255
im = im.astype(np.uint8)
# XYZ coordinates in camera frame
im_depth = pc_tensor.cpu().numpy()
im_depth = im_depth[:, :, :3]
im_depth_return = im_depth[:, :, 2].copy()
im_label = seg_tensor.cpu().numpy()
im_label = im_label[:, :, (2, 1, 0)] * 255
im_label = np.round(im_label).astype(np.uint8)
im_label = np.clip(im_label, 0, 255)
im_label, im_label_all = self.process_label_image(im_label)
centers = np.zeros((num, 2), dtype=np.float32)
rcenters = cfg.renderer.get_centers()
for i in range(num):
centers[i, 0] = rcenters[i][1] * width
centers[i, 1] = rcenters[i][0] * height
centers = centers[:num_target, :]
'''
import matplotlib.pyplot as plt
fig = plt.figure()
ax = fig.add_subplot(3, 2, 1)
plt.imshow(im[:, :, (2, 1, 0)])
for i in range(num_target):
plt.plot(centers[i, 0], centers[i, 1], 'yo')
ax = fig.add_subplot(3, 2, 2)
plt.imshow(im_label)
ax = fig.add_subplot(3, 2, 3)
plt.imshow(im_depth[:, :, 0])
ax = fig.add_subplot(3, 2, 4)
plt.imshow(im_depth[:, :, 1])
ax = fig.add_subplot(3, 2, 5)
plt.imshow(im_depth[:, :, 2])
plt.show()
#'''
# chromatic transform
if cfg.TRAIN.CHROMATIC and cfg.MODE == 'TRAIN' and np.random.rand(1) > 0.1:
im = chromatic_transform(im)
im_cuda = torch.from_numpy(im).cuda().float() / 255.0
if cfg.TRAIN.ADD_NOISE and cfg.MODE == 'TRAIN' and np.random.rand(1) > 0.1:
im_cuda = add_noise_cuda(im_cuda)
im_cuda -= self._pixel_mean
im_cuda = im_cuda.permute(2, 0, 1)
if cfg.INPUT == 'DEPTH' or cfg.INPUT == 'RGBD':
# depth mask
z_im = im_depth[:, :, 2]
mask_depth = z_im > 0.0
mask_depth = mask_depth.astype('float')
mask_depth_cuda = torch.from_numpy(mask_depth).cuda().float()
mask_depth_cuda.unsqueeze_(0)
im_cuda_depth = torch.from_numpy(im_depth).cuda().float()
if cfg.TRAIN.ADD_NOISE and cfg.MODE == 'TRAIN' and np.random.rand(1) > 0.1:
im_cuda_depth = add_noise_depth_cuda(im_cuda_depth)
im_cuda_depth = im_cuda_depth.permute(2, 0, 1)
else:
im_cuda_depth = im_cuda.clone()
mask_depth_cuda = torch.cuda.FloatTensor(1, height, width).fill_(0)
# label blob
classes = np.array(range(self.num_classes))
label_blob = np.zeros((self.num_classes, self._height, self._width), dtype=np.float32)
label_blob[0, :, :] = 1.0
for i in range(1, self.num_classes):
I = np.where(im_label == classes[i])
if len(I[0]) > 0:
label_blob[i, I[0], I[1]] = 1.0
label_blob[0, I[0], I[1]] = 0.0
# poses and boxes
pose_blob = np.zeros((self.num_classes, 9), dtype=np.float32)
gt_boxes = np.zeros((self.num_classes, 5), dtype=np.float32)
count = 0
for i in range(num_target):
cls = int(indexes_target[i])
T = poses_all[i][:3]
qt = poses_all[i][3:]
I = np.where(im_label == cls)
if len(I[0]) == 0:
continue
# compute box
x3d = np.ones((4, self._points_all.shape[1]), dtype=np.float32)
x3d[0, :] = self._points_all[cls,:,0]
x3d[1, :] = self._points_all[cls,:,1]
x3d[2, :] = self._points_all[cls,:,2]
RT = np.zeros((3, 4), dtype=np.float32)
RT[:3, :3] = quat2mat(qt)
RT[:, 3] = T
x2d = np.matmul(self._intrinsic_matrix, np.matmul(RT, x3d))
x2d[0, :] = np.divide(x2d[0, :], x2d[2, :])
x2d[1, :] = np.divide(x2d[1, :], x2d[2, :])
x1 = np.min(x2d[0, :])
y1 = np.min(x2d[1, :])
x2 = np.max(x2d[0, :])
y2 = np.max(x2d[1, :])
if x1 > width or y1 > height or x2 < 0 or y2 < 0:
continue
gt_boxes[count, 0] = x1
gt_boxes[count, 1] = y1
gt_boxes[count, 2] = x2
gt_boxes[count, 3] = y2
gt_boxes[count, 4] = cls
pose_blob[count, 0] = 1
pose_blob[count, 1] = cls
# egocentric to allocentric
qt_allocentric = egocentric2allocentric(qt, T)
if qt_allocentric[0] < 0:
qt_allocentric = -1 * qt_allocentric
pose_blob[count, 2:6] = qt_allocentric
pose_blob[count, 6:] = T
count += 1
# construct the meta data
"""
format of the meta_data
intrinsic matrix: meta_data[0 ~ 8]
inverse intrinsic matrix: meta_data[9 ~ 17]
"""
K = self._intrinsic_matrix
K[2, 2] = 1
Kinv = np.linalg.pinv(K)
meta_data_blob = np.zeros(18, dtype=np.float32)
meta_data_blob[0:9] = K.flatten()
meta_data_blob[9:18] = Kinv.flatten()
# vertex regression target
if cfg.TRAIN.VERTEX_REG:
vertex_targets, vertex_weights = self._generate_vertex_targets(im_label, indexes_target, centers, poses_all, classes, self.num_classes)
elif cfg.TRAIN.VERTEX_REG_DELTA and (cfg.INPUT == 'DEPTH' or cfg.INPUT == 'RGBD'):  # parentheses added: delta targets require a depth-bearing input
vertex_targets, vertex_weights = self._generate_vertex_deltas(im_label, indexes_target, centers, poses_all,
classes, self.num_classes, im_depth)
else:
vertex_targets = []
vertex_weights = []
im_info = np.array([im_cuda.shape[1], im_cuda.shape[2], cfg.TRAIN.SCALES_BASE[0], 1], dtype=np.float32)  # (height, width) from the CHW tensor; im is HWC, so im.shape[2] would be the channel count
sample = {'image_color': im_cuda,
'im_depth': im_depth_return,
'label': label_blob,
'mask': mask,
'meta_data': meta_data_blob,
'poses': pose_blob,
'extents': self._extents,
'points': self._point_blob,
'symmetry': self._symmetry,
'gt_boxes': gt_boxes,
'im_info': im_info,
'meta_data_path': ''}
if cfg.TRAIN.VERTEX_REG or cfg.TRAIN.VERTEX_REG_DELTA:
sample['vertex_targets'] = vertex_targets
sample['vertex_weights'] = vertex_weights
# affine transformation
if cfg.TRAIN.AFFINE:
shift = np.float32([np.random.uniform(self.lb_shift, self.ub_shift), np.random.uniform(self.lb_shift, self.ub_shift)])
scale = np.random.uniform(self.lb_scale, self.ub_scale)
affine_matrix = np.float32([[scale, 0, shift[0]], [0, scale, shift[1]]])
affine_1 = np.eye(3, dtype=np.float32)
affine_1[0, 2] = -0.5 * self._width
affine_1[1, 2] = -0.5 * self._height
affine_2 = np.eye(3, dtype=np.float32)
affine_2[0, 0] = 1.0 / scale
affine_2[0, 2] = -shift[0] * 0.5 * self._width / scale
affine_2[1, 1] = 1.0 / scale
affine_2[1, 2] = -shift[1] * 0.5 * self._height / scale
affine_3 = np.matmul(affine_2, affine_1)
affine_4 = np.matmul(np.linalg.inv(affine_1), affine_3)
affine_matrix_coordinate = affine_4[:3, :]
sample['affine_matrix'] = torch.from_numpy(affine_matrix).cuda()
sample['affine_matrix_coordinate'] = torch.from_numpy(affine_matrix_coordinate).cuda()
return sample
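# --- Editor's note: a hedged sketch (not part of the original source) of how a
# 2x3 affine matrix like the one built above can warp an image; this assumes it
# is consumed by torch.nn.functional.affine_grid / grid_sample, which the
# training code may or may not do exactly this way.
def _demo_affine_warp(image_chw):
    import torch
    import torch.nn.functional as F
    theta = torch.tensor([[[1.5, 0.0, 0.1],
                           [0.0, 1.5, -0.1]]])        # scale 1.5 plus a shift
    grid = F.affine_grid(theta, image_chw.unsqueeze(0).shape, align_corners=False)
    return F.grid_sample(image_chw.unsqueeze(0), grid, align_corners=False)[0]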
def __getitem__(self, index):
is_syn = 0
if ((cfg.MODE == 'TRAIN' and cfg.TRAIN.SYNTHESIZE) or (cfg.MODE == 'TEST' and cfg.TEST.SYNTHESIZE)) and (index % (cfg.TRAIN.SYN_RATIO+1) != 0):
is_syn = 1
if is_syn:
return self._render_item()
if self._cur >= len(self._roidb):
self._perm = np.random.permutation(np.arange(len(self._roidb)))
self._cur = 0
db_ind = self._perm[self._cur]
roidb = self._roidb[db_ind]
self._cur += 1
# Get the input image blob
random_scale_ind = npr.randint(0, high=len(cfg.TRAIN.SCALES_BASE))
im_blob, im_depth, im_scale, height, width = self._get_image_blob(roidb, random_scale_ind)
# build the label blob
label_blob, mask, meta_data_blob, pose_blob, gt_boxes, vertex_targets, vertex_weights \
= self._get_label_blob(roidb, self._num_classes, im_scale, height, width)
im_info = np.array([im_blob.shape[1], im_blob.shape[2], im_scale, is_syn], dtype=np.float32)
mask_depth_cuda = torch.cuda.FloatTensor(1, height, width).fill_(0)
sample = {'image_color': im_blob,
'im_depth': im_depth,
'label': label_blob,
'mask': mask,
'meta_data': meta_data_blob,
'poses': pose_blob,
'extents': self._extents,
'points': self._point_blob,
'symmetry': self._symmetry,
'gt_boxes': gt_boxes,
'im_info': im_info,
'meta_data_path': roidb['meta_data']}
if cfg.TRAIN.VERTEX_REG:
sample['vertex_targets'] = vertex_targets
sample['vertex_weights'] = vertex_weights
return sample
def _get_image_blob(self, roidb, scale_ind):
# rgba
rgba = pad_im(cv2.imread(roidb['image'], cv2.IMREAD_UNCHANGED), 16)
if rgba.shape[2] == 4:
im = np.copy(rgba[:,:,:3])
alpha = rgba[:,:,3]
I = np.where(alpha == 0)
im[I[0], I[1], :] = 0
else:
im = rgba
im_scale = cfg.TRAIN.SCALES_BASE[scale_ind]
if im_scale != 1.0:
im = cv2.resize(im, None, None, fx=im_scale, fy=im_scale, interpolation=cv2.INTER_LINEAR)
height = im.shape[0]
width = im.shape[1]
if roidb['flipped']:
im = im[:, ::-1, :]
# chromatic transform
if cfg.TRAIN.CHROMATIC and cfg.MODE == 'TRAIN' and np.random.rand(1) > 0.1:
im = chromatic_transform(im)
im_cuda = torch.from_numpy(im).cuda().float() / 255.0
if cfg.TRAIN.ADD_NOISE and cfg.MODE == 'TRAIN' and np.random.rand(1) > 0.1:
im_cuda = add_noise_cuda(im_cuda)
im_cuda -= self._pixel_mean
im_cuda = im_cuda.permute(2, 0, 1)
# depth image
im_depth = pad_im(cv2.imread(roidb['depth'], cv2.IMREAD_UNCHANGED), 16)
if im_scale != 1.0:
im_depth = cv2.resize(im_depth, None, None, fx=im_scale, fy=im_scale, interpolation=cv2.INTER_NEAREST)
im_depth = im_depth.astype(np.float32) / 1000.0
return im_cuda, im_depth, im_scale, height, width
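# To visualize a blob produced above, undo the normalization in reverse order
# (sketch):
#   im_vis = (im_cuda.permute(1, 2, 0) + self._pixel_mean) * 255.0
# i.e. CHW -> HWC, add back the pixel mean, rescale from [0, 1] to [0, 255].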
def _get_label_blob(self, roidb, num_classes, im_scale, height, width):
""" build the label blob """
meta_data = scipy.io.loadmat(roidb['meta_data'])
meta_data['cls_indexes'] = meta_data['cls_indexes'].flatten()
classes = np.array(cfg.TRAIN.CLASSES)
classes_test = np.array(cfg.TEST.CLASSES).flatten()
intrinsic_matrix = np.matrix(meta_data['intrinsic_matrix'])
fx = intrinsic_matrix[0, 0]
fy = intrinsic_matrix[1, 1]
px = intrinsic_matrix[0, 2]
py = intrinsic_matrix[1, 2]
zfar = 6.0
znear = 0.01
# poses
poses = meta_data['poses']
if len(poses.shape) == 2:
poses = np.reshape(poses, (3, 4, 1))
if roidb['flipped']:
poses = _flip_poses(poses, meta_data['intrinsic_matrix'], width)
num = poses.shape[2]
# render poses to get the label image
cls_indexes = []
poses_all = []
qt = np.zeros((7, ), dtype=np.float32)
for i in range(num):
RT = poses[:, :, i]
qt[:3] = RT[:, 3]
qt[3:] = mat2quat(RT[:, :3])
if cfg.MODE == 'TEST':
index = np.where(classes_test == meta_data['cls_indexes'][i])[0]
cls_indexes.append(index[0])
else:
cls_indexes.append(meta_data['cls_indexes'][i] - 1)
poses_all.append(qt.copy())
# rendering
cfg.renderer.set_poses(poses_all)
cfg.renderer.set_light_pos([0, 0, 0])
cfg.renderer.set_light_color([1, 1, 1])
cfg.renderer.set_projection_matrix(width, height, fx, fy, px, py, znear, zfar)
image_tensor = torch.cuda.FloatTensor(height, width, 4).detach()
seg_tensor = torch.cuda.FloatTensor(height, width, 4).detach()
cfg.renderer.render(cls_indexes, image_tensor, seg_tensor)
image_tensor = image_tensor.flip(0)
seg_tensor = seg_tensor.flip(0)
# semantic labels
im_label = seg_tensor.cpu().numpy()
im_label = im_label[:, :, (2, 1, 0)] * 255
im_label = np.round(im_label).astype(np.uint8)
im_label = np.clip(im_label, 0, 255)
im_label, im_label_all = self.process_label_image(im_label)
centers = np.zeros((num, 2), dtype=np.float32)
rcenters = cfg.renderer.get_centers()
for i in range(num):
centers[i, 0] = rcenters[i][1] * width
centers[i, 1] = rcenters[i][0] * height
# label blob
label_blob = np.zeros((num_classes, height, width), dtype=np.float32)
label_blob[0, :, :] = 1.0
for i in range(1, num_classes):
I = np.where(im_label_all == classes[i])
if len(I[0]) > 0:
label_blob[i, I[0], I[1]] = 1.0
label_blob[0, I[0], I[1]] = 0.0
'''
import matplotlib.pyplot as plt
fig = plt.figure()
ax = fig.add_subplot(1, 2, 1)
plt.imshow(im_label)
for i in range(num):
plt.plot(centers[i, 0], centers[i, 1], 'yo')
ax = fig.add_subplot(1, 2, 2)
plt.imshow(im_label_all)
plt.show()
#'''
# foreground mask
seg = torch.from_numpy((im_label != 0).astype(np.float32))
mask = seg.unsqueeze(0).repeat((3, 1, 1)).float().cuda()
# gt poses
pose_blob = np.zeros((num_classes, 9), dtype=np.float32)
gt_boxes = np.zeros((num_classes, 5), dtype=np.float32)
count = 0
for i in range(num):
cls = int(meta_data['cls_indexes'][i])
ind = np.where(classes == cls)[0]
if len(ind) > 0:
I = np.where(im_label == ind[0])
if len(I[0]) == 0:
continue
R = poses[:, :3, i]
T = poses[:, 3, i]
# compute box
x3d = np.ones((4, self._points_all.shape[1]), dtype=np.float32)
x3d[0, :] = self._points_all[ind,:,0]
x3d[1, :] = self._points_all[ind,:,1]
x3d[2, :] = self._points_all[ind,:,2]
RT = np.zeros((3, 4), dtype=np.float32)
RT[:3, :3] = R
RT[:, 3] = T
x2d = np.matmul(meta_data['intrinsic_matrix'], np.matmul(RT, x3d))
x2d[0, :] = np.divide(x2d[0, :], x2d[2, :])
x2d[1, :] = np.divide(x2d[1, :], x2d[2, :])
x1 = np.min(x2d[0, :]) * im_scale
y1 = np.min(x2d[1, :]) * im_scale
x2 = np.max(x2d[0, :]) * im_scale
y2 = np.max(x2d[1, :]) * im_scale
if x1 > width or y1 > height or x2 < 0 or y2 < 0:
continue
gt_boxes[count, 0] = x1
gt_boxes[count, 1] = y1
gt_boxes[count, 2] = x2
gt_boxes[count, 3] = y2
gt_boxes[count, 4] = ind
# pose
pose_blob[count, 0] = 1
pose_blob[count, 1] = ind
qt = mat2quat(R)
# egocentric to allocentric
qt_allocentric = egocentric2allocentric(qt, T)
if qt_allocentric[0] < 0:
qt_allocentric = -1 * qt_allocentric
pose_blob[count, 2:6] = qt_allocentric
pose_blob[count, 6:] = T
count += 1
# construct the meta data
"""
format of the meta_data
intrinsic matrix: meta_data[0 ~ 8]
inverse intrinsic matrix: meta_data[9 ~ 17]
"""
K = np.matrix(meta_data['intrinsic_matrix']) * im_scale
K[2, 2] = 1
Kinv = np.linalg.pinv(K)
meta_data_blob = np.zeros(18, dtype=np.float32)
meta_data_blob[0:9] = K.flatten()
meta_data_blob[9:18] = Kinv.flatten()
# vertex regression target
if cfg.TRAIN.VERTEX_REG:
if roidb['flipped']:
centers[:, 0] = width - centers[:, 0]
vertex_targets, vertex_weights = self._generate_vertex_targets(im_label_all, meta_data['cls_indexes'], \
centers, poses_all, classes, num_classes)
else:
vertex_targets = []
vertex_weights = []
return label_blob, mask, meta_data_blob, pose_blob, gt_boxes, vertex_targets, vertex_weights
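# Why the allocentric conversion above (sketch, not from the source): an
# object translated off the optical axis changes its egocentric rotation even
# when its appearance stays the same, so the network regresses allocentric
# quaternions instead. Conceptually, egocentric2allocentric (utils.se3)
# removes the viewpoint rotation that aligns the camera z-axis with T/||T||:
#   q_allo ~= inverse(q_z_to_T) * q_ego
# and negating q when q[0] < 0 picks one of the two equivalent quaternion
# hemispheres (q and -q encode the same rotation).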
# compute the voting label image in 2D
def _generate_vertex_targets(self, im_label, cls_indexes, center, poses, classes, num_classes):
width = im_label.shape[1]
height = im_label.shape[0]
vertex_targets = np.zeros((3 * num_classes, height, width), dtype=np.float32)
vertex_weights = np.zeros((3 * num_classes, height, width), dtype=np.float32)
c = np.zeros((2, 1), dtype=np.float32)
for i in range(1, num_classes):
y, x = np.where(im_label == classes[i])
ind = np.where(cls_indexes == classes[i])[0]
if len(x) > 0 and len(ind) > 0:
c[0] = center[ind, 0]
c[1] = center[ind, 1]
z = poses[int(ind)][2]
R = np.tile(c, (1, len(x))) - np.vstack((x, y))
# compute the norm
N = np.linalg.norm(R, axis=0) + 1e-10
# normalization
R = np.divide(R, np.tile(N, (2,1)))
# assignment
vertex_targets[3*i+0, y, x] = R[0,:]
vertex_targets[3*i+1, y, x] = R[1,:]
vertex_targets[3*i+2, y, x] = math.log(z)
vertex_weights[3*i+0, y, x] = cfg.TRAIN.VERTEX_W_INSIDE
vertex_weights[3*i+1, y, x] = cfg.TRAIN.VERTEX_W_INSIDE
vertex_weights[3*i+2, y, x] = cfg.TRAIN.VERTEX_W_INSIDE
return vertex_targets, vertex_weights
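# Worked example for one pixel (sketch): with object center c = (cx, cy),
# object depth z, and foreground pixel p = (x, y),
#   n = ||c - p||
#   t = ((cx - x) / n, (cy - y) / n, log(z))
# i.e. a unit vector pointing from the pixel toward the center plus a
# log-depth; the Hough voting layer can later intersect these directions to
# recover c and exponentiate the third channel to recover z.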
def __len__(self):
return self._size
def _get_default_path(self):
"""
Return the default path where ycb_self_supervision is expected to be installed.
"""
return os.path.join(datasets.ROOT_DIR, 'data', 'YCB_Self_Supervision')
def _load_image_set_index(self, image_set):
"""
Load the indexes of images in the data folder
"""
image_set_file = os.path.join(self._ycb_self_supervision_path, image_set + '.txt')
assert os.path.exists(image_set_file), \
'Path does not exist: {}'.format(image_set_file)
subdirs = []
with open(image_set_file) as f:
for x in f.readlines():
subdirs.append(x.rstrip('\n'))
image_index = []
for i in range(len(subdirs)):
subdir = subdirs[i]
folder = osp.join(self._data_path, subdir)
filename = os.path.join(folder, '*.mat')
files = glob.glob(filename)
print(subdir, len(files))
for k in range(len(files)):
filename = files[k]
head, name = os.path.split(filename)
index = subdir + '/' + name[:-9]
image_index.append(index)
print('=======================================================')
print('%d images in %s' % (len(image_index), self._data_path))
print('=======================================================')
return image_index
def _load_object_points(self, classes, extents, symmetry):
points = [[] for _ in range(len(classes))]
num = np.inf
num_classes = len(classes)
for i in range(1, num_classes):
point_file = os.path.join(self._model_path, classes[i], 'points.xyz')
print(point_file)
assert os.path.exists(point_file), 'Path does not exist: {}'.format(point_file)
points[i] = np.loadtxt(point_file)
if points[i].shape[0] < num:
num = points[i].shape[0]
points_all = np.zeros((num_classes, num, 3), dtype=np.float32)
for i in range(1, num_classes):
points_all[i, :, :] = points[i][:num, :]
# rescale the points
point_blob = points_all.copy()
for i in range(1, num_classes):
# compute the rescaling factor for the points
weight = 10.0 / np.amax(extents[i, :])
if weight < 10:
weight = 10
if symmetry[i] > 0:
point_blob[i, :, :] = 4 * weight * point_blob[i, :, :]
else:
point_blob[i, :, :] = weight * point_blob[i, :, :]
return points, points_all, point_blob
def _load_object_extents(self):
extents = np.zeros((self._num_classes_all, 3), dtype=np.float32)
for i in range(1, self._num_classes_all):
point_file = os.path.join(self._model_path, self._classes_all[i], 'points.xyz')
print(point_file)
assert os.path.exists(point_file), 'Path does not exist: {}'.format(point_file)
points = np.loadtxt(point_file)
extents[i, :] = 2 * np.max(np.absolute(points), axis=0)
return extents
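# Sketch: for a model centered at the origin, 2 * max(|p|) per axis is the
# side length of the tightest origin-centered axis-aligned box, e.g. x-values
# spanning [-0.03, 0.05] give an x-extent of 0.1.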
# image
def image_path_at(self, i):
"""
Return the absolute path to image i in the image sequence.
"""
return self.image_path_from_index(self.image_index[i])
def image_path_from_index(self, index):
"""
Construct an image path from the image's "index" identifier.
"""
image_path_jpg = os.path.join(self._data_path, index + '_color.jpg')
image_path_png = os.path.join(self._data_path, index + '_color.png')
if os.path.exists(image_path_jpg):
return image_path_jpg
elif os.path.exists(image_path_png):
return image_path_png
assert os.path.exists(image_path_jpg) or os.path.exists(image_path_png), \
'Path does not exist: {} or {}'.format(image_path_jpg, image_path_png)
# depth
def depth_path_at(self, i):
"""
Return the absolute path to depth i in the image sequence.
"""
return self.depth_path_from_index(self.image_index[i])
def depth_path_from_index(self, index):
"""
Construct a depth path from the image's "index" identifier.
"""
depth_path = os.path.join(self._data_path, index + '_depth' + self._image_ext)
assert os.path.exists(depth_path), \
'Path does not exist: {}'.format(depth_path)
return depth_path
# camera pose
def metadata_path_at(self, i):
"""
Return the absolute path to metadata i in the image sequence.
"""
return self.metadata_path_from_index(self.image_index[i])
def metadata_path_from_index(self, index):
"""
Construct a metadata path from the image's "index" identifier.
"""
metadata_path = os.path.join(self._data_path, index + '_meta.mat')
assert os.path.exists(metadata_path), \
'Path does not exist: {}'.format(metadata_path)
return metadata_path
def gt_roidb(self):
"""
Return the database of ground-truth regions of interest.
"""
gt_roidb = [self._load_ycb_self_supervision_annotation(index)
for index in self._image_index]
return gt_roidb
def _load_ycb_self_supervision_annotation(self, index):
"""
Load class name and meta data
"""
# image path
image_path = self.image_path_from_index(index)
# depth path
depth_path = self.depth_path_from_index(index)
# metadata path
metadata_path = self.metadata_path_from_index(index)
return {'image': image_path,
'depth': depth_path,
'meta_data': metadata_path,
'flipped': False}
def labels_to_image(self, labels):
height = labels.shape[0]
width = labels.shape[1]
im_label = np.zeros((height, width, 3), dtype=np.uint8)
for i in range(self.num_classes):
I = np.where(labels == i)
im_label[I[0], I[1], :] = self._class_colors[i]
return im_label
def process_label_image(self, label_image):
"""
Convert a color-coded label image to label indices.
"""
height = label_image.shape[0]
width = label_image.shape[1]
labels = np.zeros((height, width), dtype=np.int32)
labels_all = np.zeros((height, width), dtype=np.int32)
# label image is in BGR order
index = label_image[:,:,2] + 256*label_image[:,:,1] + 256*256*label_image[:,:,0]
for i in range(1, len(self._class_colors_all)):
color = self._class_colors_all[i]
ind = color[0] + 256*color[1] + 256*256*color[2]
I = np.where(index == ind)
labels_all[I[0], I[1]] = i
ind = np.where(np.array(cfg.TRAIN.CLASSES) == i)[0]
if len(ind) > 0:
labels[I[0], I[1]] = ind
return labels, labels_all
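# Example of the packing above (sketch): the image is BGR, so each pixel maps
# to the key R + 256*G + 256*256*B, the same encoding applied to each class
# color (r, g, b); e.g. class color (255, 0, 0) yields key 255, and one
# np.where per class recovers its mask.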
def render_gt_pose(self, meta_data):
width = self._width
height = self._height
meta_data['cls_indexes'] = meta_data['cls_indexes'].flatten()
classes = np.array(cfg.TRAIN.CLASSES)
classes_test = np.array(cfg.TEST.CLASSES).flatten()
intrinsic_matrix = np.matrix(meta_data['intrinsic_matrix'])
fx = intrinsic_matrix[0, 0]
fy = intrinsic_matrix[1, 1]
px = intrinsic_matrix[0, 2]
py = intrinsic_matrix[1, 2]
zfar = 6.0
znear = 0.01
# poses
poses = meta_data['poses']
if len(poses.shape) == 2:
poses = np.reshape(poses, (3, 4, 1))
num = poses.shape[2]
# render poses to get the label image
cls_indexes = []
poses_all = []
qt = np.zeros((7, ), dtype=np.float32)
for i in range(num):
RT = poses[:, :, i]
qt[:3] = RT[:, 3]
qt[3:] = mat2quat(RT[:, :3])
if cfg.MODE == 'TEST':
index = np.where(classes_test == meta_data['cls_indexes'][i])[0]
cls_indexes.append(index[0])
else:
cls_indexes.append(meta_data['cls_indexes'][i] - 1)
poses_all.append(qt.copy())
# rendering
cfg.renderer.set_poses(poses_all)
cfg.renderer.set_light_pos([0, 0, 0])
cfg.renderer.set_light_color([1, 1, 1])
cfg.renderer.set_projection_matrix(width, height, fx, fy, px, py, znear, zfar)
image_tensor = torch.cuda.FloatTensor(height, width, 4).detach()
seg_tensor = torch.cuda.FloatTensor(height, width, 4).detach()
cfg.renderer.render(cls_indexes, image_tensor, seg_tensor)
image_tensor = image_tensor.flip(0)
seg_tensor = seg_tensor.flip(0)
# semantic labels
im_label = seg_tensor.cpu().numpy()
im_label = im_label[:, :, (2, 1, 0)] * 255
im_label = np.round(im_label).astype(np.uint8)
im_label = np.clip(im_label, 0, 255)
im_label, im_label_all = self.process_label_image(im_label)
# foreground mask
seg = torch.from_numpy((im_label != 0).astype(np.float32))
mask = seg.unsqueeze(0).repeat((3, 1, 1)).float().cuda()
# gt boxes
gt_boxes = np.zeros((self._num_classes, 5), dtype=np.float32)
count = 0
selected = []
for i in range(num):
cls = int(meta_data['cls_indexes'][i])
ind = np.where(classes == cls)[0]
if len(ind) > 0:
R = poses[:, :3, i]
T = poses[:, 3, i]
# compute box
x3d = np.ones((4, self._points_all.shape[1]), dtype=np.float32)
x3d[0, :] = self._points_all[ind,:,0]
x3d[1, :] = self._points_all[ind,:,1]
x3d[2, :] = self._points_all[ind,:,2]
RT = np.zeros((3, 4), dtype=np.float32)
RT[:3, :3] = R
RT[:, 3] = T
x2d = np.matmul(meta_data['intrinsic_matrix'], np.matmul(RT, x3d))
x2d[0, :] = np.divide(x2d[0, :], x2d[2, :])
x2d[1, :] = np.divide(x2d[1, :], x2d[2, :])
x1 = np.min(x2d[0, :])
y1 = np.min(x2d[1, :])
x2 = np.max(x2d[0, :])
y2 = np.max(x2d[1, :])
if x1 > width or y1 > height or x2 < 0 or y2 < 0:
continue
gt_boxes[count, 0] = x1
gt_boxes[count, 1] = y1
gt_boxes[count, 2] = x2
gt_boxes[count, 3] = y2
selected.append(i)
count += 1
meta_data['cls_indexes'] = meta_data['cls_indexes'][selected]
meta_data['poses'] = poses[:, :, selected]
meta_data['im_label'] = im_label
meta_data['box'] = gt_boxes[:count, :4]
return meta_data
def evaluation(self, output_dir):
filename = os.path.join(output_dir, 'results_posecnn.mat')
if os.path.exists(filename):
results_all = scipy.io.loadmat(filename)
print('load results from file')
print(filename)
distances_sys = results_all['distances_sys']
distances_non = results_all['distances_non']
errors_rotation = results_all['errors_rotation']
errors_translation = results_all['errors_translation']
results_frame_id = results_all['results_frame_id'].flatten()
results_object_id = results_all['results_object_id'].flatten()
results_cls_id = results_all['results_cls_id'].flatten()
segmentation_precision = results_all['segmentation_precision']
segmentation_recall = results_all['segmentation_recall']
segmentation_f1 = results_all['segmentation_f1']
segmentation_count = results_all['segmentation_count']
else:
# save results
num_max = 100000
num_results = 2
distances_sys = np.zeros((num_max, num_results), dtype=np.float32)
distances_non = np.zeros((num_max, num_results), dtype=np.float32)
errors_rotation = np.zeros((num_max, num_results), dtype=np.float32)
errors_translation = np.zeros((num_max, num_results), dtype=np.float32)
results_frame_id = np.zeros((num_max, ), dtype=np.float32)
results_object_id = np.zeros((num_max, ), dtype=np.float32)
results_cls_id = np.zeros((num_max, ), dtype=np.float32)
segmentation_precision = np.zeros((num_max, self._num_classes), dtype=np.float32)
segmentation_recall = np.zeros((num_max, self._num_classes), dtype=np.float32)
segmentation_f1 = np.zeros((num_max, self._num_classes), dtype=np.float32)
segmentation_count = np.zeros((num_max, self._num_classes), dtype=np.float32)
# for each image
count = -1
count_file = -1
filename = os.path.join(output_dir, '*.mat')
files = glob.glob(filename)
for i in range(len(files)):
# load result
filename = files[i]
print(filename)
result_posecnn = scipy.io.loadmat(filename)
# load gt poses
filename = result_posecnn['meta_data_path'][0]
print(filename)
gt = scipy.io.loadmat(filename)
# render gt poses
gt = self.render_gt_pose(gt)
# compute segmentation metrics
metrics_dict = multilabel_metrics(result_posecnn['labels'].astype(np.int32), gt['im_label'].astype(np.int32), self._num_classes)
count_file += 1
segmentation_precision[count_file, :] = metrics_dict['Precision']
segmentation_recall[count_file, :] = metrics_dict['Recall']
segmentation_f1[count_file, :] = metrics_dict['F-measure']
segmentation_count[count_file, :] = metrics_dict['Count']
'''
import matplotlib.pyplot as plt
fig = plt.figure()
im_file = filename.replace('_meta.mat', '_color.png')
im = cv2.imread(im_file)
ax = fig.add_subplot(2, 2, 1)
plt.imshow(im[:, :, (2, 1, 0)])
for i in range(gt['box'].shape[0]):
x1 = gt['box'][i, 0]
y1 = gt['box'][i, 1]
x2 = gt['box'][i, 2]
y2 = gt['box'][i, 3]
plt.gca().add_patch(plt.Rectangle((x1, y1), x2-x1, y2-y1, fill=False, edgecolor='g', linewidth=3))
ax = fig.add_subplot(2, 2, 2)
plt.imshow(gt['im_label'])
ax = fig.add_subplot(2, 2, 3)
plt.imshow(result_posecnn['labels'].astype(np.int32))
plt.show()
#'''
# for each gt pose
cls_indexes = gt['cls_indexes'].flatten()
for j in range(len(cls_indexes)):
count += 1
cls_index = cls_indexes[j]
RT_gt = gt['poses'][:, :, j]
results_frame_id[count] = i
results_object_id[count] = j
results_cls_id[count] = cls_index
# network result
result = result_posecnn
roi_index = []
if len(result['rois']) > 0:
for k in range(result['rois'].shape[0]):
ind = int(result['rois'][k, 1])
cls = cfg.TRAIN.CLASSES[ind]
if cls == cls_index:
roi_index.append(k)
# select the roi
if len(roi_index) > 1:
# overlaps: (rois x gt_boxes)
roi_blob = result['rois'][roi_index, :]
roi_blob = roi_blob[:, (0, 2, 3, 4, 5, 1)]
gt_box_blob = np.zeros((1, 5), dtype=np.float32)
gt_box_blob[0, 1:] = gt['box'][j, :]
overlaps = bbox_overlaps(
np.ascontiguousarray(roi_blob[:, :5], dtype=np.float64),
np.ascontiguousarray(gt_box_blob, dtype=np.float64)).flatten()
assignment = overlaps.argmax()
roi_index = [roi_index[assignment]]
if len(roi_index) > 0:
RT = np.zeros((3, 4), dtype=np.float32)
ind = int(result['rois'][roi_index, 1])
if ind == -1:
points = self._points_clamp  # shared point set for the clamp classes
else:
points = self._points[ind]
# pose from network
RT[:3, :3] = quat2mat(result['poses'][roi_index, :4].flatten())
RT[:, 3] = result['poses'][roi_index, 4:]
distances_sys[count, 0] = adi(RT[:3, :3], RT[:, 3], RT_gt[:3, :3], RT_gt[:, 3], points)
distances_non[count, 0] = add(RT[:3, :3], RT[:, 3], RT_gt[:3, :3], RT_gt[:, 3], points)
errors_rotation[count, 0] = re(RT[:3, :3], RT_gt[:3, :3])
errors_translation[count, 0] = te(RT[:, 3], RT_gt[:, 3])
# pose after depth refinement
if cfg.TEST.POSE_REFINE:
RT[:3, :3] = quat2mat(result['poses_refined'][roi_index, :4].flatten())
RT[:, 3] = result['poses_refined'][roi_index, 4:]
distances_sys[count, 1] = adi(RT[:3, :3], RT[:, 3], RT_gt[:3, :3], RT_gt[:, 3], points)
distances_non[count, 1] = add(RT[:3, :3], RT[:, 3], RT_gt[:3, :3], RT_gt[:, 3], points)
errors_rotation[count, 1] = re(RT[:3, :3], RT_gt[:3, :3])
errors_translation[count, 1] = te(RT[:, 3], RT_gt[:, 3])
else:
distances_sys[count, 1] = np.inf
distances_non[count, 1] = np.inf
errors_rotation[count, 1] = np.inf
errors_translation[count, 1] = np.inf
else:
distances_sys[count, :] = np.inf
distances_non[count, :] = np.inf
errors_rotation[count, :] = np.inf
errors_translation[count, :] = np.inf
distances_sys = distances_sys[:count+1, :]
distances_non = distances_non[:count+1, :]
errors_rotation = errors_rotation[:count+1, :]
errors_translation = errors_translation[:count+1, :]
results_frame_id = results_frame_id[:count+1]
results_object_id = results_object_id[:count+1]
results_cls_id = results_cls_id[:count+1]
segmentation_precision = segmentation_precision[:count_file+1, :]
segmentation_recall = segmentation_recall[:count_file+1, :]
segmentation_f1 = segmentation_f1[:count_file+1, :]
segmentation_count = segmentation_count[:count_file+1, :]
results_all = {'distances_sys': distances_sys,
'distances_non': distances_non,
'errors_rotation': errors_rotation,
'errors_translation': errors_translation,
'results_frame_id': results_frame_id,
'results_object_id': results_object_id,
'results_cls_id': results_cls_id,
'segmentation_precision': segmentation_precision,
'segmentation_recall': segmentation_recall,
'segmentation_f1': segmentation_f1,
'segmentation_count': segmentation_count}
filename = os.path.join(output_dir, 'results_posecnn.mat')
scipy.io.savemat(filename, results_all)
# for each class
import matplotlib.pyplot as plt
max_distance = 0.1
index_plot = [0, 1]
color = ['r', 'b']
leng = ['PoseCNN', 'refined']
num = len(leng)
ADD = np.zeros((self._num_classes_all, num), dtype=np.float32)
ADDS = np.zeros((self._num_classes_all, num), dtype=np.float32)
TS = np.zeros((self._num_classes_all, num), dtype=np.float32)
classes = list(copy.copy(self._classes_all))
classes[0] = 'all'
for k in range(self._num_classes_all):
fig = plt.figure()
if k == 0:
index = range(len(results_cls_id))
else:
index = np.where(results_cls_id == k)[0]
if len(index) == 0:
continue
print('%s: %d objects' % (classes[k], len(index)))
# distance symmetry
ax = fig.add_subplot(2, 3, 1)
lengs = []
for i in index_plot:
D = distances_sys[index, i]
ind = np.where(D > max_distance)[0]
D[ind] = np.inf
d = np.sort(D)
n = len(d)
accuracy = np.cumsum(np.ones((n, ), np.float32)) / n
plt.plot(d, accuracy, color[i], linewidth=2)
ADDS[k, i] = VOCap(d, accuracy)
lengs.append('%s (%.2f)' % (leng[i], ADDS[k, i] * 100))
print('%s, %s: %d objects missed' % (classes[k], leng[i], np.sum(np.isinf(D))))
ax.legend(lengs)
plt.xlabel('Average distance threshold in meter (symmetry)')
plt.ylabel('accuracy')
ax.set_title(classes[k])
# distance non-symmetry
ax = fig.add_subplot(2, 3, 2)
lengs = []
for i in index_plot:
D = distances_non[index, i]
ind = np.where(D > max_distance)[0]
D[ind] = np.inf
d = np.sort(D)
n = len(d)
accuracy = np.cumsum(np.ones((n, ), np.float32)) / n
plt.plot(d, accuracy, color[i], linewidth=2)
ADD[k, i] = VOCap(d, accuracy)
lengs.append('%s (%.2f)' % (leng[i], ADD[k, i] * 100))
print('%s, %s: %d objects missed' % (classes[k], leng[i], np.sum(np.isinf(D))))
ax.legend(lengs)
plt.xlabel('Average distance threshold in meter (non-symmetry)')
plt.ylabel('accuracy')
ax.set_title(classes[k])
# translation
ax = fig.add_subplot(2, 3, 3)
lengs = []
for i in index_plot:
D = errors_translation[index, i]
ind = np.where(D > max_distance)[0]
D[ind] = np.inf
d = np.sort(D)
n = len(d)
accuracy = np.cumsum(np.ones((n, ), np.float32)) / n
plt.plot(d, accuracy, color[i], linewidth=2)
TS[k, i] = VOCap(d, accuracy)
lengs.append('%s (%.2f)' % (leng[i], TS[k, i] * 100))
print('%s, %s: %d objects missed' % (classes[k], leng[i], np.sum(np.isinf(D))))
ax.legend(lengs)
plt.xlabel('Translation threshold in meter')
plt.ylabel('accuracy')
ax.set_title(classes[k])
# rotation histogram
count = 4
for i in index_plot:
ax = fig.add_subplot(2, 3, count)
D = errors_rotation[index, i]
ind = np.where(np.isfinite(D))[0]
D = D[ind]
ax.hist(D, bins=range(0, 190, 10), range=(0, 180))
plt.xlabel('Rotation angle error')
plt.ylabel('count')
ax.set_title(leng[i])
count += 1
# mng = plt.get_current_fig_manager()
# mng.full_screen_toggle()
filename = output_dir + '/' + classes[k] + '.png'
plt.savefig(filename)
# plt.show()
# print ADD
print('==================ADD======================')
for k in range(len(classes)):
print('%s: %f' % (classes[k], ADD[k, 0]))
for k in range(len(classes)-1):
print('%f' % (ADD[k+1, 0]))
print('%f' % (ADD[0, 0]))
print(cfg.TRAIN.SNAPSHOT_INFIX)
print('===========================================')
# print ADD-S
print('==================ADD-S====================')
for k in range(len(classes)):
print('%s: %f' % (classes[k], ADDS[k, 0]))
for k in range(len(classes)-1):
print('%f' % (ADDS[k+1, 0]))
print('%f' % (ADDS[0, 0]))
print(cfg.TRAIN.SNAPSHOT_INFIX)
print('===========================================')
# print ADD
print('==================ADD refined======================')
for k in range(len(classes)):
print('%s: %f' % (classes[k], ADD[k, 1]))
for k in range(len(classes)-1):
print('%f' % (ADD[k+1, 1]))
print('%f' % (ADD[0, 1]))
print(cfg.TRAIN.SNAPSHOT_INFIX)
print('===========================================')
# print ADD-S
print('==================ADD-S refined====================')
for k in range(len(classes)):
print('%s: %f' % (classes[k], ADDS[k, 1]))
for k in range(len(classes)-1):
print('%f' % (ADDS[k+1, 1]))
print('%f' % (ADDS[0, 1]))
print(cfg.TRAIN.SNAPSHOT_INFIX)
print('===========================================')
# print segmentation precision
print('==================segmentation precision====================')
for i in range(self._num_classes):
count = np.sum(segmentation_count[:, i])
if count > 0:
precision = np.sum(segmentation_precision[:, i]) / count
else:
precision = 0
print('%s: %d objects, %f' % (self._classes[i], count, precision))
for i in range(self._num_classes):
count = np.sum(segmentation_count[:, i])
if count > 0:
precision = np.sum(segmentation_precision[:, i]) / count
else:
precision = 0
print('%f' % (precision))
print('===========================================')
# print segmentation recall
print('==================segmentation recall====================')
for i in range(self._num_classes):
count = np.sum(segmentation_count[:, i])
if count > 0:
recall = np.sum(segmentation_recall[:, i]) / count
else:
recall = 0
print('%s: %d objects, %f' % (self._classes[i], count, recall))
for i in range(self._num_classes):
count = np.sum(segmentation_count[:, i])
if count > 0:
recall = np.sum(segmentation_recall[:, i]) / count
else:
recall = 0
print('%f' % (recall))
print('===========================================')
# print segmentation f1
print('==================segmentation f1====================')
for i in range(self._num_classes):
count = np.sum(segmentation_count[:, i])
if count > 0:
f1 = np.sum(segmentation_f1[:, i]) / count
else:
f1 = 0
print('%s: %d objects, %f' % (self._classes[i], count, f1))
for i in range(self._num_classes):
count = np.sum(segmentation_count[:, i])
if count > 0:
f1 = np.sum(segmentation_f1[:, i]) / count
else:
f1 = 0
print('%f' % (f1))
print('===========================================')
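# How the accuracy curves above are formed (sketch): pose errors beyond
# max_distance (0.1 m) are treated as misses (set to inf), the rest are
# sorted, and the cumulative fraction under each threshold is the curve:
#   d = np.sort(D)
#   accuracy = np.arange(1, len(d) + 1, dtype=np.float32) / len(d)
# VOCap (utils.pose_error) then reports the normalized area under the curve.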
================================================
FILE: lib/datasets/ycb_video.py
================================================
# Copyright (c) 2020 NVIDIA Corporation. All rights reserved.
# This work is licensed under the NVIDIA Source Code License - Non-commercial. Full
# text can be found in LICENSE.md
import torch
import torch.utils.data as data
import os, math
import sys
import os.path as osp
from os.path import *
import numpy as np
import numpy.random as npr
import cv2
import scipy.io
import copy
import glob
try:
import cPickle # Use cPickle on Python 2.7
except ImportError:
import pickle as cPickle
import datasets
from fcn.config import cfg
from utils.blob import pad_im, chromatic_transform, add_noise, add_noise_cuda
from transforms3d.quaternions import mat2quat, quat2mat
from utils.se3 import *
from utils.pose_error import *
from utils.cython_bbox import bbox_overlaps
class YCBVideo(data.Dataset, datasets.imdb):
def __init__(self, image_set, ycb_video_path = None):
self._name = 'ycb_video_' + image_set
self._image_set = image_set
self._ycb_video_path = self._get_default_path() if ycb_video_path is None \
else ycb_video_path
path = os.path.join(self._ycb_video_path, 'data')
if not os.path.exists(path):
path = os.path.join(self._ycb_video_path, 'YCB_Video_Dataset/YCB_Video_Dataset/YCB_Video_Dataset/data')
self._data_path = path
self._model_path = os.path.join(datasets.ROOT_DIR, 'data', 'models')
# define all the classes
self._classes_all = ('__background__', '002_master_chef_can', '003_cracker_box', '004_sugar_box', '005_tomato_soup_can', '006_mustard_bottle', \
'007_tuna_fish_can', '008_pudding_box', '009_gelatin_box', '010_potted_meat_can', '011_banana', '019_pitcher_base', \
'021_bleach_cleanser', '024_bowl', '025_mug', '035_power_drill', '036_wood_block', '037_scissors', '040_large_marker', \
'051_large_clamp', '052_extra_large_clamp', '061_foam_brick')
self._num_classes_all = len(self._classes_all)
self._class_colors_all = [(255, 255, 255), (255, 0, 0), (0, 255, 0), (0, 0, 255), (255, 255, 0), (255, 0, 255), (0, 255, 255), \
(128, 0, 0), (0, 128, 0), (0, 0, 128), (128, 128, 0), (128, 0, 128), (0, 128, 128), \
(64, 0, 0), (0, 64, 0), (0, 0, 64), (64, 64, 0), (64, 0, 64), (0, 64, 64),
(192, 0, 0), (0, 192, 0), (0, 0, 192)]
self._symmetry_all = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1]).astype(np.float32)
self._extents_all = self._load_object_extents()
self._width = 640
self._height = 480
self._intrinsic_matrix = np.array([[1.066778e+03, 0.000000e+00, 3.129869e+02],
[0.000000e+00, 1.067487e+03, 2.413109e+02],
[0.000000e+00, 0.000000e+00, 1.000000e+00]])
# select a subset of classes
self._classes = [self._classes_all[i] for i in cfg.TRAIN.CLASSES]
self._classes_test = [self._classes_all[i] for i in cfg.TEST.CLASSES]
self._num_classes = len(self._classes)
self._class_colors = [self._class_colors_all[i] for i in cfg.TRAIN.CLASSES]
self._symmetry = self._symmetry_all[cfg.TRAIN.CLASSES]
self._symmetry_test = self._symmetry_all[cfg.TEST.CLASSES]
self._extents = self._extents_all[cfg.TRAIN.CLASSES]
self._extents_test = self._extents_all[cfg.TEST.CLASSES]
self._pixel_mean = cfg.PIXEL_MEANS / 255.0
# train classes
self._points, self._points_all, self._point_blob = \
self._load_object_points(self._classes, self._extents, self._symmetry)
# test classes
self._points_test, self._points_all_test, self._point_blob_test = \
self._load_object_points(self._classes_test, self._extents_test, self._symmetry_test)
# 3D model paths
self.model_mesh_paths = ['{}/{}/textured_simple.obj'.format(self._model_path, cls) for cls in self._classes_all[1:]]
self.model_sdf_paths = ['{}/{}/textured_simple_low_res.pth'.format(self._model_path, cls) for cls in self._classes_all[1:]]
self.model_texture_paths = ['{}/{}/texture_map.png'.format(self._model_path, cls) for cls in self._classes_all[1:]]
self.model_colors = [np.array(self._class_colors_all[i]) / 255.0 for i in range(1, len(self._classes_all))]
self.model_mesh_paths_target = ['{}/{}/textured_simple.obj'.format(self._model_path, cls) for cls in self._classes[1:]]
self.model_sdf_paths_target = ['{}/{}/textured_simple.sdf'.format(self._model_path, cls) for cls in self._classes[1:]]
self.model_texture_paths_target = ['{}/{}/texture_map.png'.format(self._model_path, cls) for cls in self._classes[1:]]
self.model_colors_target = [np.array(self._class_colors_all[i]) / 255.0 for i in cfg.TRAIN.CLASSES[1:]]
self._class_to_ind = dict(zip(self._classes, range(self._num_classes)))
self._image_index = self._load_image_set_index(image_set)
self._size = len(self._image_index)
if self._size > cfg.TRAIN.MAX_ITERS_PER_EPOCH * cfg.TRAIN.IMS_PER_BATCH:
self._size = cfg.TRAIN.MAX_ITERS_PER_EPOCH * cfg.TRAIN.IMS_PER_BATCH
self._roidb = self.gt_roidb()
assert os.path.exists(self._ycb_video_path), \
'ycb_video path does not exist: {}'.format(self._ycb_video_path)
assert os.path.exists(self._data_path), \
'Data path does not exist: {}'.format(self._data_path)
def __getitem__(self, index):
is_syn = 0
roidb = self._roidb[index]
# Get the input image blob
random_scale_ind = npr.randint(0, high=len(cfg.TRAIN.SCALES_BASE))
im_blob, im_depth, im_scale, height, width = self._get_image_blob(roidb, random_scale_ind)
# build the label blob
label_blob, mask, meta_data_blob, pose_blob, gt_boxes, vertex_targets, vertex_weights \
= self._get_label_blob(roidb, self._num_classes, im_scale, height, width)
is_syn = roidb['is_syn']
im_info = np.array([im_blob.shape[1], im_blob.shape[2], im_scale, is_syn], dtype=np.float32)
sample = {'image_color': im_blob,
'im_depth': im_depth,
'label': label_blob,
'mask': mask,
'meta_data': meta_data_blob,
'poses': pose_blob,
'extents': self._extents,
'points': self._point_blob,
'symmetry': self._symmetry,
'gt_boxes': gt_boxes,
'im_info': im_info,
'video_id': roidb['video_id'],
'image_id': roidb['image_id']}
if cfg.TRAIN.VERTEX_REG:
sample['vertex_targets'] = vertex_targets
sample['vertex_weights'] = vertex_weights
return sample
def _get_image_blob(self, roidb, scale_ind):
# rgba
rgba = pad_im(cv2.imread(roidb['image'], cv2.IMREAD_UNCHANGED), 16)
if rgba.shape[2] == 4:
im = np.copy(rgba[:,:,:3])
alpha = rgba[:,:,3]
I = np.where(alpha == 0)
im[I[0], I[1], :] = 0
else:
im = rgba
im_scale = cfg.TRAIN.SCALES_BASE[scale_ind]
if im_scale != 1.0:
im = cv2.resize(im, None, None, fx=im_scale, fy=im_scale, interpolation=cv2.INTER_LINEAR)
height = im.shape[0]
width = im.shape[1]
if roidb['flipped']:
im = im[:, ::-1, :]
# chromatic transform
if cfg.TRAIN.CHROMATIC and cfg.MODE == 'TRAIN' and np.random.rand(1) > 0.1:
im = chromatic_transform(im)
if cfg.TRAIN.ADD_NOISE and cfg.MODE == 'TRAIN' and np.random.rand(1) > 0.1:
im = add_noise(im)
im_tensor = torch.from_numpy(im) / 255.0
im_tensor -= self._pixel_mean
image_blob = im_tensor.permute(2, 0, 1).float()
# depth image
im_depth = pad_im(cv2.imread(roidb['depth'], cv2.IMREAD_UNCHANGED), 16)
if im_scale != 1.0:
im_depth = cv2.resize(im_depth, None, None, fx=im_scale, fy=im_scale, interpolation=cv2.INTER_NEAREST)
im_depth = im_depth.astype('float') / 10000.0
return image_blob, im_depth, im_scale, height, width
def _get_label_blob(self, roidb, num_classes, im_scale, height, width):
""" build the label blob """
meta_data = scipy.io.loadmat(roidb['meta_data'])
meta_data['cls_indexes'] = meta_data['cls_indexes'].flatten()
classes = np.array(cfg.TRAIN.CLASSES)
# read label image
im_label = pad_im(cv2.imread(roidb['label'], cv2.IMREAD_UNCHANGED), 16)
if roidb['flipped']:
if len(im_label.shape) == 2:
im_label = im_label[:, ::-1]
else:
im_label = im_label[:, ::-1, :]
if im_scale != 1.0:
im_label = cv2.resize(im_label, None, None, fx=im_scale, fy=im_scale, interpolation=cv2.INTER_NEAREST)
label_blob = np.zeros((num_classes, height, width), dtype=np.float32)
label_blob[0, :, :] = 1.0
for i in range(1, num_classes):
I = np.where(im_label == classes[i])
if len(I[0]) > 0:
label_blob[i, I[0], I[1]] = 1.0
label_blob[0, I[0], I[1]] = 0.0
# foreground mask
seg = torch.from_numpy((im_label != 0).astype(np.float32))
mask = seg.unsqueeze(0).repeat((3, 1, 1)).float()
# poses
poses = meta_data['poses']
if len(poses.shape) == 2:
poses = np.reshape(poses, (3, 4, 1))
if roidb['flipped']:
poses = _flip_poses(poses, meta_data['intrinsic_matrix'], width)
num = poses.shape[2]
pose_blob = np.zeros((num_classes, 9), dtype=np.float32)
gt_boxes = np.zeros((num_classes, 5), dtype=np.float32)
count = 0
for i in range(num):
cls = int(meta_data['cls_indexes'][i])
ind = np.where(classes == cls)[0]
if len(ind) > 0:
R = poses[:, :3, i]
T = poses[:, 3, i]
pose_blob[count, 0] = 1
pose_blob[count, 1] = ind
qt = mat2quat(R)
# egocentric to allocentric
qt_allocentric = egocentric2allocentric(qt, T)
if qt_allocentric[0] < 0:
qt_allocentric = -1 * qt_allocentric
pose_blob[count, 2:6] = qt_allocentric
pose_blob[count, 6:] = T
# compute box
x3d = np.ones((4, self._points_all.shape[1]), dtype=np.float32)
x3d[0, :] = self._points_all[ind,:,0]
x3d[1, :] = self._points_all[ind,:,1]
x3d[2, :] = self._points_all[ind,:,2]
RT = np.zeros((3, 4), dtype=np.float32)
RT[:3, :3] = quat2mat(qt)
RT[:, 3] = T
x2d = np.matmul(meta_data['intrinsic_matrix'], np.matmul(RT, x3d))
x2d[0, :] = np.divide(x2d[0, :], x2d[2, :])
x2d[1, :] = np.divide(x2d[1, :], x2d[2, :])
gt_boxes[count, 0] = np.min(x2d[0, :]) * im_scale
gt_boxes[count, 1] = np.min(x2d[1, :]) * im_scale
gt_boxes[count, 2] = np.max(x2d[0, :]) * im_scale
gt_boxes[count, 3] = np.max(x2d[1, :]) * im_scale
gt_boxes[count, 4] = ind
count += 1
# construct the meta data
"""
format of the meta_data
intrinsic matrix: meta_data[0 ~ 8]
inverse intrinsic matrix: meta_data[9 ~ 17]
"""
K = np.matrix(meta_data['intrinsic_matrix']) * im_scale
K[2, 2] = 1
Kinv = np.linalg.pinv(K)
meta_data_blob = np.zeros(18, dtype=np.float32)
meta_data_blob[0:9] = K.flatten()
meta_data_blob[9:18] = Kinv.flatten()
# vertex regression target
if cfg.TRAIN.VERTEX_REG:
center = meta_data['center']
if roidb['flipped']:
center[:, 0] = width - center[:, 0]
vertex_targets, vertex_weights = self._generate_vertex_targets(im_label,
meta_data['cls_indexes'], center, poses, classes, num_classes)
else:
vertex_targets = []
vertex_weights = []
return label_blob, mask, meta_data_blob, pose_blob, gt_boxes, vertex_targets, vertex_weights
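# The box computation above, written out (sketch): for homogeneous model
# points X (4 x N), pose [R|t], and intrinsics K,
#   x = K [R|t] X,   u = x[0] / x[2],   v = x[1] / x[2]
# and the ground-truth box is [min u, min v, max u, max v] scaled by im_scale.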
# compute the voting label image in 2D
def _generate_vertex_targets(self, im_label, cls_indexes, center, poses, classes, num_classes):
width = im_label.shape[1]
height = im_label.shape[0]
vertex_targets = np.zeros((3 * num_classes, height, width), dtype=np.float32)
vertex_weights = np.zeros((3 * num_classes, height, width), dtype=np.float32)
c = np.zeros((2, 1), dtype=np.float32)
for i in range(1, num_classes):
y, x = np.where(im_label == classes[i])
ind = np.where(cls_indexes == classes[i])[0]
if len(x) > 0 and len(ind) > 0:
c[0] = center[ind, 0]
c[1] = center[ind, 1]
if isinstance(poses, list):
z = poses[int(ind)][2]
else:
if len(poses.shape) == 3:
z = poses[2, 3, ind]
else:
z = poses[ind, -1]
R = np.tile(c, (1, len(x))) - np.vstack((x, y))
# compute the norm
N = np.linalg.norm(R, axis=0) + 1e-10
# normalization
R = np.divide(R, np.tile(N, (2,1)))
# assignment
vertex_targets[3*i+0, y, x] = R[0,:]
vertex_targets[3*i+1, y, x] = R[1,:]
vertex_targets[3*i+2, y, x] = math.log(z)
vertex_weights[3*i+0, y, x] = cfg.TRAIN.VERTEX_W_INSIDE
vertex_weights[3*i+1, y, x] = cfg.TRAIN.VERTEX_W_INSIDE
vertex_weights[3*i+2, y, x] = cfg.TRAIN.VERTEX_W_INSIDE
return vertex_targets, vertex_weights
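# Note (sketch): the third target channel stores log(z), so a predicted value
# t_z is mapped back to metric depth with z = exp(t_z), which keeps the
# regression target well-conditioned over the dataset's depth range.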
def __len__(self):
return self._size
def _get_default_path(self):
"""
Return the default path where YCB_Video is expected to be installed.
"""
return os.path.join(datasets.ROOT_DIR, 'data', 'YCB_Video')
def _load_image_set_index(self, image_set):
"""
Load the indexes listed in this dataset's image set file.
"""
image_set_file = os.path.join(self._ycb_video_path, image_set + '.txt')
assert os.path.exists(image_set_file), \
'Path does not exist: {}'.format(image_set_file)
image_index = []
video_ids_selected = set([])
video_ids_not = set([])
count = np.zeros((self.num_classes, ), dtype=np.int32)
with open(image_set_file) as f:
for x in f.readlines():
index = x.rstrip('\n')
pos = index.find('/')
video_id = index[:pos]
if video_id not in video_ids_selected and video_id not in video_ids_not:
filename = os.path.join(self._data_path, video_id, '000001-meta.mat')
meta_data = scipy.io.loadmat(filename)
cls_indexes = meta_data['cls_indexes'].flatten()
flag = 0
for i in range(len(cls_indexes)):
cls_index = int(cls_indexes[i])
ind = np.where(np.array(cfg.TRAIN.CLASSES) == cls_index)[0]
if len(ind) > 0:
count[ind] += 1
flag = 1
if flag:
video_ids_selected.add(video_id)
else:
video_ids_not.add(video_id)
if video_id in video_ids_selected:
image_index.append(index)
for i in range(1, self.num_classes):
print('%d %s [%d/%d]' % (i, self.classes[i], count[i], len(list(video_ids_selected))))
# sample a subset for training
if image_set == 'train':
image_index = image_index[::5]
# add synthetic data
filename = os.path.join(self._data_path + '_syn', '*.mat')
files = glob.glob(filename)
print('adding %d synthetic images' % (len(files)))
for i in range(len(files)):
filename = files[i].replace(self._data_path, '../data')[:-9]
image_index.append(filename)
return image_index
def _load_object_points(self, classes, extents, symmetry):
points = [[] for _ in range(len(classes))]
num = np.inf
num_classes = len(classes)
for i in range(1, num_classes):
point_file = os.path.join(self._model_path, classes[i], 'points.xyz')
print(point_file)
assert os.path.exists(point_file), 'Path does not exist: {}'.format(point_file)
points[i] = np.loadtxt(point_file)
if points[i].shape[0] < num:
num = points[i].shape[0]
points_all = np.zeros((num_classes, num, 3), dtype=np.float32)
for i in range(1, num_classes):
points_all[i, :, :] = points[i][:num, :]
# rescale the points
point_blob = points_all.copy()
for i in range(1, num_classes):
# compute the rescaling factor for the points
weight = 10.0 / np.amax(extents[i, :])
if weight < 10:
weight = 10
if symmetry[i] > 0:
point_blob[i, :, :] = 4 * weight * point_blob[i, :, :]
else:
point_blob[i, :, :] = weight * point_blob[i, :, :]
return points, points_all, point_blob
def _load_object_extents(self):
extents = np.zeros((self._num_classes_all, 3), dtype=np.float32)
for i in range(1, self._num_classes_all):
point_file = os.path.join(self._model_path, self._classes_all[i], 'points.xyz')
print(point_file)
assert os.path.exists(point_file), 'Path does not exist: {}'.format(point_file)
points = np.loadtxt(point_file)
extents[i, :] = 2 * np.max(np.absolute(points), axis=0)
return extents
# image
def image_path_at(self, i):
"""
Return the absolute path to image i in the image sequence.
"""
return self.image_path_from_index(self.image_index[i])
def image_path_from_index(self, index):
"""
Construct an image path from the image's "index" identifier.
"""
image_path = os.path.join(self._data_path, index + '-color.jpg')
if not os.path.exists(image_path):
image_path = os.path.join(self._data_path, index + '-color.png')
assert os.path.exists(image_path), \
'Path does not exist: {}'.format(image_path)
return image_path
# depth
def depth_path_at(self, i):
"""
Return the absolute path to depth i in the image sequence.
"""
return self.depth_path_from_index(self.image_index[i])
def depth_path_from_index(self, index):
"""
Construct a depth path from the image's "index" identifier.
"""
depth_path = os.path.join(self._data_path, index + '-depth.png')
assert os.path.exists(depth_path), \
'Path does not exist: {}'.format(depth_path)
return depth_path
# label
def label_path_at(self, i):
"""
Return the absolute path to label i in the image sequence.
"""
return self.label_path_from_index(self.image_index[i])
def label_path_from_index(self, index):
"""
Construct a label path from the image's "index" identifier.
"""
label_path = os.path.join(self._data_path, index + '-label.png')
assert os.path.exists(label_path), \
'Path does not exist: {}'.format(label_path)
return label_path
# camera pose
def metadata_path_at(self, i):
"""
Return the absolute path to metadata i in the image sequence.
"""
return self.metadata_path_from_index(self.image_index[i])
def metadata_path_from_index(self, index):
"""
Construct a metadata path from the image's "index" identifier.
"""
metadata_path = os.path.join(self._data_path, index + '-meta.mat')
assert os.path.exists(metadata_path), \
'Path does not exist: {}'.format(metadata_path)
return metadata_path
def gt_roidb(self):
"""
Return the database of ground-truth regions of interest.
This function loads/saves from/to a cache file to speed up future calls.
"""
prefix = '_class'
for i in range(len(cfg.TRAIN.CLASSES)):
prefix += '_%d' % cfg.TRAIN.CLASSES[i]
cache_file = os.path.join(self.cache_path, self.name + prefix + '_gt_roidb.pkl')
if os.path.exists(cache_file):
with open(cache_file, 'rb') as fid:
roidb = cPickle.load(fid)
print('{} gt roidb loaded from {}'.format(self.name, cache_file))
return roidb
print('loading gt...')
gt_roidb = [self._load_ycb_video_annotation(index)
for index in self._image_index]
with open(cache_file, 'wb') as fid:
cPickle.dump(gt_roidb, fid, cPickle.HIGHEST_PROTOCOL)
print('wrote gt roidb to {}'.format(cache_file))
return gt_roidb
def _load_ycb_video_annotation(self, index):
"""
Load class name and meta data
"""
# image path
image_path = self.image_path_from_index(index)
# depth path
depth_path = self.depth_path_from_index(index)
# label path
label_path = self.label_path_from_index(index)
# metadata path
metadata_path = self.metadata_path_from_index(index)
# is synthetic image or not
if 'data_syn' in image_path:
is_syn = 1
video_id = ''
image_id = ''
else:
is_syn = 0
# parse image name
pos = index.find('/')
video_id = index[:pos]
image_id = index[pos+1:]
return {'image': image_path,
'depth': depth_path,
'label': label_path,
'meta_data': metadata_path,
'video_id': video_id,
'image_id': image_id,
'is_syn': is_syn,
'flipped': False}
def labels_to_image(self, labels):
height = labels.shape[0]
width = labels.shape[1]
im_label = np.zeros((height, width, 3), dtype=np.uint8)
for i in range(self.num_classes):
I = np.where(labels == i)
im_label[I[0], I[1], :] = self._class_colors[i]
return im_label
def process_label_image(self, label_image):
"""
Convert a color-coded label image to label indices.
"""
height = label_image.shape[0]
width = label_image.shape[1]
labels = np.zeros((height, width), dtype=np.int32)
labels_all = np.zeros((height, width), dtype=np.int32)
# label image is in BGR order
index = label_image[:,:,2] + 256*label_image[:,:,1] + 256*256*label_image[:,:,0]
for i in range(1, len(self._class_colors_all)):
color = self._class_colors_all[i]
ind = color[0] + 256*color[1] + 256*256*color[2]
I = np.where(index == ind)
labels_all[I[0], I[1]] = i
ind = np.where(np.array(cfg.TRAIN.CLASSES) == i)[0]
if len(ind) > 0:
labels[I[0], I[1]] = ind
return labels, labels_all
def evaluation(self, output_dir):
filename = os.path.join(output_dir, 'results_posecnn.mat')
if os.path.exists(filename):
results_all = scipy.io.loadmat(filename)
print('load results from file')
print(filename)
distances_sys = results_all['distances_sys']
distances_non = results_all['distances_non']
errors_rotation = results_all['errors_rotation']
errors_translation = results_all['errors_translation']
results_seq_id = results_all['results_seq_id'].flatten()
results_frame_id = results_all['results_frame_id'].flatten()
results_object_id = results_all['results_object_id'].flatten()
results_cls_id = results_all['results_cls_id'].flatten()
else:
# save results
num_max = 100000
num_results = 2
distances_sys = np.zeros((num_max, num_results), dtype=np.float32)
distances_non = np.zeros((num_max, num_results), dtype=np.float32)
errors_rotation = np.zeros((num_max, num_results), dtype=np.float32)
errors_translation = np.zeros((num_max, num_results), dtype=np.float32)
results_seq_id = np.zeros((num_max, ), dtype=np.float32)
results_frame_id = np.zeros((num_max, ), dtype=np.float32)
results_object_id = np.zeros((num_max, ), dtype=np.float32)
results_cls_id = np.zeros((num_max, ), dtype=np.float32)
# for each image
count = -1
for i in range(len(self._roidb)):
# parse keyframe name
seq_id = int(self._roidb[i]['video_id'])
frame_id = int(self._roidb[i]['image_id'])
# load result
filename = os.path.join(output_dir, '%04d_%06d.mat' % (seq_id, frame_id))
print(filename)
result_posecnn = scipy.io.loadmat(filename)
# load gt poses
filename = osp.join(self._data_path, '%04d/%06d-meta.mat' % (seq_id, frame_id))
print(filename)
gt = scipy.io.loadmat(filename)
# for each gt pose
cls_indexes = gt['cls_indexes'].flatten()
for j in range(len(cls_indexes)):
count += 1
cls_index = cls_indexes[j]
RT_gt = gt['poses'][:, :, j]
results_seq_id[count] = seq_id
results_frame_id[count] = frame_id
results_object_id[count] = j
results_cls_id[count] = cls_index
# network result
result = result_posecnn
roi_index = []
if len(result['rois']) > 0:
for k in range(result['rois'].shape[0]):
ind = int(result['rois'][k, 1])
if ind == -1:
cls = 19  # ROI index -1 denotes '051_large_clamp' (class 19 in the full class list)
else:
cls = cfg.TRAIN.CLASSES[ind]
if cls == cls_index:
roi_index.append(k)
# select the roi
if len(roi_index) > 1:
# overlaps: (rois x gt_boxes)
roi_blob = result['rois'][roi_index, :]
roi_blob = roi_blob[:, (0, 2, 3, 4, 5, 1)]
gt_box_blob = np.zeros((1, 5), dtype=np.float32)
gt_box_blob[0, 1:] = gt['box'][j, :]
overlaps = bbox_overlaps(
np.ascontiguousarray(roi_blob[:, :5], dtype=np.float64),
np.ascontiguousarray(gt_box_blob, dtype=np.float64)).flatten()
assignment = overlaps.argmax()
roi_index = [roi_index[assignment]]
if len(roi_index) > 0:
RT = np.zeros((3, 4), dtype=np.float32)
ind = int(result['rois'][roi_index, 1])
points = self._points[ind]
# pose from network
RT[:3, :3] = quat2mat(result['poses'][roi_index, :4].flatten())
RT[:, 3] = result['poses'][roi_index, 4:]
distances_sys[count, 0] = adi(RT[:3, :3], RT[:, 3], RT_gt[:3, :3], RT_gt[:, 3], points)
distances_non[count, 0] = add(RT[:3, :3], RT[:, 3], RT_gt[:3, :3], RT_gt[:, 3], points)
errors_rotation[count, 0] = re(RT[:3, :3], RT_gt[:3, :3])
errors_translation[count, 0] = te(RT[:, 3], RT_gt[:, 3])
# pose after depth refinement
if cfg.TEST.POSE_REFINE:
RT[:3, :3] = quat2mat(result['poses_refined'][roi_index, :4].flatten())
RT[:, 3] = result['poses_refined'][roi_index, 4:]
distances_sys[count, 1] = adi(RT[:3, :3], RT[:, 3], RT_gt[:3, :3], RT_gt[:, 3], points)
distances_non[count, 1] = add(RT[:3, :3], RT[:, 3], RT_gt[:3, :3], RT_gt[:, 3], points)
errors_rotation[count, 1] = re(RT[:3, :3], RT_gt[:3, :3])
errors_translation[count, 1] = te(RT[:, 3], RT_gt[:, 3])
else:
distances_sys[count, 1] = np.inf
distances_non[count, 1] = np.inf
errors_rotation[count, 1] = np.inf
errors_translation[count, 1] = np.inf
else:
distances_sys[count, :] = np.inf
distances_non[count, :] = np.inf
errors_rotation[count, :] = np.inf
errors_translation[count, :] = np.inf
distances_sys = distances_sys[:count+1, :]
distances_non = distances_non[:count+1, :]
errors_rotation = errors_rotation[:count+1, :]
errors_translation = errors_translation[:count+1, :]
results_seq_id = results_seq_id[:count+1]
results_frame_id = results_frame_id[:count+1]
results_object_id = results_object_id[:count+1]
results_cls_id = results_cls_id[:count+1]
results_all = {'distances_sys': distances_sys,
'distances_non': distances_non,
'errors_rotation': errors_rotation,
'errors_translation': errors_translation,
'results_seq_id': results_seq_id,
'results_frame_id': results_frame_id,
'results_object_id': results_object_id,
'results_cls_id': results_cls_id }
filename = os.path.join(output_dir, 'results_posecnn.mat')
scipy.io.savemat(filename, results_all)
# print the results
# for each class
import matplotlib.pyplot as plt
max_distance = 0.1
index_plot = [0, 1]
color = ['r', 'b']
leng = ['PoseCNN', 'PoseCNN refined']
num = len(leng)
ADD = np.zeros((self._num_classes_all, num), dtype=np.float32)
ADDS = np.zeros((self._num_classes_all, num), dtype=np.float32)
TS = np.zeros((self._num_classes_all, num), dtype=np.float32)
classes = list(copy.copy(self._classes_all))
classes[0] = 'all'
for k in range(self._num_classes_all):
fig = plt.figure()
if k == 0:
index = range(len(results_cls_id))
else:
index = np.where(results_cls_id == k)[0]
if len(index) == 0:
continue
print('%s: %d objects' % (classes[k], len(index)))
# distance symmetry
ax = fig.add_subplot(2, 3, 1)
lengs = []
for i in index_plot:
D = distances_sys[index, i]
ind = np.where(D > max_distance)[0]
D[ind] = np.inf
d = np.sort(D)
n = len(d)
accuracy = np.cumsum(np.ones((n, ), np.float32)) / n
plt.plot(d, accuracy, color[i], linewidth=2)
ADDS[k, i] = VOCap(d, accuracy)
lengs.append('%s (%.2f)' % (leng[i], ADDS[k, i] * 100))
print('%s, %s: %d objects missed' % (classes[k], leng[i], np.sum(np.isinf(D))))
ax.legend(lengs)
plt.xlabel('Average distance threshold in meter (symmetry)')
plt.ylabel('accuracy')
ax.set_title(classes[k])
# distance non-symmetry
ax = fig.add_subplot(2, 3, 2)
lengs = []
for i in index_plot:
D = distances_non[index, i]
ind = np.where(D > max_distance)[0]
D[ind] = np.inf
d = np.sort(D)
n = len(d)
accuracy = np.cumsum(np.ones((n, ), np.float32)) / n
plt.plot(d, accuracy, color[i], linewidth=2)
ADD[k, i] = VOCap(d, accuracy)
lengs.append('%s (%.2f)' % (leng[i], ADD[k, i] * 100))
print('%s, %s: %d objects missed' % (classes[k], leng[i], np.sum(np.isinf(D))))
ax.legend(lengs)
plt.xlabel('Average distance threshold in meter (non-symmetry)')
plt.ylabel('accuracy')
ax.set_title(classes[k])
# translation
ax = fig.add_subplot(2, 3, 3)
lengs = []
for i in index_plot:
D = errors_translation[index, i]
ind = np.where(D > max_distance)[0]
D[ind] = np.inf
d = np.sort(D)
n = len(d)
accuracy = np.cumsum(np.ones((n, ), np.float32)) / n
plt.plot(d, accuracy, color[i], linewidth=2)
TS[k, i] = VOCap(d, accuracy)
lengs.append('%s (%.2f)' % (leng[i], TS[k, i] * 100))
print('%s, %s: %d objects missed' % (classes[k], leng[i], np.sum(np.isinf(D))))
ax.legend(lengs)
plt.xlabel('Translation threshold in meter')
plt.ylabel('accuracy')
ax.set_title(classes[k])
# rotation histogram
count = 4
for i in index_plot:
ax = fig.add_subplot(2, 3, count)
D = errors_rotation[index, i]
ind = np.where(np.isfinite(D))[0]
D = D[ind]
ax.hist(D, bins=range(0, 190, 10), range=(0, 180))
plt.xlabel('Rotation angle error')
plt.ylabel('count')
ax.set_title(leng[i])
count += 1
# mng = plt.get_current_fig_manager()
# mng.full_screen_toggle()
filename = output_dir + '/' + classes[k] + '.png'
plt.savefig(filename)
# plt.show()
# print ADD
print('==================ADD======================')
for k in range(len(classes)):
print('%s: %f' % (classes[k], ADD[k, 0]))
for k in range(len(classes)-1):
print('%f' % (ADD[k+1, 0]))
print('%f' % (ADD[0, 0]))
print(cfg.TRAIN.SNAPSHOT_INFIX)
print('===========================================')
# print ADD-S
print('==================ADD-S====================')
for k in range(len(classes)):
print('%s: %f' % (classes[k], ADDS[k, 0]))
for k in range(len(classes)-1):
print('%f' % (ADDS[k+1, 0]))
print('%f' % (ADDS[0, 0]))
print(cfg.TRAIN.SNAPSHOT_INFIX)
print('===========================================')
# print ADD
print('==================ADD refined======================')
for k in range(len(classes)):
print('%s: %f' % (classes[k], ADD[k, 1]))
for k in range(len(classes)-1):
print('%f' % (ADD[k+1, 1]))
print('%f' % (ADD[0, 1]))
print(cfg.TRAIN.SNAPSHOT_INFIX)
print('===========================================')
# print ADD-S
print('==================ADD-S refined====================')
for k in range(len(classes)):
print('%s: %f' % (classes[k], ADDS[k, 1]))
for k in range(len(classes)-1):
print('%f' % (ADDS[k+1, 1]))
print('%f' % (ADDS[0, 1]))
print(cfg.TRAIN.SNAPSHOT_INFIX)
print('===========================================')
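# The two distances reported above follow the standard YCB-Video definitions
# (sketch; the actual implementations are add/adi in utils.pose_error):
#   ADD   = mean_i || (R x_i + t) - (R* x_i + t*) ||        # non-symmetric
#   ADD-S = mean_i min_j || (R x_i + t) - (R* x_j + t*) ||  # symmetric
# where x are model points, (R, t) the estimate, and (R*, t*) ground truth.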
================================================
FILE: lib/fcn/__init__.py
================================================
# Copyright (c) 2020 NVIDIA Corporation. All rights reserved.
# This work is licensed under the NVIDIA Source Code License - Non-commercial. Full
# text can be found in LICENSE.md
================================================
FILE: lib/fcn/config.py
================================================
# Copyright (c) 2020 NVIDIA Corporation. All rights reserved.
# This work is licensed under the NVIDIA Source Code License - Non-commercial. Full
# text can be found in LICENSE.md
"""PoseCNN config system.
This file specifies default config options for PoseCNN. You should not
change values in this file. Instead, you should write a config file (in yaml)
and use cfg_from_file(yaml_file) to load it and override the default options.
Most tools in $ROOT/tools take a --cfg option to specify an override file.
- See tools/{train,test}_net.py for example code that uses cfg_from_file()
- See experiments/cfgs/*.yml for example YAML config override files
"""
import os
import os.path as osp
import numpy as np
import math
# `pip install easydict` if you don't have it
from easydict import EasyDict as edict
import logging
__C = edict()
# Consumers can get config by:
#   from fcn.config import cfg
cfg = __C
__C.FLIP_X = False
__C.INPUT = 'RGBD'
__C.NETWORK = 'VGG16'
__C.RIG = ''
__C.CAD = ''
__C.POSE = ''
__C.BACKGROUND = ''
__C.USE_GPU_NMS = True
__C.MODE = 'TRAIN'
__C.INTRINSICS = ()
__C.DATA_PATH = ''
__C.FLOW_HEIGHT = 512
__C.FLOW_WIDTH = 640
# Anchor scales for RPN
__C.ANCHOR_SCALES = (8,16,32)
# Anchor ratios for RPN
__C.ANCHOR_RATIOS = (0.5,1,2)
__C.FEATURE_STRIDE = 16
__C.gpu_id = 0
__C.instance_id = 0
#
# Training options
#
__C.TRAIN = edict()
__C.TRAIN.WEIGHT_DECAY = 0.0001
__C.TRAIN.SEGMENTATION = True
__C.TRAIN.ITERNUM = 4
__C.TRAIN.HEATUP = 4
__C.TRAIN.GPUNUM = 1
__C.TRAIN.CLASSES = (0,1,2,3)
__C.TRAIN.SYMMETRY = (0,0,0,0)
__C.TRAIN.SLIM = False
__C.TRAIN.SINGLE_FRAME = False
__C.TRAIN.TRAINABLE = True
__C.TRAIN.VERTEX_REG = True
__C.TRAIN.VERTEX_REG_DELTA = False
__C.TRAIN.POSE_REG = True
__C.TRAIN.LABEL_W = 1.0
__C.TRAIN.VERTEX_W = 1.0
__C.TRAIN.VERTEX_W_INSIDE = 10.0
__C.TRAIN.POSE_W = 1.0
__C.TRAIN.BOX_W = 1.0
__C.TRAIN.HARD_LABEL_THRESHOLD = 1.0
__C.TRAIN.HARD_LABEL_SAMPLING = 1.0
__C.TRAIN.HARD_ANGLE = 15.0
__C.TRAIN.VISUALIZE = False
__C.TRAIN.GAN = False
__C.TRAIN.MATCHING = False
__C.TRAIN.NOISE_LEVEL = 0.05
__C.TRAIN.FREEZE_LAYERS = True
__C.TRAIN.MAX_ITERS_PER_EPOCH = 1000000
__C.TRAIN.UNIFORM_POSE_INTERVAL = 15
__C.TRAIN.AFFINE = False
# Hough voting
__C.TRAIN.HOUGH_LABEL_THRESHOLD = 100
__C.TRAIN.HOUGH_VOTING_THRESHOLD = -1
__C.TRAIN.HOUGH_SKIP_PIXELS = -1
__C.TRAIN.HOUGH_INLIER_THRESHOLD = 0.9
# synthetic training
__C.TRAIN.SYNTHESIZE = False
__C.TRAIN.SYN_ONLINE = False
__C.TRAIN.SYN_WIDTH = 640
__C.TRAIN.SYN_HEIGHT = 480
__C.TRAIN.SYNROOT = '/var/Projects/Deep_Pose/data/LOV/data_syn/'
if not os.path.exists(__C.TRAIN.SYNROOT):
__C.TRAIN.SYNROOT = '/home/yuxiang/Projects/Deep_Pose/data/LOV/data_syn/'
__C.TRAIN.SYNITER = 0
__C.TRAIN.SYNNUM = 80000
__C.TRAIN.SYN_RATIO = 1
__C.TRAIN.SYN_CLASS_INDEX = 1
__C.TRAIN.SYN_TNEAR = 0.5
__C.TRAIN.SYN_TFAR = 2.0
__C.TRAIN.SYN_BACKGROUND_SPECIFIC = False
__C.TRAIN.SYN_BACKGROUND_SUBTRACT_MEAN = False
__C.TRAIN.SYN_BACKGROUND_CONSTANT_PROB = 0.1
__C.TRAIN.SYN_BACKGROUND_AFFINE = False
__C.TRAIN.SYN_SAMPLE_OBJECT = True
__C.TRAIN.SYN_SAMPLE_POSE = True
__C.TRAIN.SYN_STD_ROTATION = 15
__C.TRAIN.SYN_STD_TRANSLATION = 0.05
__C.TRAIN.SYN_MIN_OBJECT = 5
__C.TRAIN.SYN_MAX_OBJECT = 8
__C.TRAIN.SYN_BOUND = 0.4
__C.TRAIN.SYN_SAMPLE_DISTRACTOR = True
__C.TRAIN.SYN_CROP = False
__C.TRAIN.SYN_CROP_SIZE = 224
__C.TRAIN.SYN_TABLE_PROB = 0.8
# autoencoder
__C.TRAIN.BOOSTRAP_PIXELS = 20
# domain adaptation
__C.TRAIN.ADAPT = False
__C.TRAIN.ADAPT_ROOT = ''
__C.TRAIN.ADAPT_NUM = 400
__C.TRAIN.ADAPT_RATIO = 1
__C.TRAIN.ADAPT_WEIGHT = 0.1
# learning rate
__C.TRAIN.OPTIMIZER = 'MOMENTUM'
__C.TRAIN.LEARNING_RATE = 0.0001
__C.TRAIN.MILESTONES = (100, 150, 200)
__C.TRAIN.MOMENTUM = 0.9
__C.TRAIN.BETA = 0.999
__C.TRAIN.GAMMA = 0.1
__C.TRAIN.SYMSIZE = 0
# voxel grid size
__C.TRAIN.GRID_SIZE = 256
# Scales to compute real features
__C.TRAIN.SCALES_BASE = (0.25, 0.5, 1.0, 2.0, 3.0)
# parameters for data augmentation
__C.TRAIN.CHROMATIC = True
__C.TRAIN.ADD_NOISE = False
# Images to use per minibatch
__C.TRAIN.IMS_PER_BATCH = 2
__C.TRAIN.NUM_STEPS = 5
__C.TRAIN.NUM_UNITS = 64
# Use horizontally-flipped images during training?
__C.TRAIN.USE_FLIPPED = True
# Iterations between snapshots
__C.TRAIN.SNAPSHOT_EPOCHS = 1
# solver.prototxt specifies the snapshot path prefix, this adds an optional
# infix to yield the path: <prefix>[_<infix>]_iters_XYZ.caffemodel
__C.TRAIN.SNAPSHOT_PREFIX = 'caffenet_fast_rcnn'
__C.TRAIN.SNAPSHOT_INFIX = ''
__C.TRAIN.DISPLAY = 20
__C.TRAIN.ITERS = 0
# Whether to add ground truth boxes to the pool when sampling regions
__C.TRAIN.USE_GT = False
# Minibatch size (number of regions of interest [ROIs])
__C.TRAIN.BATCH_SIZE = 128
# Fraction of minibatch that is labeled foreground (i.e. class > 0)
__C.TRAIN.FG_FRACTION = 0.25
# Overlap threshold for a ROI to be considered foreground (if >= FG_THRESH)
__C.TRAIN.FG_THRESH = 0.5
__C.TRAIN.FG_THRESH_POSE = 0.2
# Overlap threshold for a ROI to be considered background (class = 0 if
# overlap in [LO, HI))
__C.TRAIN.BG_THRESH_HI = 0.5
__C.TRAIN.BG_THRESH_LO = 0.1
# Use RPN to detect objects
__C.TRAIN.HAS_RPN = True
# IOU >= thresh: positive example
__C.TRAIN.RPN_POSITIVE_OVERLAP = 0.7
# IOU < thresh: negative example
__C.TRAIN.RPN_NEGATIVE_OVERLAP = 0.3
# If an anchor satisfies both the positive and negative conditions, set it to negative
__C.TRAIN.RPN_CLOBBER_POSITIVES = False
# Max number of foreground examples
__C.TRAIN.RPN_FG_FRACTION = 0.5
# Total number of examples
__C.TRAIN.RPN_BATCHSIZE = 256
# NMS threshold used on RPN proposals
__C.TRAIN.RPN_NMS_THRESH = 0.7
# Number of top scoring boxes to keep before applying NMS to RPN proposals
__C.TRAIN.RPN_PRE_NMS_TOP_N = 12000
# Number of top scoring boxes to keep after applying NMS to RPN proposals
__C.TRAIN.RPN_POST_NMS_TOP_N = 2000
# Deprecated (outside weights)
__C.TRAIN.RPN_BBOX_INSIDE_WEIGHTS = (1.0, 1.0, 1.0, 1.0)
# Give the positive RPN examples weight of p * 1 / {num positives}
# and give negatives a weight of (1 - p)
# Set to -1.0 to use uniform example weighting
__C.TRAIN.RPN_POSITIVE_WEIGHT = -1.0
# Normalize the targets (subtract empirical mean, divide by empirical stddev)
__C.TRAIN.BBOX_NORMALIZE_TARGETS = True
# Deprecated (inside weights)
__C.TRAIN.BBOX_INSIDE_WEIGHTS = (1.0, 1.0, 1.0, 1.0)
# Normalize the targets using "precomputed" (or made up) means and stdevs
# (BBOX_NORMALIZE_TARGETS must also be True)
__C.TRAIN.BBOX_NORMALIZE_TARGETS_PRECOMPUTED = True
__C.TRAIN.BBOX_NORMALIZE_MEANS = (0.0, 0.0, 0.0, 0.0)
__C.TRAIN.BBOX_NORMALIZE_STDS = (0.1, 0.1, 0.2, 0.2)
#
# Testing options
#
__C.TEST = edict()
__C.TEST.GLOBAL_SEARCH = False
__C.TEST.SEGMENTATION = True
__C.TEST.SINGLE_FRAME = False
__C.TEST.VERTEX_REG_2D = False
__C.TEST.VERTEX_REG_3D = False
__C.TEST.VISUALIZE = False
__C.TEST.RANSAC = False
__C.TEST.GAN = False
__C.TEST.POSE_REG = False
__C.TEST.POSE_REFINE = False
__C.TEST.POSE_SDF = True
__C.TEST.POSE_CODEBOOK = False
__C.TEST.SYNTHESIZE = False
__C.TEST.ROS_CAMERA = 'camera'
__C.TEST.DET_THRESHOLD = 0.5
__C.TEST.BUILD_CODEBOOK = False
__C.TEST.IMS_PER_BATCH = 1
__C.TEST.MEAN_SHIFT = False
__C.TEST.CHECK_SIZE = False
__C.TEST.NUM_SDF_ITERATIONS_INIT = 100
__C.TEST.NUM_SDF_ITERATIONS_TRACKING = 50
__C.TEST.SDF_TRANSLATION_REG = 10.0
__C.TEST.SDF_ROTATION_REG = 0.1
__C.TEST.NUM_LOST = 3
__C.TEST.ALIGN_Z_AXIS = False
__C.TEST.GEN_DATA = False
# Hough voting
__C.TEST.HOUGH_LABEL_THRESHOLD = 100
__C.TEST.HOUGH_VOTING_THRESHOLD = -1
__C.TEST.HOUGH_SKIP_PIXELS = -1
__C.TEST.HOUGH_INLIER_THRESHOLD = 0.9
__C.TEST.CLASSES = (0,1,2,3)
__C.TEST.SYMMETRY = (0,0,0,0)
__C.TEST.ITERNUM = 4
# Scales to compute real features
__C.TEST.SCALES_BASE = (0.25, 0.5, 1.0, 2.0, 3.0)
# voxel grid size
__C.TEST.GRID_SIZE = 256
# NMS threshold used on RPN proposals
__C.TEST.RPN_NMS_THRESH = 0.7
# Number of top scoring boxes to keep before applying NMS to RPN proposals
__C.TEST.RPN_PRE_NMS_TOP_N = 6000
# Number of top scoring boxes to keep after applying NMS to RPN proposals
__C.TEST.RPN_POST_NMS_TOP_N = 300
# Test using bounding-box regressors
__C.TEST.BBOX_REG = True
# Overlap threshold used for non-maximum suppression (suppress boxes with
# IoU >= this threshold)
__C.TEST.NMS = 0.3
# Pixel mean values (BGR order) as a (1, 1, 3) array
# These are the values originally used for training VGG16
__C.PIXEL_MEANS = np.array([[[102.9801, 115.9465, 122.7717]]])
# For reproducibility
__C.RNG_SEED = 3
# A small number that's used many times
__C.EPS = 1e-14
# Root directory of project
__C.ROOT_DIR = osp.abspath(osp.join(osp.dirname(__file__), '..', '..'))
# Place outputs under an experiments directory
__C.EXP_DIR = 'default'
# Default GPU device id
__C.GPU_ID = 0
def get_output_dir(imdb, net):
"""Return the directory where experimental artifacts are placed.
A canonical path is built using the name from an imdb and a network
(if not None).
"""
path = osp.abspath(osp.join(__C.ROOT_DIR, 'output', __C.EXP_DIR, imdb.name))
if net is None:
return path
else:
return osp.join(path, net)
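# e.g. get_output_dir(imdb, None) -> <ROOT_DIR>/output/<EXP_DIR>/<imdb.name>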
def _merge_a_into_b(a, b):
"""Merge config dictionary a into config dictionary b, clobbering the
options in b whenever they are also specified in a.
"""
if type(a) is not edict:
return
for k, v in a.items():
# a must specify keys that are in b
if k not in b:
raise KeyError('{} is not a valid config key'.format(k))
# the types must match, too
if type(b[k]) is not type(v):
raise ValueError(('Type mismatch ({} vs. {}) '
'for config key: {}').format(type(b[k]),
type(v), k))
# recursively merge dicts
if type(v) is edict:
try:
_merge_a_into_b(a[k], b[k])
except:
print('Error under config key: {}'.format(k))
raise
else:
b[k] = v
def cfg_from_file(filename):
"""Load a config file and merge it into the default options."""
import yaml
with open(filename, 'r') as f:
        # an explicit Loader is required by PyYAML >= 6
        yaml_cfg = edict(yaml.load(f, Loader=yaml.Loader))
_merge_a_into_b(yaml_cfg, __C)
def yaml_from_file(filename):
"""Load a config file and merge it into the default options."""
import yaml
with open(filename, 'r') as f:
        # an explicit Loader is required by PyYAML >= 6
        yaml_cfg = edict(yaml.load(f, Loader=yaml.Loader))
return yaml_cfg
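# Example override file (illustrative only; see experiments/cfgs/*.yml for the
# real ones). Only keys that already exist above may appear, and the value
# types must match the defaults, otherwise _merge_a_into_b raises:
#
#   EXP_DIR: ycb_object
#   INPUT: COLOR
#   TRAIN:
#     LEARNING_RATE: 0.001
#
# Loading merges the values into the defaults in place (my_override.yml is a
# hypothetical filename):
#
#   from fcn.config import cfg, cfg_from_file
#   cfg_from_file('experiments/cfgs/my_override.yml')
#   assert cfg.TRAIN.LEARNING_RATE == 0.001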
================================================
FILE: lib/fcn/render_utils.py
================================================
# Copyright (c) 2020 NVIDIA Corporation. All rights reserved.
# This work is licensed under the NVIDIA Source Code License - Non-commercial. Full
# text can be found in LICENSE.md
import torch
import time
import sys, os
import numpy as np
import cv2
import matplotlib.pyplot as plt
from fcn.config import cfg
from transforms3d.quaternions import quat2mat
def render_image(dataset, im, rois, poses, poses_refine, labels):
# label image
label_image = dataset.labels_to_image(labels)
im_label = im[:, :, (2, 1, 0)].copy()
I = np.where(labels != 0)
im_label[I[0], I[1], :] = 0.5 * label_image[I[0], I[1], :] + 0.5 * im_label[I[0], I[1], :]
num = poses.shape[0]
classes = dataset._classes_test
class_colors = dataset._class_colors_test
cls_indexes = []
poses_all = []
poses_refine_all = []
for i in range(num):
if cfg.MODE == 'TEST':
cls_index = int(rois[i, 1]) - 1
else:
cls_index = cfg.TRAIN.CLASSES[int(rois[i, 1])] - 1
if cls_index < 0:
continue
cls_indexes.append(cls_index)
qt = np.zeros((7, ), dtype=np.float32)
qt[:3] = poses[i, 4:7]
qt[3:] = poses[i, :4]
poses_all.append(qt.copy())
if cfg.TEST.POSE_REFINE and poses_refine is not None:
qt[:3] = poses_refine[i, 4:7]
qt[3:] = poses_refine[i, :4]
poses_refine_all.append(qt.copy())
cls = int(rois[i, 1])
print(classes[cls], rois[i, -1], cls_index)
if cls > 0 and rois[i, -1] > cfg.TEST.DET_THRESHOLD:
# draw roi
            # cv2 expects integer pixel coordinates
            x1 = int(rois[i, 2])
            y1 = int(rois[i, 3])
            x2 = int(rois[i, 4])
            y2 = int(rois[i, 5])
            cv2.rectangle(im_label, (x1, y1), (x2, y2), class_colors[cls], 2)
# draw center
cx = int((x1 + x2) / 2)
cy = int((y1 + y2) / 2)
cv2.circle(im_label, (cx, cy), 2, (255, 255, 0), 10)
# rendering
if len(cls_indexes) > 0 and cfg.TRAIN.POSE_REG:
height = im.shape[0]
width = im.shape[1]
intrinsic_matrix = dataset._intrinsic_matrix
fx = intrinsic_matrix[0, 0]
fy = intrinsic_matrix[1, 1]
px = intrinsic_matrix[0, 2]
py = intrinsic_matrix[1, 2]
zfar = 10.0
znear = 0.01
image_tensor = torch.cuda.FloatTensor(height, width, 4)
seg_tensor = torch.cuda.FloatTensor(height, width, 4)
# set renderer
cfg.renderer.set_light_pos([0, 0, 0])
cfg.renderer.set_light_color([1, 1, 1])
cfg.renderer.set_projection_matrix(width, height, fx, fy, px, py, znear, zfar)
# pose
cfg.renderer.set_poses(poses_all)
frame = cfg.renderer.render(cls_indexes, image_tensor, seg_tensor)
image_tensor = image_tensor.flip(0)
im_render = image_tensor.cpu().numpy()
im_render = np.clip(im_render, 0, 1)
im_render = im_render[:, :, :3] * 255
im_render = im_render.astype(np.uint8)
im_output = 0.8 * im[:,:,(2, 1, 0)].astype(np.float32) + 1.0 * im_render.astype(np.float32)
im_output = np.clip(im_output, 0, 255)
# pose refine
if cfg.TEST.POSE_REFINE and poses_refine is not None:
cfg.renderer.set_poses(poses_refine_all)
frame = cfg.renderer.render(cls_indexes, image_tensor, seg_tensor)
image_tensor = image_tensor.flip(0)
im_render = image_tensor.cpu().numpy()
im_render = np.clip(im_render, 0, 1)
im_render = im_render[:, :, :3] * 255
im_render = im_render.astype(np.uint8)
im_output_refine = 0.8 * im[:,:,(2, 1, 0)].astype(np.float32) + 1.0 * im_render.astype(np.float32)
im_output_refine = np.clip(im_output_refine, 0, 255)
im_output_refine = im_output_refine.astype(np.uint8)
else:
im_output_refine = im_output.copy()
else:
im_output = 0.4 * im[:,:,(2, 1, 0)]
im_output_refine = im_output.copy()
return im_output.astype(np.uint8), im_output_refine.astype(np.uint8), im_label
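# The network's pose rows above are laid out quaternion-first,
# [qw, qx, qy, qz, tx, ty, tz], while the renderer's set_poses() expects the
# translation first. A small helper capturing the reordering done in
# render_image() (illustrative only):
def pose_row_to_renderer(pose_row):
    qt = np.zeros((7, ), dtype=np.float32)
    qt[:3] = pose_row[4:7]  # translation (tx, ty, tz)
    qt[3:] = pose_row[:4]   # unit quaternion (qw, qx, qy, qz)
    return qt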
================================================
FILE: lib/fcn/test_common.py
================================================
# Copyright (c) 2020 NVIDIA Corporation. All rights reserved.
# This work is licensed under the NVIDIA Source Code License - Non-commercial. Full
# text can be found in LICENSE.md
import torch
import time
import sys, os
import numpy as np
import posecnn_cuda
import matplotlib.pyplot as plt
from transforms3d.quaternions import mat2quat, quat2mat
from matplotlib.patches import Circle
from fcn.config import cfg
def compute_index_sdf(rois):
num = rois.shape[0]
index_sdf = []
for i in range(num):
cls = int(rois[i, 1])
if cls == 0:
continue
if cfg.TRAIN.CLASSES[cls] not in cfg.TEST.CLASSES:
continue
if rois[i, -1] < cfg.TEST.DET_THRESHOLD:
continue
index_sdf.append(i)
return index_sdf
# SDF refinement
def refine_pose(im_label, im_depth, rois, poses, meta_data, dataset, visualize=False):
start_time = time.time()
width = im_depth.shape[1]
height = im_depth.shape[0]
sdf_optim = cfg.sdf_optimizer
steps = cfg.TEST.NUM_SDF_ITERATIONS_TRACKING
index_sdf = compute_index_sdf(rois)
# backproject depth
intrinsic_matrix = meta_data[0, :9].cpu().numpy().reshape((3, 3))
fx = intrinsic_matrix[0, 0]
fy = intrinsic_matrix[1, 1]
px = intrinsic_matrix[0, 2]
py = intrinsic_matrix[1, 2]
zfar = 6.0
znear = 0.01
im_pcloud = posecnn_cuda.backproject_forward(fx, fy, px, py, im_depth)[0]
dpoints = im_pcloud[:,:,:3].cpu().numpy().reshape((-1, 3))
# rendering
num = len(index_sdf)
poses_all = []
cls_indexes = []
for i in range(num):
ind = index_sdf[i]
cls = int(rois[ind, 1])
cls_render = cfg.TEST.CLASSES.index(cfg.TRAIN.CLASSES[cls]) - 1
cls_indexes.append(cls_render)
qt = np.zeros((7, ), dtype=np.float32)
qt[3:] = poses[ind, :4]
qt[:3] = poses[ind, 4:]
poses_all.append(qt)
cfg.renderer.set_poses(poses_all)
cfg.renderer.set_projection_matrix(width, height, fx, fy, px, py, znear, zfar)
image_tensor = torch.cuda.FloatTensor(height, width, 4).detach()
seg_tensor = torch.cuda.FloatTensor(height, width, 4).detach()
pcloud_tensor = torch.cuda.FloatTensor(height, width, 4).detach()
cfg.renderer.render(cls_indexes, image_tensor, seg_tensor, pc2_tensor=pcloud_tensor)
pcloud_tensor = pcloud_tensor.flip(0)
pcloud = pcloud_tensor[:,:,:3].cpu().numpy().reshape((-1, 3))
# refine translation
poses_t = poses.copy()
for i in range(num):
ind = index_sdf[i]
cls = int(rois[ind, 1])
cls_render = cfg.TEST.CLASSES.index(cfg.TRAIN.CLASSES[cls]) - 1
x1 = max(int(rois[ind, 2]), 0)
y1 = max(int(rois[ind, 3]), 0)
x2 = min(int(rois[ind, 4]), width-1)
y2 = min(int(rois[ind, 5]), height-1)
labels = torch.zeros_like(im_label)
labels[y1:y2, x1:x2] = im_label[y1:y2, x1:x2]
labels = labels.cpu().numpy().reshape((width * height, ))
index = np.where((labels == cls) & np.isfinite(dpoints[:, 0]) & (pcloud[:, 0] != 0) & (dpoints[:, 0] != 0))[0]
if len(index) > 10:
T = np.mean(dpoints[index, :] - pcloud[index, :], axis=0)
z_new = poses[ind, 6] + T[2]
poses_t[ind, 6] = z_new
poses_t[ind, 4] = (poses[ind, 4] / poses[ind, 6]) * z_new
poses_t[ind, 5] = (poses[ind, 5] / poses[ind, 6]) * z_new
print('object {}, class {}, z {}, z new {}'.format(i, dataset._classes_test[cls_render+1], poses[ind, 6], z_new))
if visualize:
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
image_tensor = image_tensor.flip(0)
im = image_tensor.cpu().numpy() * 255
im = im.astype(np.uint8)
plt.imshow(im)
plt.show()
# compare the depth
depth_meas_roi = im_pcloud[:, :, 2]
mask_depth_meas = depth_meas_roi > 0
mask_depth_valid = torch.isfinite(depth_meas_roi)
# prepare data
T_oc_init = np.zeros((num, 4, 4), dtype=np.float32)
cls_index = torch.cuda.FloatTensor(0, 1)
obj_index = torch.cuda.FloatTensor(0, 1)
pix_index = torch.cuda.LongTensor(0, 2)
for i in range(num):
# pose
ind = index_sdf[i]
pose = poses_t[ind, :].copy()
T_co = np.eye(4, dtype=np.float32)
T_co[:3, :3] = quat2mat(pose[:4])
T_co[:3, 3] = pose[4:]
T_oc_init[i] = np.linalg.inv(T_co)
# filter out points very far away
z = float(pose[6])
roi = rois[ind, :].copy()
cls = int(roi[1])
extent = 1.0 * np.mean(dataset._extents[cls, :])
mask_distance = torch.abs(depth_meas_roi - z) < extent
# mask label
cls_render = cfg.TEST.CLASSES.index(cfg.TRAIN.CLASSES[cls]) - 1
w = roi[4] - roi[2]
h = roi[5] - roi[3]
x1 = max(int(roi[2] - w / 2), 0)
y1 = max(int(roi[3] - h / 2), 0)
x2 = min(int(roi[4] + w / 2), width - 1)
y2 = min(int(roi[5] + h / 2), height - 1)
if im_label is not None:
labels = torch.zeros_like(im_label)
labels[y1:y2, x1:x2] = im_label[y1:y2, x1:x2]
mask_label = labels == cls
else:
mask_label = torch.zeros_like(mask_depth_meas)
mask_label[y1:y2, x1:x2] = 1
mask = mask_label * mask_depth_meas * mask_depth_valid * mask_distance
index_p = torch.nonzero(mask)
n = index_p.shape[0]
if n > 100:
pix_index = torch.cat((pix_index, index_p), dim=0)
index = cls_render * torch.ones((n, 1), dtype=torch.float32, device=0)
cls_index = torch.cat((cls_index, index), dim=0)
index = i * torch.ones((n, 1), dtype=torch.float32, device=0)
obj_index = torch.cat((obj_index, index), dim=0)
print('sdf {} points for object {}, class {} {}'.format(n, i, cls_render, dataset._classes_test[cls_render+1]))
else:
print('sdf {} points for object {}, class {} {}, no refinement'.format(n, i, cls_render, dataset._classes_test[cls_render+1]))
if visualize and n <= 100:
fig = plt.figure()
ax = fig.add_subplot(2, 3, 1)
plt.imshow(mask_label.cpu().numpy())
ax.set_title('mask label')
ax = fig.add_subplot(2, 3, 2)
plt.imshow(mask_depth_meas.cpu().numpy())
ax.set_title('mask_depth_meas')
ax = fig.add_subplot(2, 3, 3)
plt.imshow(mask_depth_valid.cpu().numpy())
ax.set_title('mask_depth_valid')
ax = fig.add_subplot(2, 3, 4)
plt.imshow(mask_distance.cpu().numpy())
ax.set_title('mask_distance')
print(extent, z)
ax = fig.add_subplot(2, 3, 5)
plt.imshow(depth_meas_roi.cpu().numpy())
ax.set_title('depth')
plt.show()
# data
n = pix_index.shape[0]
print('sdf with {} points'.format(n))
if n == 0:
return poses_t.copy()
points = im_pcloud[pix_index[:, 0], pix_index[:, 1], :]
points = torch.cat((points, cls_index, obj_index), dim=1)
T_oc_opt = sdf_optim.refine_pose_layer(T_oc_init, points, steps=steps)
# collect poses
poses_refined = poses_t.copy()
for i in range(num):
RT_opt = T_oc_opt[i]
ind = index_sdf[i]
if RT_opt[3, 3] > 0:
RT_opt = np.linalg.inv(RT_opt)
poses_refined[ind, :4] = mat2quat(RT_opt[:3, :3])
poses_refined[ind, 4:] = RT_opt[:3, 3]
if visualize:
points = points.cpu().numpy()
for i in range(num):
ind = index_sdf[i]
roi = rois[ind, :].copy()
cls = int(roi[1])
cls = cfg.TEST.CLASSES.index(cfg.TRAIN.CLASSES[cls])
T_co_init = np.linalg.inv(T_oc_init[i])
pose = poses_refined[ind, :].copy()
T_co_opt = np.eye(4, dtype=np.float32)
T_co_opt[:3, :3] = quat2mat(pose[:4])
T_co_opt[:3, 3] = pose[4:]
index = np.where(points[:, 4] == i)[0]
if len(index) == 0:
continue
pts = points[index, :4].copy()
pts[:, 3] = 1.0
# show points
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1, projection='3d')
points_obj = dataset._points_all_test[cls, :, :]
points_init = np.matmul(np.linalg.inv(T_co_init), pts.transpose()).transpose()
points_opt = np.matmul(np.linalg.inv(T_co_opt), pts.transpose()).transpose()
ax.scatter(points_obj[::5, 0], points_obj[::5, 1], points_obj[::5, 2], color='yellow')
ax.scatter(points_init[::5, 0], points_init[::5, 1], points_init[::5, 2], color='red')
ax.scatter(points_opt[::5, 0], points_opt[::5, 1], points_opt[::5, 2], color='blue')
ax.set_xlabel('X Label')
ax.set_ylabel('Y Label')
ax.set_zlabel('Z Label')
ax.set_xlim(sdf_optim.xmins[cls-1], sdf_optim.xmaxs[cls-1])
ax.set_ylim(sdf_optim.ymins[cls-1], sdf_optim.ymaxs[cls-1])
ax.set_zlim(sdf_optim.zmins[cls-1], sdf_optim.zmaxs[cls-1])
ax.set_title(dataset._classes_test[cls])
plt.show()
print('pose refine time %.6f' % (time.time() - start_time))
return poses_refined
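# posecnn_cuda.backproject_forward() used in refine_pose() lifts the depth map
# into camera-frame 3D points with the pinhole model: X = (u - px) * z / fx,
# Y = (v - py) * z / fy, Z = z. A NumPy sketch of the same idea (illustrative
# only; the CUDA kernel is authoritative):
def backproject_depth_sketch(depth, fx, fy, px, py):
    height, width = depth.shape
    u, v = np.meshgrid(np.arange(width), np.arange(height))
    X = (u - px) * depth / fx
    Y = (v - py) * depth / fy
    return np.stack((X, Y, depth), axis=2)  # (height, width, 3) point cloud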
================================================
FILE: lib/fcn/test_dataset.py
================================================
# Copyright (c) 2020 NVIDIA Corporation. All rights reserved.
# This work is licensed under the NVIDIA Source Code License - Non-commercial. Full
# text can be found in LICENSE.md
import torch
import torch.nn as nn
import time
import sys, os
import numpy as np
import cv2
import scipy
import matplotlib.pyplot as plt
from fcn.config import cfg
from fcn.test_common import refine_pose
from transforms3d.quaternions import mat2quat, quat2mat, qmult
from utils.se3 import *
from utils.nms import nms
from utils.pose_error import re, te
class AverageMeter(object):
"""Computes and stores the average and current value"""
def __init__(self):
self.reset()
def reset(self):
self.val = 0
self.avg = 0
self.sum = 0
self.count = 0
def update(self, val, n=1):
self.val = val
self.sum += val * n
self.count += n
self.avg = self.sum / self.count
def __repr__(self):
return '{:.3f} ({:.3f})'.format(self.val, self.avg)
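# Typical use (illustrative): meter = AverageMeter(); meter.update(0.5);
# meter.update(1.5) leaves meter.val == 1.5 and meter.avg == 1.0, and
# repr(meter) prints 'current (running average)'.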
def test(test_loader, background_loader, network, output_dir):
batch_time = AverageMeter()
epoch_size = len(test_loader)
enum_background = enumerate(background_loader)
# switch to test mode
network.eval()
for i, sample in enumerate(test_loader):
# if 'is_testing' in sample and sample['is_testing'] == 0:
# continue
end = time.time()
inputs = sample['image_color']
im_info = sample['im_info']
# add background
mask = sample['mask']
try:
_, background = next(enum_background)
        except StopIteration:
enum_background = enumerate(background_loader)
_, background = next(enum_background)
if inputs.size(0) != background['background_color'].size(0):
enum_background = enumerate(background_loader)
_, background = next(enum_background)
background_color = background['background_color'].cuda()
for j in range(inputs.size(0)):
is_syn = im_info[j, -1]
if is_syn:
inputs[j] = mask[j] * inputs[j] + (1 - mask[j]) * background_color[j]
labels = sample['label'].cuda()
meta_data = sample['meta_data'].cuda()
extents = sample['extents'][0, :, :].repeat(cfg.TRAIN.GPUNUM, 1, 1).cuda()
gt_boxes = sample['gt_boxes'].cuda()
poses = sample['poses'].cuda()
points = sample['points'][0, :, :, :].repeat(cfg.TRAIN.GPUNUM, 1, 1, 1).cuda()
symmetry = sample['symmetry'][0, :].repeat(cfg.TRAIN.GPUNUM, 1).cuda()
# compute output
if cfg.TRAIN.VERTEX_REG:
if cfg.TRAIN.POSE_REG:
out_label, out_vertex, rois, out_pose, out_quaternion = network(inputs, labels, meta_data, extents, gt_boxes, poses, points, symmetry)
# combine poses
rois = rois.detach().cpu().numpy()
out_pose = out_pose.detach().cpu().numpy()
out_quaternion = out_quaternion.detach().cpu().numpy()
num = rois.shape[0]
poses = out_pose.copy()
for j in range(num):
cls = int(rois[j, 1])
if cls >= 0:
qt = out_quaternion[j, 4*cls:4*cls+4]
qt = qt / np.linalg.norm(qt)
# allocentric to egocentric
poses[j, 4] *= poses[j, 6]
poses[j, 5] *= poses[j, 6]
T = poses[j, 4:]
poses[j, :4] = allocentric2egocentric(qt, T)
# non-maximum suppression within class
index = nms(rois, 0.5)
rois = rois[index, :]
poses = poses[index, :]
# refine pose
if cfg.TEST.POSE_REFINE:
im_depth = sample['im_depth'].numpy()[0]
depth_tensor = torch.from_numpy(im_depth).cuda().float()
labels_out = out_label[0]
poses_refined = refine_pose(labels_out, depth_tensor, rois, poses, sample['meta_data'], test_loader.dataset)
else:
poses_refined = []
else:
out_label, out_vertex, rois, out_pose = network(inputs, labels, meta_data, extents, gt_boxes, poses, points, symmetry)
rois = rois.detach().cpu().numpy()
out_pose = out_pose.detach().cpu().numpy()
poses = out_pose.copy()
poses_refined = []
# non-maximum suppression within class
index = nms(rois, 0.5)
rois = rois[index, :]
poses = poses[index, :]
else:
out_label = network(inputs, labels, meta_data, extents, gt_boxes, poses, points, symmetry)
out_vertex = []
rois = []
poses = []
poses_refined = []
if cfg.TEST.VISUALIZE:
_vis_test(inputs, labels, out_label, out_vertex, rois, poses, poses_refined, sample, \
test_loader.dataset._points_all, test_loader.dataset.classes, test_loader.dataset.class_colors)
# measure elapsed time
batch_time.update(time.time() - end)
if not cfg.TEST.VISUALIZE:
result = {'labels': out_label[0].detach().cpu().numpy(), 'rois': rois, 'poses': poses, 'poses_refined': poses_refined}
if 'video_id' in sample and 'image_id' in sample:
filename = os.path.join(output_dir, sample['video_id'][0] + '_' + sample['image_id'][0] + '.mat')
else:
result['meta_data_path'] = sample['meta_data_path']
print(result['meta_data_path'])
filename = os.path.join(output_dir, '%06d.mat' % i)
print(filename)
scipy.io.savemat(filename, result, do_compression=True)
print('[%d/%d], batch time %.2f' % (i, epoch_size, batch_time.val))
filename = os.path.join(output_dir, 'results_posecnn.mat')
if os.path.exists(filename):
os.remove(filename)
def _vis_test(inputs, labels, out_label, out_vertex, rois, poses, poses_refined, sample, points, classes, class_colors):
"""Visualize a mini-batch for debugging."""
import matplotlib.pyplot as plt
im_blob = inputs.cpu().numpy()
label_blob = labels.cpu().numpy()
label_pred = out_label.cpu().numpy()
gt_poses = sample['poses'].numpy()
meta_data_blob = sample['meta_data'].numpy()
metadata = meta_data_blob[0, :]
intrinsic_matrix = metadata[:9].reshape((3,3))
gt_boxes = sample['gt_boxes'].numpy()
extents = sample['extents'][0, :, :].numpy()
if cfg.TRAIN.VERTEX_REG or cfg.TRAIN.VERTEX_REG_DELTA:
vertex_targets = sample['vertex_targets'].numpy()
vertex_pred = out_vertex.detach().cpu().numpy()
m = 4
n = 4
for i in range(im_blob.shape[0]):
fig = plt.figure()
start = 1
# show image
im = im_blob[i, :, :, :].copy()
im = im.transpose((1, 2, 0)) * 255.0
im += cfg.PIXEL_MEANS
im = im[:, :, (2, 1, 0)]
im = im.astype(np.uint8)
ax = fig.add_subplot(m, n, 1)
plt.imshow(im)
ax.set_title('color')
start += 1
# show gt boxes
boxes = gt_boxes[i]
for j in range(boxes.shape[0]):
if boxes[j, 4] == 0:
continue
x1 = boxes[j, 0]
y1 = boxes[j, 1]
x2 = boxes[j, 2]
y2 = boxes[j, 3]
plt.gca().add_patch(
plt.Rectangle((x1, y1), x2-x1, y2-y1, fill=False, edgecolor='g', linewidth=3))
# show gt label
label_gt = label_blob[i, :, :, :]
label_gt = label_gt.transpose((1, 2, 0))
height = label_gt.shape[0]
width = label_gt.shape[1]
num_classes = label_gt.shape[2]
im_label_gt = np.zeros((height, width, 3), dtype=np.uint8)
for j in range(num_classes):
I = np.where(label_gt[:, :, j] > 0)
im_label_gt[I[0], I[1], :] = class_colors[j]
ax = fig.add_subplot(m, n, start)
start += 1
plt.imshow(im_label_gt)
ax.set_title('gt labels')
# show predicted label
label = label_pred[i, :, :]
height = label.shape[0]
width = label.shape[1]
im_label = np.zeros((height, width, 3), dtype=np.uint8)
for j in range(num_classes):
I = np.where(label == j)
im_label[I[0], I[1], :] = class_colors[j]
ax = fig.add_subplot(m, n, start)
start += 1
plt.imshow(im_label)
ax.set_title('predicted labels')
if cfg.TRAIN.VERTEX_REG or cfg.TRAIN.VERTEX_REG_DELTA:
# show predicted boxes
ax = fig.add_subplot(m, n, start)
start += 1
plt.imshow(im)
ax.set_title('predicted boxes')
for j in range(rois.shape[0]):
if rois[j, 0] != i or rois[j, -1] < cfg.TEST.DET_THRESHOLD:
continue
cls = rois[j, 1]
x1 = rois[j, 2]
y1 = rois[j, 3]
x2 = rois[j, 4]
y2 = rois[j, 5]
plt.gca().add_patch(
plt.Rectangle((x1, y1), x2-x1, y2-y1, fill=False, edgecolor=np.array(class_colors[int(cls)])/255.0, linewidth=3))
cx = (x1 + x2) / 2
cy = (y1 + y2) / 2
plt.plot(cx, cy, 'yo')
# show gt poses
ax = fig.add_subplot(m, n, start)
start += 1
ax.set_title('gt poses')
plt.imshow(im)
pose_blob = gt_poses[i]
for j in range(pose_blob.shape[0]):
if pose_blob[j, 0] == 0:
continue
cls = int(pose_blob[j, 1])
# extract 3D points
x3d = np.ones((4, points.shape[1]), dtype=np.float32)
x3d[0, :] = points[cls,:,0]
x3d[1, :] = points[cls,:,1]
x3d[2, :] = points[cls,:,2]
# projection
RT = np.zeros((3, 4), dtype=np.float32)
qt = pose_blob[j, 2:6]
T = pose_blob[j, 6:]
qt_new = allocentric2egocentric(qt, T)
RT[:3, :3] = quat2mat(qt_new)
RT[:, 3] = T
x2d = np.matmul(intrinsic_matrix, np.matmul(RT, x3d))
x2d[0, :] = np.divide(x2d[0, :], x2d[2, :])
x2d[1, :] = np.divide(x2d[1, :], x2d[2, :])
plt.plot(x2d[0, :], x2d[1, :], '.', color=np.divide(class_colors[cls], 255.0), alpha=0.1)
# show predicted poses
ax = fig.add_subplot(m, n, start)
start += 1
ax.set_title('predicted poses')
plt.imshow(im)
for j in range(rois.shape[0]):
if rois[j, 0] != i:
continue
cls = int(rois[j, 1])
if cls > 0:
print('%s: detection score %s' % (classes[cls], rois[j, -1]))
if rois[j, -1] > cfg.TEST.DET_THRESHOLD:
# extract 3D points
x3d = np.ones((4, points.shape[1]), dtype=np.float32)
x3d[0, :] = points[cls,:,0]
x3d[1, :] = points[cls,:,1]
x3d[2, :] = points[cls,:,2]
# projection
RT = np.zeros((3, 4), dtype=np.float32)
RT[:3, :3] = quat2mat(poses[j, :4])
RT[:, 3] = poses[j, 4:7]
x2d = np.matmul(intrinsic_matrix, np.matmul(RT, x3d))
x2d[0, :] = np.divide(x2d[0, :], x2d[2, :])
x2d[1, :] = np.divide(x2d[1, :], x2d[2, :])
plt.plot(x2d[0, :], x2d[1, :], '.', color=np.divide(class_colors[cls], 255.0), alpha=0.1)
# show predicted refined poses
if cfg.TEST.POSE_REFINE:
ax = fig.add_subplot(m, n, start)
start += 1
ax.set_title('predicted refined poses')
plt.imshow(im)
for j in range(rois.shape[0]):
if rois[j, 0] != i:
continue
cls = int(rois[j, 1])
if rois[j, -1] > cfg.TEST.DET_THRESHOLD:
# extract 3D points
x3d = np.ones((4, points.shape[1]), dtype=np.float32)
x3d[0, :] = points[cls,:,0]
x3d[1, :] = points[cls,:,1]
x3d[2, :] = points[cls,:,2]
# projection
RT = np.zeros((3, 4), dtype=np.float32)
RT[:3, :3] = quat2mat(poses_refined[j, :4])
RT[:, 3] = poses_refined[j, 4:7]
x2d = np.matmul(intrinsic_matrix, np.matmul(RT, x3d))
x2d[0, :] = np.divide(x2d[0, :], x2d[2, :])
x2d[1, :] = np.divide(x2d[1, :], x2d[2, :])
plt.plot(x2d[0, :], x2d[1, :], '.', color=np.divide(class_colors[cls], 255.0), alpha=0.1)
# show gt vertex targets
vertex_target = vertex_targets[i, :, :, :]
center = np.zeros((3, height, width), dtype=np.float32)
for j in range(1, num_classes):
index = np.where(label_gt[:, :, j] > 0)
if len(index[0]) > 0:
center[:, index[0], index[1]] = vertex_target[3*j:3*j+3, index[0], index[1]]
ax = fig.add_subplot(m, n, start)
start += 1
plt.imshow(center[0,:,:])
ax.set_title('gt center x')
ax = fig.add_subplot(m, n, start)
start += 1
plt.imshow(center[1,:,:])
ax.set_title('gt center y')
ax = fig.add_subplot(m, n, start)
start += 1
plt.imshow(np.exp(center[2,:,:]))
ax.set_title('gt z')
# show predicted vertex targets
vertex_target = vertex_pred[i, :, :, :]
center = np.zeros((3, height, width), dtype=np.float32)
for j in range(1, num_classes):
index = np.where(label == j)
if len(index[0]) > 0:
center[:, index[0], index[1]] = vertex_target[3*j:3*j+3, index[0], index[1]]
ax = fig.add_subplot(m, n, start)
start += 1
plt.imshow(center[0,:,:])
ax.set_title('predicted center x')
ax = fig.add_subplot(m, n, start)
start += 1
plt.imshow(center[1,:,:])
ax.set_title('predicted center y')
ax = fig.add_subplot(m, n, start)
start += 1
plt.imshow(np.exp(center[2,:,:]))
ax.set_title('predicted z')
plt.show()
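# allocentric2egocentric() (imported from utils.se3) compensates the predicted
# allocentric quaternion for the viewing ray toward the object center T. A
# sketch of the idea (illustrative; signs and axis conventions follow
# lib/utils/se3.py, which is authoritative):
def allocentric2egocentric_sketch(qt, T):
    from transforms3d.euler import euler2quat
    dx = np.arctan2(T[0], -T[2])
    dy = np.arctan2(T[1], -T[2])
    # rotation that turns the optical axis toward T, composed with qt
    return qmult(euler2quat(-dy, -dx, 0, axes='sxyz'), qt)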
================================================
FILE: lib/fcn/test_imageset.py
================================================
# Copyright (c) 2020 NVIDIA Corporation. All rights reserved.
# This work is licensed under the NVIDIA Source Code License - Non-commercial. Full
# text can be found in LICENSE.md
import torch
import torch.nn.functional as F
import time
import sys, os
import cv2
import numpy as np
import matplotlib.pyplot as plt
from fcn.config import cfg
from fcn.render_utils import render_image
from fcn.test_common import refine_pose
from transforms3d.quaternions import mat2quat, quat2mat, qmult
from utils.se3 import *
from utils.nms import *
from utils.pose_error import re, te
def test_image(network, dataset, im_color, im_depth=None, im_index=None):
"""test on a single image"""
# compute image blob
im = im_color.astype(np.float32, copy=True)
im -= cfg.PIXEL_MEANS
height = im.shape[0]
width = im.shape[1]
im = np.transpose(im / 255.0, (2, 0, 1))
im = im[np.newaxis, :, :, :]
K = dataset._intrinsic_matrix
K[2, 2] = 1
Kinv = np.linalg.pinv(K)
meta_data = np.zeros((1, 18), dtype=np.float32)
meta_data[0, 0:9] = K.flatten()
meta_data[0, 9:18] = Kinv.flatten()
meta_data = torch.from_numpy(meta_data).cuda()
if im_depth is not None:
depth_tensor = torch.from_numpy(im_depth).cuda().float()
else:
depth_tensor = None
# transfer to GPU
inputs = torch.from_numpy(im).cuda()
# run network
if cfg.TRAIN.VERTEX_REG:
if cfg.TRAIN.POSE_REG:
out_label, out_vertex, rois, out_pose, out_quaternion = network(inputs, dataset.input_labels, meta_data, \
dataset.input_extents, dataset.input_gt_boxes, dataset.input_poses, dataset.input_points, dataset.input_symmetry)
labels = out_label[0]
# combine poses
rois = rois.detach().cpu().numpy()
out_pose = out_pose.detach().cpu().numpy()
out_quaternion = out_quaternion.detach().cpu().numpy()
num = rois.shape[0]
poses = out_pose.copy()
for j in range(num):
cls = int(rois[j, 1])
if cls >= 0:
qt = out_quaternion[j, 4*cls:4*cls+4]
qt = qt / np.linalg.norm(qt)
# allocentric to egocentric
poses[j, 4] *= poses[j, 6]
poses[j, 5] *= poses[j, 6]
T = poses[j, 4:]
poses[j, :4] = allocentric2egocentric(qt, T)
# filter out detections
index = np.where(rois[:, -1] > cfg.TEST.DET_THRESHOLD)[0]
rois = rois[index, :]
poses = poses[index, :]
# non-maximum suppression within class
index = nms(rois, 0.2)
rois = rois[index, :]
poses = poses[index, :]
# optimize depths
if cfg.TEST.POSE_REFINE and im_depth is not None:
poses_refined = refine_pose(labels, depth_tensor, rois, poses, meta_data, dataset)
else:
poses_refined = None
else:
# no pose regression
out_label, out_vertex, rois, out_pose = network(inputs, dataset.input_labels, meta_data, \
dataset.input_extents, dataset.input_gt_boxes, dataset.input_poses, dataset.input_points, dataset.input_symmetry)
labels = out_label[0]
rois = rois.detach().cpu().numpy()
out_pose = out_pose.detach().cpu().numpy()
poses = out_pose.copy()
# filter out detections
index = np.where(rois[:, -1] > cfg.TEST.DET_THRESHOLD)[0]
rois = rois[index, :]
poses = poses[index, :]
poses_refined = None
# non-maximum suppression within class
index = nms(rois, 0.2)
rois = rois[index, :]
poses = poses[index, :]
else:
# segmentation only
out_label = network(inputs, dataset.input_labels, dataset.input_meta_data, \
dataset.input_extents, dataset.input_gt_boxes, dataset.input_poses, dataset.input_points, dataset.input_symmetry)
labels = out_label[0]
rois = np.zeros((0, 7), dtype=np.float32)
poses = np.zeros((0, 7), dtype=np.float32)
poses_refined = None
im_pose, im_pose_refined, im_label = render_image(dataset, im_color, rois, poses, poses_refined, labels.cpu().numpy())
if cfg.TEST.VISUALIZE:
vis_test(dataset, im, im_depth, labels.cpu().numpy(), rois, poses, poses_refined, im_pose, im_pose_refined, out_vertex)
return im_pose, im_pose_refined, im_label, labels.cpu().numpy(), rois, poses, poses_refined
def vis_test(dataset, im, im_depth, label, rois, poses, poses_refined, im_pose, im_pose_refine, out_vertex=None, im_index=None):
"""Visualize a testing results."""
import matplotlib.pyplot as plt
num_classes = len(dataset._class_colors_test)
classes = dataset._classes_test
class_colors = dataset._class_colors_test
points = dataset._points_all_test
intrinsic_matrix = dataset._intrinsic_matrix
height = label.shape[0]
width = label.shape[1]
if out_vertex is not None:
vertex_pred = out_vertex.detach().cpu().numpy()
fig = plt.figure()
plot = 1
m = 2
n = 3
# show image
im = im[0, :, :, :].copy()
im = im.transpose((1, 2, 0)) * 255.0
im += cfg.PIXEL_MEANS
im = im[:, :, (2, 1, 0)]
im = im.astype(np.uint8)
ax = fig.add_subplot(m, n, plot)
plot += 1
plt.imshow(im)
plt.axis('off')
ax.set_title('input image')
# show predicted label
im_label = dataset.labels_to_image(label)
ax = fig.add_subplot(m, n, plot)
plot += 1
plt.imshow(im_label)
plt.axis('off')
ax.set_title('predicted labels')
if cfg.TRAIN.VERTEX_REG or cfg.TRAIN.VERTEX_REG_DELTA:
# show predicted boxes
ax = fig.add_subplot(m, n, plot)
plot += 1
plt.imshow(im)
plt.axis('off')
ax.set_title('predicted boxes')
for j in range(rois.shape[0]):
cls = rois[j, 1]
x1 = rois[j, 2]
y1 = rois[j, 3]
x2 = rois[j, 4]
y2 = rois[j, 5]
plt.gca().add_patch(
plt.Rectangle((x1, y1), x2-x1, y2-y1, fill=False, edgecolor=np.array(class_colors[int(cls)])/255.0, linewidth=3))
cx = (x1 + x2) / 2
cy = (y1 + y2) / 2
plt.plot(cx, cy, 'yo')
'''
# show predicted poses
if cfg.TRAIN.POSE_REG:
ax = fig.add_subplot(m, n, plot)
plot += 1
ax.set_title('predicted poses')
plt.imshow(im)
for j in range(rois.shape[0]):
cls = int(rois[j, 1])
print(classes[cls], rois[j, -1])
if cls > 0 and rois[j, -1] > cfg.TEST.DET_THRESHOLD:
# extract 3D points
x3d = np.ones((4, points.shape[1]), dtype=np.float32)
x3d[0, :] = points[cls,:,0]
x3d[1, :] = points[cls,:,1]
x3d[2, :] = points[cls,:,2]
# projection
RT = np.zeros((3, 4), dtype=np.float32)
RT[:3, :3] = quat2mat(poses[j, :4])
RT[:, 3] = poses[j, 4:7]
x2d = np.matmul(intrinsic_matrix, np.matmul(RT, x3d))
x2d[0, :] = np.divide(x2d[0, :], x2d[2, :])
x2d[1, :] = np.divide(x2d[1, :], x2d[2, :])
plt.plot(x2d[0, :], x2d[1, :], '.', color=np.divide(class_colors[cls], 255.0), alpha=0.5)
if out_vertex is not None:
# show predicted vertex targets
vertex_target = vertex_pred[0, :, :, :]
center = np.zeros((3, height, width), dtype=np.float32)
for j in range(1, dataset._num_classes):
index = np.where(label == j)
if len(index[0]) > 0:
center[0, index[0], index[1]] = vertex_target[3*j, index[0], index[1]]
center[1, index[0], index[1]] = vertex_target[3*j+1, index[0], index[1]]
center[2, index[0], index[1]] = np.exp(vertex_target[3*j+2, index[0], index[1]])
ax = fig.add_subplot(m, n, plot)
plot += 1
plt.imshow(center[0,:,:])
plt.axis('off')
ax.set_title('predicted center x')
ax = fig.add_subplot(m, n, plot)
plot += 1
plt.imshow(center[1,:,:])
plt.axis('off')
ax.set_title('predicted center y')
ax = fig.add_subplot(m, n, plot)
plot += 1
plt.imshow(center[2,:,:])
plt.axis('off')
ax.set_title('predicted z')
'''
# show depth
if im_depth is not None:
ax = fig.add_subplot(m, n, plot)
plot += 1
plt.imshow(im_depth)
plt.axis('off')
ax.set_title('input depth')
ax = fig.add_subplot(m, n, plot)
plot += 1
plt.imshow(im_pose)
plt.axis('off')
ax.set_title('estimated poses')
if cfg.TEST.POSE_REFINE and im_pose_refine is not None and im_depth is not None:
ax = fig.add_subplot(m, n, plot)
plot += 1
plt.imshow(im_pose_refine)
plt.axis('off')
ax.set_title('estimated poses refined')
if im_index is not None:
mng = plt.get_current_fig_manager()
mng.resize(*mng.window.maxsize())
plt.show(block=False)
plt.pause(1)
filename = 'output/images/%06d.png' % im_index
fig.savefig(filename)
plt.close()
else:
plt.show()
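# nms(rois, thresh) from utils.nms suppresses overlapping detections of the
# same class. A minimal greedy IoU sketch matching the roi row layout used
# here, [batch, cls, x1, y1, x2, y2, score] (illustrative only):
def nms_sketch(rois, thresh):
    order = np.argsort(-rois[:, -1])  # highest score first
    keep = []
    for i in order:
        suppressed = False
        for j in keep:
            if rois[i, 1] != rois[j, 1]:
                continue  # only suppress within the same class
            xx1 = max(rois[i, 2], rois[j, 2])
            yy1 = max(rois[i, 3], rois[j, 3])
            xx2 = min(rois[i, 4], rois[j, 4])
            yy2 = min(rois[i, 5], rois[j, 5])
            inter = max(0.0, xx2 - xx1) * max(0.0, yy2 - yy1)
            area_i = (rois[i, 4] - rois[i, 2]) * (rois[i, 5] - rois[i, 3])
            area_j = (rois[j, 4] - rois[j, 2]) * (rois[j, 5] - rois[j, 3])
            if inter / (area_i + area_j - inter + 1e-10) > thresh:
                suppressed = True
                break
        if not suppressed:
            keep.append(i)
    return keep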
================================================
FILE: lib/fcn/train.py
================================================
# Copyright (c) 2020 NVIDIA Corporation. All rights reserved.
# This work is licensed under the NVIDIA Source Code License - Non-commercial. Full
# text can be found in LICENSE.md
import torch
import torch.nn as nn
import time
import sys, os
import numpy as np
import matplotlib.pyplot as plt
from fcn.config import cfg
from transforms3d.quaternions import mat2quat, quat2mat, qmult
from utils.se3 import *
from utils.nms import *
from utils.pose_error import re, te
class AverageMeter(object):
"""Computes and stores the average and current value"""
def __init__(self):
self.reset()
def reset(self):
self.val = 0
self.avg = 0
self.sum = 0
self.count = 0
def update(self, val, n=1):
self.val = val
self.sum += val * n
self.count += n
self.avg = self.sum / self.count
def __repr__(self):
return '{:.3f} ({:.3f})'.format(self.val, self.avg)
def loss_cross_entropy(scores, labels):
"""
scores: a tensor [batch_size, num_classes, height, width]
labels: a tensor [batch_size, num_classes, height, width]
"""
cross_entropy = -torch.sum(labels * scores, dim=1)
loss = torch.div(torch.sum(cross_entropy), torch.sum(labels)+1e-10)
return loss
def smooth_l1_loss(vertex_pred, vertex_targets, vertex_weights, sigma=1.0):
sigma_2 = sigma ** 2
vertex_diff = vertex_pred - vertex_targets
diff = torch.mul(vertex_weights, vertex_diff)
abs_diff = torch.abs(diff)
smoothL1_sign = torch.lt(abs_diff, 1. / sigma_2).float().detach()
in_loss = torch.pow(diff, 2) * (sigma_2 / 2.) * smoothL1_sign \
+ (abs_diff - (0.5 / sigma_2)) * (1. - smoothL1_sign)
loss = torch.div( torch.sum(in_loss), torch.sum(vertex_weights) + 1e-10 )
return loss
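# smooth_l1_loss() above is the Huber-style penalty on the weighted residual
# d = vertex_weights * (vertex_pred - vertex_targets):
#   0.5 * sigma^2 * d^2     if |d| < 1 / sigma^2
#   |d| - 0.5 / sigma^2     otherwise
# e.g. with sigma = 1: d = 0.5 gives 0.125 (quadratic), d = 2.0 gives 1.5
# (linear tail); the sum is normalized by the total weight mass.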
#************************************#
# train PoseCNN #
#************************************#
'''
sample = {'image_color': im_blob,
'im_depth': im_depth,
'label': label_blob,
'mask': mask,
'meta_data': meta_data_blob,
'poses': pose_blob,
'extents': self._extents,
'points': self._point_blob,
'symmetry': self._symmetry,
'gt_boxes': gt_boxes,
'im_info': im_info,
'vertex_targets': vertex_targets,
'vertex_weights': vertex_weights}
'''
def train(train_loader, background_loader, network, optimizer, epoch):
batch_time = AverageMeter()
losses = AverageMeter()
epoch_size = len(train_loader)
enum_background = enumerate(background_loader)
pixel_mean = torch.from_numpy(cfg.PIXEL_MEANS.transpose(2, 0, 1) / 255.0).float()
# switch to train mode
network.train()
for i, sample in enumerate(train_loader):
end = time.time()
# prepare data
inputs = sample['image_color'].cuda()
im_info = sample['im_info']
mask = sample['mask'].cuda()
labels = sample['label'].cuda()
meta_data = sample['meta_data'].cuda()
extents = sample['extents'][0, :, :].repeat(cfg.TRAIN.GPUNUM, 1, 1).cuda()
gt_boxes = sample['gt_boxes'].cuda()
poses = sample['poses'].cuda()
points = sample['points'][0, :, :, :].repeat(cfg.TRAIN.GPUNUM, 1, 1, 1).cuda()
symmetry = sample['symmetry'][0, :].repeat(cfg.TRAIN.GPUNUM, 1).cuda()
if cfg.TRAIN.VERTEX_REG:
vertex_targets = sample['vertex_targets'].cuda()
vertex_weights = sample['vertex_weights'].cuda()
else:
vertex_targets = []
vertex_weights = []
# add background
try:
_, background = next(enum_background)
        except StopIteration:
enum_background = enumerate(background_loader)
_, background = next(enum_background)
if inputs.size(0) != background['background_color'].size(0):
enum_background = enumerate(background_loader)
_, background = next(enum_background)
background_color = background['background_color'].cuda()
for j in range(inputs.size(0)):
is_syn = im_info[j, -1]
if is_syn or np.random.rand(1) > 0.5:
inputs[j] = mask[j] * inputs[j] + (1 - mask[j]) * background_color[j]
# visualization
if cfg.TRAIN.VISUALIZE:
_vis_minibatch(inputs, background, labels, vertex_targets, sample, train_loader.dataset.class_colors)
# compute output
if cfg.TRAIN.VERTEX_REG:
if cfg.TRAIN.POSE_REG:
out_logsoftmax, out_weight, out_vertex, out_logsoftmax_box, \
bbox_labels, bbox_pred, bbox_targets, bbox_inside_weights, loss_pose_tensor, poses_weight \
= network(inputs, labels, meta_data, extents, gt_boxes, poses, points, symmetry)
loss_label = loss_cross_entropy(out_logsoftmax, out_weight)
loss_vertex = cfg.TRAIN.VERTEX_W * smooth_l1_loss(out_vertex, vertex_targets, vertex_weights)
loss_box = loss_cross_entropy(out_logsoftmax_box, bbox_labels)
loss_location = smooth_l1_loss(bbox_pred, bbox_targets, bbox_inside_weights)
loss_pose = torch.mean(loss_pose_tensor)
loss = loss_label + loss_vertex + loss_box + loss_location + loss_pose
else:
out_logsoftmax, out_weight, out_vertex, out_logsoftmax_box, \
bbox_labels, bbox_pred, bbox_targets, bbox_inside_weights \
= network(inputs, labels, meta_data, extents, gt_boxes, poses, points, symmetry)
loss_label = loss_cross_entropy(out_logsoftmax, out_weight)
loss_vertex = cfg.TRAIN.VERTEX_W * smooth_l1_loss(out_vertex, vertex_targets, vertex_weights)
loss_box = loss_cross_entropy(out_logsoftmax_box, bbox_labels)
loss_location = smooth_l1_loss(bbox_pred, bbox_targets, bbox_inside_weights)
loss = loss_label + loss_vertex + loss_box + loss_location
else:
out_logsoftmax, out_weight = network(inputs, labels, meta_data, extents, gt_boxes, poses, points, symmetry)
loss = loss_cross_entropy(out_logsoftmax, out_weight)
# record loss
losses.update(loss.data, inputs.size(0))
# compute gradient and do optimization step
optimizer.zero_grad()
loss.backward()
optimizer.step()
# measure elapsed time
batch_time.update(time.time() - end)
if cfg.TRAIN.VERTEX_REG:
if cfg.TRAIN.POSE_REG:
num_bg = torch.sum(bbox_labels[:, 0])
num_fg = torch.sum(torch.sum(bbox_labels[:, 1:], dim=1))
num_fg_pose = torch.sum(torch.sum(poses_weight[:, 4:], dim=1)) / 4
print('[%d/%d][%d/%d], %.4f, label %.4f, center %.4f, box %.4f (%03d, %03d), loc %.4f, pose %.4f (%03d), lr %.6f, time %.2f' \
% (epoch, cfg.epochs, i, epoch_size, loss.data, loss_label.data, loss_vertex.data, loss_box.data, num_fg.data, num_bg.data, \
loss_location.data, loss_pose.data, num_fg_pose, optimizer.param_groups[0]['lr'], batch_time.val))
else:
num_bg = torch.sum(bbox_labels[:, 0])
num_fg = torch.sum(torch.sum(bbox_labels[:, 1:], dim=1))
print('[%d/%d][%d/%d], %.4f, label %.4f, center %.4f, box %.4f (%03d, %03d), loc %.4f, lr %.6f, time %.2f' \
% (epoch, cfg.epochs, i, epoch_size, loss.data, loss_label.data, loss_vertex.data, loss_box.data, num_fg.data, num_bg.data, \
loss_location.data, optimizer.param_groups[0]['lr'], batch_time.val))
else:
print('[%d/%d][%d/%d], loss %.4f, lr %.6f, time %.2f' \
% (epoch, cfg.epochs, i, epoch_size, loss, optimizer.param_groups[0]['lr'], batch_time.val))
cfg.TRAIN.ITERS += 1
def _get_bb3D(extent):
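    """Return the 8 corners (columns of a 3x8 array) of the axis-aligned
    3D bounding box centered at the origin for the given extent."""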
bb = np.zeros((3, 8), dtype=np.float32)
xHalf = extent[0] * 0.5
yHalf = extent[1] * 0.5
zHalf = extent[2] * 0.5
bb[:, 0] = [xHalf, yHalf, zHalf]
bb[:, 1] = [-xHalf, yHalf, zHalf]
bb[:, 2] = [xHalf, -yHalf, zHalf]
bb[:, 3] = [-xHalf, -yHalf, zHalf]
bb[:, 4] = [xHalf, yHalf, -zHalf]
bb[:, 5] = [-xHalf, yHalf, -zHalf]
bb[:, 6] = [xHalf, -yHalf, -zHalf]
bb[:, 7] = [-xHalf, -yHalf, -zHalf]
return bb
def _vis_minibatch(inputs, background, labels, vertex_targets, sample, class_colors):
"""Visualize a mini-batch for debugging."""
import matplotlib.pyplot as plt
im_blob = inputs.cpu().numpy()
label_blob = labels.cpu().numpy()
gt_poses = sample['poses'].numpy()
meta_data_blob = sample['meta_data'].numpy()
gt_boxes = sample['gt_boxes'].numpy()
extents = sample['extents'][0, :, :].numpy()
background_color = background['background_color'].numpy()
if cfg.TRAIN.VERTEX_REG:
vertex_target_blob = vertex_targets.cpu().numpy()
if cfg.INPUT == 'COLOR':
m = 3
n = 3
else:
m = 3
n = 4
for i in range(im_blob.shape[0]):
fig = plt.figure()
start = 1
metadata = meta_data_blob[i, :]
intrinsic_matrix = metadata[:9].reshape((3,3))
# show image
if cfg.INPUT == 'COLOR' or cfg.INPUT == 'RGBD':
if cfg.INPUT == 'COLOR':
im = im_blob[i, :, :, :].copy()
else:
im = im_blob[i, :3, :, :].copy()
im = im.transpose((1, 2, 0)) * 255.0
im += cfg.PIXEL_MEANS
im = im[:, :, (2, 1, 0)]
im = np.clip(im, 0, 255)
im = im.astype(np.uint8)
ax = fig.add_subplot(m, n, 1)
plt.imshow(im)
ax.set_title('color')
start += 1
if cfg.INPUT == 'DEPTH' or cfg.INPUT == 'RGBD':
if cfg.INPUT == 'DEPTH':
im_depth = im_blob[i, :, :, :].copy()
else:
im_depth = im_blob[i, 3:6, :, :].copy()
ax = fig.add_subplot(m, n, start)
plt.imshow(im_depth[0, :, :])
ax.set_title('depth x')
start += 1
ax = fig.add_subplot(m, n, start)
plt.imshow(im_depth[1, :, :])
ax.set_title('depth y')
start += 1
ax = fig.add_subplot(m, n, start)
plt.imshow(im_depth[2, :, :])
ax.set_title('depth z')
start += 1
if cfg.INPUT == 'RGBD':
ax = fig.add_subplot(m, n, start)
mask = im_blob[i, 6, :, :].copy()
plt.imshow(mask)
ax.set_title('depth mask')
start += 1
# project the 3D box to image
pose_blob = gt_poses[i]
for j in range(pose_blob.shape[0]):
if pose_blob[j, 0] == 0:
continue
class_id = int(pose_blob[j, 1])
bb3d = _get_bb3D(extents[class_id, :])
x3d = np.ones((4, 8), dtype=np.float32)
x3d[0:3, :] = bb3d
# projection
RT = np.zeros((3, 4), dtype=np.float32)
# allocentric to egocentric
T = pose_blob[j, 6:]
qt = allocentric2egocentric(pose_blob[j, 2:6], T)
RT[:3, :3] = quat2mat(qt)
# RT[:3, :3] = quat2mat(pose_blob[j, 2:6])
RT[:, 3] = pose_blob[j, 6:]
x2d = np.matmul(intrinsic_matrix, np.matmul(RT, x3d))
x2d[0, :] = np.divide(x2d[0, :], x2d[2, :])
x2d[1, :] = np.divide(x2d[1, :], x2d[2, :])
x1 = np.min(x2d[0, :])
x2 = np.max(x2d[0, :])
y1 = np.min(x2d[1, :])
y2 = np.max(x2d[1, :])
plt.gca().add_patch(plt.Rectangle((x1, y1), x2-x1, y2-y1, fill=False, edgecolor='g', linewidth=3, clip_on=False))
if cfg.INPUT == 'COLOR' or cfg.INPUT == 'RGBD':
im_background = background_color[i]
im_background = im_background.transpose((1, 2, 0)) * 255.0
im_background += cfg.PIXEL_MEANS
im_background = im_background[:, :, (2, 1, 0)]
im_background = np.clip(im_background, 0, 255)
im_background = im_background.astype(np.uint8)
ax = fig.add_subplot(m, n, start)
plt.imshow(im_background)
ax.set_title('background')
start += 1
# show gt boxes
ax = fig.add_subplot(m, n, start)
start += 1
if cfg.INPUT == 'COLOR' or cfg.INPUT == 'RGBD':
plt.imshow(im)
else:
plt.imshow(im_depth[2, :, :])
ax.set_title('gt boxes')
boxes = gt_boxes[i]
for j in range(boxes.shape[0]):
if boxes[j, 4] == 0:
continue
x1 = boxes[j, 0]
y1 = boxes[j, 1]
x2 = boxes[j, 2]
y2 = boxes[j, 3]
plt.gca().add_patch(
plt.Rectangle((x1, y1), x2-x1, y2-y1, fill=False, edgecolor='g', linewidth=3, clip_on=False))
# show label
label = label_blob[i, :, :, :]
label = label.transpose((1, 2, 0))
height = label.shape[0]
width = label.shape[1]
num_classes = label.shape[2]
im_label = np.zeros((height, width, 3), dtype=np.uint8)
for j in range(num_classes):
I = np.where(label[:, :, j] > 0)
im_label[I[0], I[1], :] = class_colors[j]
ax = fig.add_subplot(m, n, start)
start += 1
plt.imshow(im_label)
ax.set_title('label')
# show vertex targets
if cfg.TRAIN.VERTEX_REG:
vertex_target = vertex_target_blob[i, :, :, :]
center = np.zeros((3, height, width), dtype=np.float32)
for j in range(1, num_classes):
index = np.where(label[:, :, j] > 0)
if len(index[0]) > 0:
center[0, index[0], index[1]] = vertex_target[3*j, index[0], index[1]]
center[1, index[0], index[1]] = vertex_target[3*j+1, index[0], index[1]]
center[2, index[0], index[1]] = np.exp(vertex_target[3*j+2, index[0], index[1]])
ax = fig.add_subplot(m, n, start)
start += 1
plt.imshow(center[0,:,:])
ax.set_title('center x')
ax = fig.add_subplot(m, n, start)
start += 1
plt.imshow(center[1,:,:])
ax.set_title('center y')
ax = fig.add_subplot(m, n, start)
start += 1
plt.imshow(center[2,:,:])
ax.set_title('z')
plt.show()
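# loss_cross_entropy() above expects log-softmax scores and (possibly soft)
# per-pixel label weights of the same shape, normalizing by the total label
# mass. A tiny self-contained check (illustrative only):
def _loss_cross_entropy_example():
    import torch.nn.functional as F
    scores = torch.randn(2, 4, 8, 8)          # raw network logits
    log_probs = F.log_softmax(scores, dim=1)  # what the network feeds the loss
    labels = torch.zeros(2, 4, 8, 8)
    labels[:, 1, :, :] = 1.0                  # one-hot on class 1
    return loss_cross_entropy(log_probs, labels)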
================================================
FILE: lib/layers/ROIAlign_cuda.cu
================================================
// Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved.
#include <ATen/ATen.h>
#include <ATen/cuda/CUDAContext.h>
#include <THC/THC.h>
#include <THC/THCAtomics.cuh>
#include <THC/THCDeviceUtils.cuh>
// TODO make it in a common file
#define CUDA_1D_KERNEL_LOOP(i, n) \
for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; \
i += blockDim.x * gridDim.x)
template <typename T>
__device__ T bilinear_interpolate(const T* bottom_data,
const int height, const int width,
T y, T x,
const int index /* index for debug only*/) {
// deal with cases that inverse elements are out of feature map boundary
if (y < -1.0 || y > height || x < -1.0 || x > width) {
//empty
/*
if (x < -1.0 && y < -1.0)
return bottom_data[0];
if (x < -1.0 && y < height)
{
int yy = (int)y;
return bottom_data[yy * width];
}
if (x < -1.0 && y > height)
return bottom_data[(height - 1) * width];
if (x < width && y > height)
{
int xx = (int)x;
return bottom_data[(height - 1) * width + xx];
}
if (x > width && y > height)
return bottom_data[(height - 1) * width + width - 1];
if (x > width && y < -1.0)
return bottom_data[width - 1];
if (x > width && y < height)
{
int yy = (int)y;
return bottom_data[yy * width + width - 1];
}
if (x < width && y < -1.0)
{
int xx = (int)x;
return bottom_data[xx];
}
*/
return 0;
}
if (y <= 0) y = 0;
if (x <= 0) x = 0;
int y_low = (int) y;
int x_low = (int) x;
int y_high;
int x_high;
if (y_low >= height - 1) {
y_high = y_low = height - 1;
y = (T) y_low;
} else {
y_high = y_low + 1;
}
if (x_low >= width - 1) {
x_high = x_low = width - 1;
x = (T) x_low;
} else {
x_high = x_low + 1;
}
T ly = y - y_low;
T lx = x - x_low;
T hy = 1. - ly, hx = 1. - lx;
// do bilinear interpolation
T v1 = bottom_data[y_low * width + x_low];
T v2 = bottom_data[y_low * width + x_high];
T v3 = bottom_data[y_high * width + x_low];
T v4 = bottom_data[y_high * width + x_high];
T w1 = hy * hx, w2 = hy * lx, w3 = ly * hx, w4 = ly * lx;
T val = (w1 * v1 + w2 * v2 + w3 * v3 + w4 * v4);
return val;
}
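// Worked example of the interpolation weights above (illustrative): for
// y = 2.3, x = 4.7 we get y_low = 2, x_low = 4, ly = 0.3, lx = 0.7,
// hy = 0.7, hx = 0.3, hence w1 = 0.21, w2 = 0.49, w3 = 0.09, w4 = 0.21.
// The weights sum to 1 and the nearest pixel (2, 5) carries the largest
// weight (w2).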
template <typename T>
__global__ void RoIAlignForward(const int nthreads, const T* bottom_data,
const T spatial_scale, const int channels,
const int height, const int width,
const int pooled_height, const int pooled_width,
const int sampling_ratio,
const T* bottom_rois, T* top_data) {
CUDA_1D_KERNEL_LOOP(index, nthreads) {
// (n, c, ph, pw) is an element in the pooled output
int pw = index % pooled_width;
int ph = (index / pooled_width) % pooled_height;
int c = (index / pooled_width / pooled_height) % channels;
int n = index / pooled_width / pooled_height / channels;
const T* offset_bottom_rois = bottom_rois + n * 5;
int roi_batch_ind = offset_bottom_rois[0];
    // Do not use rounding; this implementation detail is critical
T roi_start_w = offset_bottom_rois[1] * spatial_scale;
T roi_start_h = offset_bottom_rois[2] * spatial_scale;
T roi_end_w = offset_bottom_rois[3] * spatial_scale;
T roi_end_h = offset_bottom_rois[4] * spatial_scale;
// T roi_start_w = round(offset_bottom_rois[1] * spatial_scale);
// T roi_start_h = round(offset_bottom_rois[2] * spatial_scale);
// T roi_end_w = round(offset_bottom_rois[3] * spatial_scale);
// T roi_end_h = round(offset_bottom_rois[4] * spatial_scale);
// Force malformed ROIs to be 1x1
T roi_width = max(roi_end_w - roi_start_w, (T)1.);
T roi_height = max(roi_end_h - roi_start_h, (T)1.);
    T bin_size_h = static_cast<T>(roi_height) / static_cast<T>(pooled_height);
    T bin_size_w = static_cast<T>(roi_width) / static_cast<T>(pooled_width);
const T* offset_bottom_data = bottom_data + (roi_batch_ind * channels + c) * height * width;
// We use roi_bin_grid to sample the grid and mimic integral
int roi_bin_grid_h = (sampling_ratio > 0) ? sampling_ratio : ceil(roi_height / pooled_height); // e.g., = 2
int roi_bin_grid_w = (sampling_ratio > 0) ? sampling_ratio : ceil(roi_width / pooled_width);
// We do average (integral) pooling inside a bin
const T count = roi_bin_grid_h * roi_bin_grid_w; // e.g. = 4
T output_val = 0.;
for (int iy = 0; iy < roi_bin_grid_h; iy ++) // e.g., iy = 0, 1
{
      const T y = roi_start_h + ph * bin_size_h + static_cast<T>(iy + .5f) * bin_size_h / static_cast<T>(roi_bin_grid_h); // e.g., 0.5, 1.5
for (int ix = 0; ix < roi_bin_grid_w; ix ++)
{
        const T x = roi_start_w + pw * bin_size_w + static_cast<T>(ix + .5f) * bin_size_w / static_cast<T>(roi_bin_grid_w);
T val = bilinear_interpolate(offset_bottom_data, height, width, y, x, index);
output_val += val;
}
}
output_val /= count;
top_data[index] = output_val;
}
}
template <typename T>
__device__ void bilinear_interpolate_gradient(
const int height, const int width,
T y, T x,
T & w1, T & w2, T & w3, T & w4,
int & x_low, int & x_high, int & y_low, int & y_high,
const int index /* index for debug only*/) {
// deal with cases that inverse elements are out of feature map boundary
if (y < -1.0 || y > height || x < -1.0 || x > width) {
//empty
w1 = w2 = w3 = w4 = 0.;
x_low = x_high = y_low = y_high = -1;
return;
}
if (y <= 0) y = 0;
if (x <= 0) x = 0;
y_low = (int) y;
x_low = (int) x;
if (y_low >= height - 1) {
y_high = y_low = height - 1;
y = (T) y_low;
} else {
y_high = y_low + 1;
}
if (x_low >= width - 1) {
x_high = x_low = width - 1;
x = (T) x_low;
} else {
x_high = x_low + 1;
}
T ly = y - y_low;
T lx = x - x_low;
T hy = 1. - ly, hx = 1. - lx;
// reference in forward
// T v1 = bottom_data[y_low * width + x_low];
// T v2 = bottom_data[y_low * width + x_high];
// T v3 = bottom_data[y_high * width + x_low];
// T v4 = bottom_data[y_high * width + x_high];
// T val = (w1 * v1 + w2 * v2 + w3 * v3 + w4 * v4);
w1 = hy * hx, w2 = hy * lx, w3 = ly * hx, w4 = ly * lx;
return;
}
template <typename T>
__global__ void RoIAlignBackwardFeature(const int nthreads, const T* top_diff,
const int num_rois, const T spatial_scale,
const int channels, const int height, const int width,
const int pooled_height, const int pooled_width,
const int sampling_ratio,
T* bottom_diff,
const T* bottom_rois) {
CUDA_1D_KERNEL_LOOP(index, nthreads) {
// (n, c, ph, pw) is an element in the pooled output
int pw = index % pooled_width;
int ph = (index / pooled_width) % pooled_height;
int c = (index / pooled_width / pooled_height) % channels;
int n = index / pooled_width / pooled_height / channels;
const T* offset_bottom_rois = bottom_rois + n * 5;
int roi_batch_ind = offset_bottom_rois[0];
// Do not use rounding; this implementation detail is critical
T roi_start_w = offset_bottom_rois[1] * spatial_scale;
T roi_start_h = offset_bottom_rois[2] * spatial_scale;
T roi_end_w = offset_bottom_rois[3] * spatial_scale;
T roi_end_h = offset_bottom_rois[4] * spatial_scale;
// T roi_start_w = round(offset_bottom_rois[1] * spatial_scale);
// T roi_start_h = round(offset_bottom_rois[2] * spatial_scale);
// T roi_end_w = round(offset_bottom_rois[3] * spatial_scale);
// T roi_end_h = round(offset_bottom_rois[4] * spatial_scale);
// Force malformed ROIs to be 1x1
T roi_width = max(roi_end_w - roi_start_w, (T)1.);
T roi_height = max(roi_end_h - roi_start_h, (T)1.);
T bin_size_h = static_cast<T>(roi_height) / static_cast<T>(pooled_height);
T bin_size_w = static_cast<T>(roi_width) / static_cast<T>(pooled_width);
T* offset_bottom_diff = bottom_diff + (roi_batch_ind * channels + c) * height * width;
int top_offset = (n * channels + c) * pooled_height * pooled_width;
const T* offset_top_diff = top_diff + top_offset;
const T top_diff_this_bin = offset_top_diff[ph * pooled_width + pw];
// We use roi_bin_grid to sample the grid and mimic integral
int roi_bin_grid_h = (sampling_ratio > 0) ? sampling_ratio : ceil(roi_height / pooled_height); // e.g., = 2
int roi_bin_grid_w = (sampling_ratio > 0) ? sampling_ratio : ceil(roi_width / pooled_width);
// We do average (integral) pooling inside a bin
const T count = roi_bin_grid_h * roi_bin_grid_w; // e.g. = 4
for (int iy = 0; iy < roi_bin_grid_h; iy ++) // e.g., iy = 0, 1
{
const T y = roi_start_h + ph * bin_size_h + static_cast<T>(iy + .5f) * bin_size_h / static_cast<T>(roi_bin_grid_h); // e.g., 0.5, 1.5
for (int ix = 0; ix < roi_bin_grid_w; ix ++)
{
const T x = roi_start_w + pw * bin_size_w + static_cast<T>(ix + .5f) * bin_size_w / static_cast<T>(roi_bin_grid_w);
T w1, w2, w3, w4;
int x_low, x_high, y_low, y_high;
bilinear_interpolate_gradient(height, width, y, x,
w1, w2, w3, w4,
x_low, x_high, y_low, y_high,
index);
T g1 = top_diff_this_bin * w1 / count;
T g2 = top_diff_this_bin * w2 / count;
T g3 = top_diff_this_bin * w3 / count;
T g4 = top_diff_this_bin * w4 / count;
if (x_low >= 0 && x_high >= 0 && y_low >= 0 && y_high >= 0)
{
atomicAdd(offset_bottom_diff + y_low * width + x_low, static_cast<T>(g1));
atomicAdd(offset_bottom_diff + y_low * width + x_high, static_cast<T>(g2));
atomicAdd(offset_bottom_diff + y_high * width + x_low, static_cast<T>(g3));
atomicAdd(offset_bottom_diff + y_high * width + x_high, static_cast<T>(g4));
} // if
} // ix
} // iy
} // CUDA_1D_KERNEL_LOOP
} // RoIAlignBackward
at::Tensor ROIAlign_forward_cuda(const at::Tensor& input,
const at::Tensor& rois,
const float spatial_scale,
const int pooled_height,
const int pooled_width,
const int sampling_ratio) {
AT_ASSERTM(input.type().is_cuda(), "input must be a CUDA tensor");
AT_ASSERTM(rois.type().is_cuda(), "rois must be a CUDA tensor");
auto num_rois = rois.size(0);
auto channels = input.size(1);
auto height = input.size(2);
auto width = input.size(3);
auto output = at::empty({num_rois, channels, pooled_height, pooled_width}, input.options());
auto output_size = num_rois * pooled_height * pooled_width * channels;
cudaStream_t stream = at::cuda::getCurrentCUDAStream();
dim3 grid(std::min(THCCeilDiv(output_size, 512L), 4096L));
dim3 block(512);
if (output.numel() == 0) {
THCudaCheck(cudaGetLastError());
return output;
}
AT_DISPATCH_FLOATING_TYPES(input.type(), "ROIAlign_forward", [&] {
RoIAlignForward<scalar_t><<<grid, block, 0, stream>>>(
output_size,
input.contiguous().data<scalar_t>(),
spatial_scale,
channels,
height,
width,
pooled_height,
pooled_width,
sampling_ratio,
rois.contiguous().data<scalar_t>(),
output.data<scalar_t>());
});
THCudaCheck(cudaGetLastError());
return output;
}
// TODO remove the dependency on input and use instead its sizes -> save memory
at::Tensor ROIAlign_backward_cuda(const at::Tensor& grad,
const at::Tensor& rois,
const float spatial_scale,
const int pooled_height,
const int pooled_width,
const int batch_size,
const int channels,
const int height,
const int width,
const int sampling_ratio) {
AT_ASSERTM(grad.type().is_cuda(), "grad must be a CUDA tensor");
AT_ASSERTM(rois.type().is_cuda(), "rois must be a CUDA tensor");
auto num_rois = rois.size(0);
auto grad_input = at::zeros({batch_size, channels, height, width}, grad.options());
cudaStream_t stream = at::cuda::getCurrentCUDAStream();
dim3 grid(std::min(THCCeilDiv(grad.numel(), 512L), 4096L));
dim3 block(512);
// handle possibly empty gradients
if (grad.numel() == 0) {
THCudaCheck(cudaGetLastError());
return grad_input;
}
AT_DISPATCH_FLOATING_TYPES(grad.type(), "ROIAlign_backward", [&] {
RoIAlignBackwardFeature<scalar_t><<<grid, block, 0, stream>>>(
grad.numel(),
grad.contiguous().data<scalar_t>(),
num_rois,
spatial_scale,
channels,
height,
width,
pooled_height,
pooled_width,
sampling_ratio,
grad_input.data<scalar_t>(),
rois.contiguous().data<scalar_t>());
});
THCudaCheck(cudaGetLastError());
return grad_input;
}
================================================
FILE: lib/layers/__init__.py
================================================
# Copyright (c) 2020 NVIDIA Corporation. All rights reserved.
# This work is licensed under the NVIDIA Source Code License - Non-commercial. Full
# text can be found in LICENSE.md
================================================
FILE: lib/layers/backproject_kernel.cu
================================================
// Copyright (c) 2020 NVIDIA Corporation. All rights reserved.
// This work is licensed under the NVIDIA Source Code License - Non-commercial. Full
// text can be found in LICENSE.md
#include <ATen/ATen.h>
#include <ATen/cuda/CUDAContext.h>
#include <cuda.h>
#include <cuda_runtime.h>
#include <vector>
#define CUDA_1D_KERNEL_LOOP(i, n) \
for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; \
i += blockDim.x * gridDim.x)
template <typename Dtype>
__global__ void BackprojectForward(const int nthreads, const int width, const float fx,
const float fy, const float px, const float py, const Dtype* depth, Dtype* top_data)
{
CUDA_1D_KERNEL_LOOP(index, nthreads)
{
int x = index % width;
int y = index / width;
Dtype d = depth[index];
top_data[3 * index + 0] = d * (x - px) / fx;
top_data[3 * index + 1] = d * (y - py) / fy;
top_data[3 * index + 2] = d;
}
}
std::vector<at::Tensor> backproject_cuda_forward(
float fx, float fy, float px, float py,
at::Tensor depth)
{
// run kernels
cudaError_t err;
const int kThreadsPerBlock = 512;
int output_size;
int height = depth.size(0);
int width = depth.size(1);
auto top_data = at::zeros({height, width, 3}, depth.options());
// compute the losses and gradients
output_size = height * width;
BackprojectForward<float><<<(output_size + kThreadsPerBlock - 1) / kThreadsPerBlock, kThreadsPerBlock>>>(
output_size, width, fx, fy, px, py, depth.data<float>(), top_data.data<float>());
cudaDeviceSynchronize();
err = cudaGetLastError();
if(cudaSuccess != err)
{
fprintf( stderr, "cudaCheckError() failed: %s\n", cudaGetErrorString( err ) );
exit( -1 );
}
return {top_data};
}
================================================
FILE: lib/layers/hard_label.py
================================================
# Copyright (c) 2020 NVIDIA Corporation. All rights reserved.
# This work is licensed under the NVIDIA Source Code License - Non-commercial. Full
# text can be found in LICENSE.md
import math
from torch import nn
from torch.autograd import Function
import torch
import posecnn_cuda
class HardLabelFunction(Function):
@staticmethod
def forward(ctx, prob, label, rand, threshold, sample_percentage):
outputs = posecnn_cuda.hard_label_forward(threshold, sample_percentage, prob, label, rand)
top_data = outputs[0]
return top_data
@staticmethod
def backward(ctx, top_diff):
outputs = posecnn_cuda.hard_label_backward(top_diff)
d_prob, d_label = outputs
return d_prob, d_label, None, None, None
class HardLabel(nn.Module):
def __init__(self, threshold, sample_percentage):
super(HardLabel, self).__init__()
self.threshold = threshold
self.sample_percentage = sample_percentage
def forward(self, prob, label, rand):
return HardLabelFunction.apply(prob, label, rand, self.threshold, self.sample_percentage)
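# Usage sketch (illustrative, not part of the original file): drives the layer
# on random inputs, assuming the compiled posecnn_cuda extension and a GPU.
# The kernel keeps a pixel (writes 1.0) when it is foreground in `label` and
# either has a low predicted probability or is randomly sampled; all shapes
# below are illustrative.
if __name__ == '__main__':
    hard_label = HardLabel(threshold=0.9, sample_percentage=0.1)
    prob = torch.rand(2, 22, 60, 80, device='cuda')   # per-class probabilities
    label = (torch.rand_like(prob) > 0.5).float()     # binary foreground mask
    rand = torch.rand_like(prob)                      # uniform samples
    mask = hard_label(prob, label, rand)              # same shape, values in {0, 1}
    print(mask.shape, mask.sum().item())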
================================================
FILE: lib/layers/hard_label_kernel.cu
================================================
// Copyright (c) 2020 NVIDIA Corporation. All rights reserved.
// This work is licensed under the NVIDIA Source Code License - Non-commercial. Full
// text can be found in LICENSE.md
#include <ATen/ATen.h>
#include <ATen/cuda/CUDAContext.h>
#include <cuda.h>
#include <cuda_runtime.h>
#include <vector>
#define CUDA_1D_KERNEL_LOOP(i, n) \
for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; \
i += blockDim.x * gridDim.x)
template <typename Dtype>
__global__ void HardLabelForward(const int nthreads, const float threshold, const float sample_percentage,
const Dtype* bottom_prob, const Dtype* bottom_label, const Dtype* bottom_rand, Dtype* top_data)
{
CUDA_1D_KERNEL_LOOP(index, nthreads)
{
if (bottom_label[index] > 0 && (bottom_prob[index] < threshold || bottom_rand[index] < sample_percentage))
top_data[index] = 1.0;
}
}
std::vector<at::Tensor> hard_label_cuda_forward(
float threshold,
float sample_percentage,
at::Tensor bottom_prob,
at::Tensor bottom_label,
at::Tensor bottom_rand)
{
// run kernels
const int kThreadsPerBlock = 1024;
int output_size;
if (bottom_prob.dim() == 4)
output_size = bottom_prob.size(0) * bottom_prob.size(1) * bottom_prob.size(2) * bottom_prob.size(3);
else
output_size = bottom_prob.size(0) * bottom_prob.size(1);
auto top_data = at::zeros(bottom_prob.sizes(), bottom_prob.options());
AT_DISPATCH_FLOATING_TYPES(bottom_prob.type(), "hard_label_forward_cuda", ([&] {
// compute the losses and gradients
HardLabelForward<scalar_t><<<(output_size + kThreadsPerBlock - 1) / kThreadsPerBlock, kThreadsPerBlock>>>(
output_size,
threshold,
sample_percentage,
bottom_prob.data<scalar_t>(),
bottom_label.data<scalar_t>(),
bottom_rand.data<scalar_t>(),
top_data.data<scalar_t>());
}));
return {top_data};
}
std::vector<at::Tensor> hard_label_cuda_backward(
at::Tensor top_diff)
{
auto grad_prob = at::zeros(top_diff.sizes(), top_diff.options());
auto grad_label = at::zeros(top_diff.sizes(), top_diff.options());
return {grad_prob, grad_label};
}
================================================
FILE: lib/layers/hough_voting.py
================================================
# Copyright (c) 2020 NVIDIA Corporation. All rights reserved.
# This work is licensed under the NVIDIA Source Code License - Non-commercial. Full
# text can be found in LICENSE.md
import math
from torch import nn
from torch.autograd import Function
import torch
import posecnn_cuda
class HoughVotingFunction(Function):
@staticmethod
def forward(ctx, label, vertex, meta_data, extents, is_train, skip_pixels, \
label_threshold, inlier_threshold, voting_threshold, per_threshold):
outputs = posecnn_cuda.hough_voting_forward(label, vertex, meta_data, extents, is_train, skip_pixels, \
label_threshold, inlier_threshold, voting_threshold, per_threshold)
top_box = outputs[0]
top_pose = outputs[1]
return top_box, top_pose
@staticmethod
def backward(ctx, top_diff_box, top_diff_pose):
return None, None, None, None, None, None, None, None, None, None
class HoughVoting(nn.Module):
def __init__(self, is_train=0, skip_pixels=10, label_threshold=100, inlier_threshold=0.9, voting_threshold=-1, per_threshold=0.01):
super(HoughVoting, self).__init__()
self.is_train = is_train
self.skip_pixels = skip_pixels
self.label_threshold = label_threshold
self.inlier_threshold = inlier_threshold
self.voting_threshold = voting_threshold
self.per_threshold = per_threshold
def forward(self, label, vertex, meta_data, extents):
return HoughVotingFunction.apply(label, vertex, meta_data, extents, self.is_train, self.skip_pixels, \
self.label_threshold, self.inlier_threshold, self.voting_threshold, self.per_threshold)
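# Usage sketch (illustrative, not part of the original file): shapes follow the
# CUDA entry point, which expects an int32 label map, a vertex map with three
# channels per class (center direction u, v and log depth), camera intrinsics
# inside meta_data (fx, px, fy, py at indices 0, 2, 4, 5) and per-class 3D
# extents. Requires the compiled posecnn_cuda extension and a GPU; the class
# count and meta_data width below are illustrative.
if __name__ == '__main__':
    num_classes, h, w = 22, 480, 640
    voting = HoughVoting(is_train=0, skip_pixels=10)
    label = torch.zeros(1, h, w, dtype=torch.int32, device='cuda')
    vertex = torch.zeros(1, 3 * num_classes, h, w, device='cuda')
    meta_data = torch.zeros(1, 18, device='cuda')  # flattened intrinsics (and inverse)
    extents = torch.zeros(num_classes, 3, device='cuda')
    rois, poses = voting(label, vertex, meta_data, extents)  # both (N, 7)
    print(rois.shape, poses.shape)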
================================================
FILE: lib/layers/hough_voting_kernel.cu
================================================
// Copyright (c) 2020 NVIDIA Corporation. All rights reserved.
// This work is licensed under the NVIDIA Source Code License - Non-commercial. Full
// text can be found in LICENSE.md
#include <ATen/ATen.h>
#include <ATen/cuda/CUDAContext.h>
#include <cuda.h>
#include <cuda_runtime.h>
#include <vector>
#include <cfloat>
#include <thrust/device_ptr.h>
#include <thrust/extrema.h>
#include <thrust/execution_policy.h>
#define VERTEX_CHANNELS 3
#define MAX_ROI 128
#define CUDA_1D_KERNEL_LOOP(i, n) \
for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; \
i += blockDim.x * gridDim.x)
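// The two helpers below return the cosine of the angle between the vector
// from pixel (x, y) to a candidate center (cx, cy) and the predicted center
// direction (u, v); values close to 1 mean the pixel votes for that center.
// angle_distance_label() additionally walks the segment from the pixel to the
// center and suppresses the vote when fewer than half of the traversed pixels
// carry the class label cls.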
__device__ inline float angle_distance(int cx, int cy, int x, int y, float u, float v)
{
float dx = cx - x;
float dy = cy - y;
float n1 = sqrt(u * u + v * v);
float n2 = sqrt(dx * dx + dy * dy);
float dot = u * dx + v * dy;
float distance = dot / (n1 * n2);
return distance;
}
__device__ inline float angle_distance_label(int cx, int cy, int x, int y, float u, float v,
int cls, const int height, const int width, const int* labelmap)
{
float dx = cx - x;
float dy = cy - y;
float n1 = sqrt(u * u + v * v);
float n2 = sqrt(dx * dx + dy * dy);
float dot = u * dx + v * dy;
float distance = dot / (n1 * n2);
int num = 10;
int count = 0;
for (int i = 1; i <= num; i++)
{
float step = float(i) / float(num);
int px = int(x + step * dx);
int py = int(y + step * dy);
if (px >= 0 && px < width && py >= 0 && py < height)
{
if (labelmap[py * width + px] == cls)
count++;
}
}
if ((float)count / float(num) < 0.5)
distance = 0;
return distance;
}
__device__ inline void project_box(int cls, const float* extents, const float* meta_data, float distance, float factor, float* threshold)
{
float xHalf = extents[cls * 3 + 0] * 0.5;
float yHalf = extents[cls * 3 + 1] * 0.5;
float zHalf = extents[cls * 3 + 2] * 0.5;
float bb3D[24];
bb3D[0] = xHalf; bb3D[1] = yHalf; bb3D[2] = zHalf + distance;
bb3D[3] = -xHalf; bb3D[4] = yHalf; bb3D[5] = zHalf + distance;
bb3D[6] = xHalf; bb3D[7] = -yHalf; bb3D[8] = zHalf + distance;
bb3D[9] = -xHalf; bb3D[10] = -yHalf; bb3D[11] = zHalf + distance;
bb3D[12] = xHalf; bb3D[13] = yHalf; bb3D[14] = -zHalf + distance;
bb3D[15] = -xHalf; bb3D[16] = yHalf; bb3D[17] = -zHalf + distance;
bb3D[18] = xHalf; bb3D[19] = -yHalf; bb3D[20] = -zHalf + distance;
bb3D[21] = -xHalf; bb3D[22] = -yHalf; bb3D[23] = -zHalf + distance;
float fx = meta_data[0];
float fy = meta_data[4];
float px = meta_data[2];
float py = meta_data[5];
float minX = 1e8;
float maxX = -1e8;
float minY = 1e8;
float maxY = -1e8;
for (int i = 0; i < 8; i++)
{
float x = fx * (bb3D[i * 3] / bb3D[i * 3 + 2]) + px;
float y = fy * (bb3D[i * 3 + 1] / bb3D[i * 3 + 2]) + py;
minX = fmin(minX, x);
minY = fmin(minY, y);
maxX = fmax(maxX, x);
maxY = fmax(maxY, y);
}
float width = maxX - minX + 1;
float height = maxY - minY + 1;
*threshold = fmax(width, height) * factor;
}
__global__ void compute_arrays_kernel(const int nthreads, const int* labelmap,
int* arrays, int* array_size, const int height, const int width)
{
CUDA_1D_KERNEL_LOOP(index, nthreads)
{
int cls = labelmap[index];
if (cls > 0)
{
int size = atomicAdd(array_size + cls, 1);
int offset = cls * height * width + size;
arrays[offset] = index;
}
}
}
__global__ void compute_hough_kernel(const int nthreads, float* hough_space, float* hough_data,
const int* labelmap, const float* vertmap, const float* extents, const float* meta_data,
int* arrays, int* array_size, int* class_indexes, const int height, const int width,
const int num_classes, const int count, const float inlierThreshold, const int skip_pixels)
{
CUDA_1D_KERNEL_LOOP(index, nthreads)
{
// (cls, cx, cy) is an element in the hough space
int ind = index / (height * width);
int cls = class_indexes[ind];
int n = index % (height * width);
int cx = n % width;
int cy = n / width;
int size = array_size[cls];
float distance = 0;
float threshold;
for (int i = 0; i < size; i += skip_pixels)
{
int offset = cls * height * width + i;
int location = arrays[offset];
int x = location % width;
int y = location / width;
// read the direction
offset = VERTEX_CHANNELS * cls * height * width + y * width + x;
float u = vertmap[offset];
offset = VERTEX_CHANNELS * cls * height * width + height * width + y * width + x;
float v = vertmap[offset];
offset = VERTEX_CHANNELS * cls * height * width + 2 * height * width + y * width + x;
float d = exp(vertmap[offset]);
// vote
if (angle_distance_label(cx, cy, x, y, u, v, cls, height, width, labelmap) > inlierThreshold)
{
project_box(cls, extents, meta_data, d, 0.6, &threshold);
float dx = fabsf(x - cx);
float dy = fabsf(y - cy);
if (dx < threshold && dy < threshold)
{
hough_space[index]++;
distance += d;
}
}
}
if (hough_space[index] > 0)
{
distance /= hough_space[index];
float bb_width = -1;
float bb_height = -1;
for (int i = 0; i < size; i += skip_pixels)
{
int offset = cls * height * width + i;
int location = arrays[offset];
int x = location % width;
int y = location / width;
// read the direction
offset = VERTEX_CHANNELS * cls * height * width + y * width + x;
float u = vertmap[offset];
offset = VERTEX_CHANNELS * cls * height * width + height * width + y * width + x;
float v = vertmap[offset];
// vote
if (angle_distance_label(cx, cy, x, y, u, v, cls, height, width, labelmap) > inlierThreshold)
{
project_box(cls, extents, meta_data, distance, 0.6, &threshold);
float dx = fabsf(x - cx);
float dy = fabsf(y - cy);
if (dx > bb_width && dx < threshold && dy < threshold)
bb_width = dx;
if (dy > bb_height && dx < threshold && dy < threshold)
bb_height = dy;
}
}
int offset = ind * height * width * 3 + 3 * (cy * width + cx);
hough_data[offset] = distance;
hough_data[offset + 1] = 2 * bb_height;
hough_data[offset + 2] = 2 * bb_width;
}
}
}
__global__ void compute_max_indexes_kernel(const int nthreads, int* max_indexes, int index_size, int* num_max, float* hough_space,
float* hough_data, int height, int width, float threshold, float perThreshold, const int is_train)
{
CUDA_1D_KERNEL_LOOP(index, nthreads)
{
// (ind, cx, cy) is an element in the hough space
int ind = index / (height * width);
int n = index % (height * width);
int cx = n % width;
int cy = n / width;
int kernel_size = 3;
int offset = ind * height * width * 3 + 3 * (cy * width + cx);
float bb_height = hough_data[offset + 1];
float bb_width = hough_data[offset + 2];
if (hough_space[index] > threshold && bb_height > 0 && bb_width > 0)
{
// check if the location is local maximum
int flag = 0;
for (int x = cx - kernel_size; x <= cx + kernel_size; x++)
{
for (int y = cy - kernel_size; y <= cy + kernel_size; y++)
{
if (x >= 0 && x < width && y >= 0 && y < height)
{
if (hough_space[ind * height * width + y * width + x] > hough_space[index])
{
flag = 1;
break;
}
if (is_train == 0 && hough_space[ind * height * width + y * width + x] == hough_space[index])
{
if (ind * height * width + y * width + x > index)
{
flag = 1;
break;
}
}
}
}
// check the percentage of voting
if (hough_space[index] / (bb_height * bb_width) < perThreshold)
flag = 1;
}
if (flag == 0)
{
// add the location to max_indexes
int max_index = atomicAdd(num_max, 1);
if (max_index < index_size)
max_indexes[max_index] = index;
}
}
}
}
__global__ void compute_rois_kernel(const int nthreads, float* top_box, float* top_pose,
const float* meta_data, float* hough_space, float* hough_data, int* max_indexes, int* class_indexes,
int batch_index, const int height, const int width, const int num_classes, int* num_rois, const int is_train)
{
CUDA_1D_KERNEL_LOOP(index, nthreads)
{
float scale = 0.0;
int max_index = max_indexes[index];
int ind = max_index / (height * width);
int cls = class_indexes[ind];
int n = max_index % (height * width);
int x = n % width;
int y = n / width;
float fx = meta_data[0];
float fy = meta_data[4];
float px = meta_data[2];
float py = meta_data[5];
float rx = (x - px) / fx;
float ry = (y - py) / fy;
int offset = ind * height * width * 3 + 3 * (y * width + x);
float bb_distance = hough_data[offset];
float bb_height = hough_data[offset + 1];
float bb_width = hough_data[offset + 2];
if (is_train)
{
int roi_index = atomicAdd(num_rois, 9);
top_box[roi_index * 7 + 0] = batch_index;
top_box[roi_index * 7 + 1] = cls;
top_box[roi_index * 7 + 2] = x - bb_width * (0.5 + scale);
top_box[roi_index * 7 + 3] = y - bb_height * (0.5 + scale);
top_box[roi_index * 7 + 4] = x + bb_width * (0.5 + scale);
top_box[roi_index * 7 + 5] = y + bb_height * (0.5 + scale);
top_box[roi_index * 7 + 6] = hough_space[max_index];
for (int j = 0; j < 9; j++)
{
top_pose[(roi_index + j) * 7 + 0] = 1;
top_pose[(roi_index + j) * 7 + 1] = 0;
top_pose[(roi_index + j) * 7 + 2] = 0;
top_pose[(roi_index + j) * 7 + 3] = 0;
top_pose[(roi_index + j) * 7 + 4] = rx;
top_pose[(roi_index + j) * 7 + 5] = ry;
top_pose[(roi_index + j) * 7 + 6] = bb_distance;
}
// add jittering boxes
float x1 = top_box[roi_index * 7 + 2];
float y1 = top_box[roi_index * 7 + 3];
float x2 = top_box[roi_index * 7 + 4];
float y2 = top_box[roi_index * 7 + 5];
float ww = x2 - x1;
float hh = y2 - y1;
// (-1, -1)
roi_index++;
top_box[roi_index * 7 + 0] = batch_index;
top_box[roi_index * 7 + 1] = cls;
top_box[roi_index * 7 + 2] = x1 - 0.05 * ww;
top_box[roi_index * 7 + 3] = y1 - 0.05 * hh;
top_box[roi_index * 7 + 4] = top_box[roi_index * 7 + 2] + ww;
top_box[roi_index * 7 + 5] = top_box[roi_index * 7 + 3] + hh;
top_box[roi_index * 7 + 6] = hough_space[max_index];
// (+1, -1)
roi_index++;
top_box[roi_index * 7 + 0] = batch_index;
top_box[roi_index * 7 + 1] = cls;
top_box[roi_index * 7 + 2] = x1 + 0.05 * ww;
top_box[roi_index * 7 + 3] = y1 - 0.05 * hh;
top_box[roi_index * 7 + 4] = top_box[roi_index * 7 + 2] + ww;
top_box[roi_index * 7 + 5] = top_box[roi_index * 7 + 3] + hh;
top_box[roi_index * 7 + 6] = hough_space[max_index];
// (-1, +1)
roi_index++;
top_box[roi_index * 7 + 0] = batch_index;
top_box[roi_index * 7 + 1] = cls;
top_box[roi_index * 7 + 2] = x1 - 0.05 * ww;
top_box[roi_index * 7 + 3] = y1 + 0.05 * hh;
top_box[roi_index * 7 + 4] = top_box[roi_index * 7 + 2] + ww;
top_box[roi_index * 7 + 5] = top_box[roi_index * 7 + 3] + hh;
top_box[roi_index * 7 + 6] = hough_space[max_index];
// (+1, +1)
roi_index++;
top_box[roi_index * 7 + 0] = batch_index;
top_box[roi_index * 7 + 1] = cls;
top_box[roi_index * 7 + 2] = x1 + 0.05 * ww;
top_box[roi_index * 7 + 3] = y1 + 0.05 * hh;
top_box[roi_index * 7 + 4] = top_box[roi_index * 7 + 2] + ww;
top_box[roi_index * 7 + 5] = top_box[roi_index * 7 + 3] + hh;
top_box[roi_index * 7 + 6] = hough_space[max_index];
// (0, -1)
roi_index++;
top_box[roi_index * 7 + 0] = batch_index;
top_box[roi_index * 7 + 1] = cls;
top_box[roi_index * 7 + 2] = x1;
top_box[roi_index * 7 + 3] = y1 - 0.05 * hh;
top_box[roi_index * 7 + 4] = top_box[roi_index * 7 + 2] + ww;
top_box[roi_index * 7 + 5] = top_box[roi_index * 7 + 3] + hh;
top_box[roi_index * 7 + 6] = hough_space[max_index];
// (-1, 0)
roi_index++;
top_box[roi_index * 7 + 0] = batch_index;
top_box[roi_index * 7 + 1] = cls;
top_box[roi_index * 7 + 2] = x1 - 0.05 * ww;
top_box[roi_index * 7 + 3] = y1;
top_box[roi_index * 7 + 4] = top_box[roi_index * 7 + 2] + ww;
top_box[roi_index * 7 + 5] = top_box[roi_index * 7 + 3] + hh;
top_box[roi_index * 7 + 6] = hough_space[max_index];
// (0, +1)
roi_index++;
top_box[roi_index * 7 + 0] = batch_index;
top_box[roi_index * 7 + 1] = cls;
top_box[roi_index * 7 + 2] = x1;
top_box[roi_index * 7 + 3] = y1 + 0.05 * hh;
top_box[roi_index * 7 + 4] = top_box[roi_index * 7 + 2] + ww;
top_box[roi_index * 7 + 5] = top_box[roi_index * 7 + 3] + hh;
top_box[roi_index * 7 + 6] = hough_space[max_index];
// (+1, 0)
roi_index++;
top_box[roi_index * 7 + 0] = batch_index;
top_box[roi_index * 7 + 1] = cls;
top_box[roi_index * 7 + 2] = x1 + 0.05 * ww;
top_box[roi_index * 7 + 3] = y1;
top_box[roi_index * 7 + 4] = top_box[roi_index * 7 + 2] + ww;
top_box[roi_index * 7 + 5] = top_box[roi_index * 7 + 3] + hh;
top_box[roi_index * 7 + 6] = hough_space[max_index];
}
else
{
int roi_index = atomicAdd(num_rois, 1);
top_box[roi_index * 7 + 0] = batch_index;
top_box[roi_index * 7 + 1] = cls;
top_box[roi_index * 7 + 2] = x - bb_width * (0.5 + scale);
top_box[roi_index * 7 + 3] = y - bb_height * (0.5 + scale);
top_box[roi_index * 7 + 4] = x + bb_width * (0.5 + scale);
top_box[roi_index * 7 + 5] = y + bb_height * (0.5 + scale);
top_box[roi_index * 7 + 6] = hough_space[max_index];
top_pose[roi_index * 7 + 0] = 1;
top_pose[roi_index * 7 + 1] = 0;
top_pose[roi_index * 7 + 2] = 0;
top_pose[roi_index * 7 + 3] = 0;
top_pose[roi_index * 7 + 4] = rx;
top_pose[roi_index * 7 + 5] = ry;
top_pose[roi_index * 7 + 6] = bb_distance;
}
}
}
std::vector<at::Tensor> hough_voting_cuda_forward(
at::Tensor bottom_label,
at::Tensor bottom_vertex,
at::Tensor bottom_meta_data,
at::Tensor extents,
int is_train,
int skip_pixels,
int labelThreshold,
float inlierThreshold,
float votingThreshold,
float perThreshold)
{
const int kThreadsPerBlock = 1024;
int output_size;
cudaError_t err;
const int batch_size = bottom_vertex.size(0);
const int num_classes = bottom_vertex.size(1) / VERTEX_CHANNELS;
const int height = bottom_vertex.size(2);
const int width = bottom_vertex.size(3);
const int num_meta_data = bottom_meta_data.size(1);
const int index_size = MAX_ROI / batch_size;
auto top_box = at::zeros({MAX_ROI * 9, 7}, bottom_vertex.options());
auto top_pose = at::zeros({MAX_ROI * 9, 7}, bottom_vertex.options());
auto num_rois = at::zeros({1}, bottom_label.options());
for (int batch_index = 0; batch_index < batch_size; batch_index++)
{
const int* labelmap = bottom_label.data<int>() + batch_index * height * width;
const float* vertmap = bottom_vertex.data<float>() + batch_index * height * width * VERTEX_CHANNELS * num_classes;
const float* meta_data = bottom_meta_data.data<float>() + batch_index * num_meta_data;
// step 1: compute a label index array for each class
auto arrays = at::zeros({num_classes, height * width}, bottom_label.options());
auto array_sizes = at::zeros({num_classes}, bottom_label.options());
output_size = height * width;
compute_arrays_kernel<<<(output_size + kThreadsPerBlock - 1) / kThreadsPerBlock, kThreadsPerBlock>>>(
output_size, labelmap, arrays.data<int>(), array_sizes.data<int>(), height, width);
cudaDeviceSynchronize();
// compute class indexes
int* array_sizes_host = (int*)malloc(num_classes * sizeof(int));
int* class_indexes_host = (int*)malloc(num_classes * sizeof(int));
cudaMemcpy(array_sizes_host, array_sizes.data<int>(), num_classes * sizeof(int), cudaMemcpyDeviceToHost);
int count = 0;
for (int c = 1; c < num_classes; c++)
{
if (array_sizes_host[c] > labelThreshold)
{
class_indexes_host[count] = c;
count++;
}
// else
// printf("class %d with only pixels %d\n", c, array_sizes_host[c]);
}
if (count == 0)
{
free(array_sizes_host);
free(class_indexes_host);
continue;
}
auto class_indexes = at::zeros({count}, bottom_label.options());
cudaMemcpy(class_indexes.data<int>(), class_indexes_host, count * sizeof(int), cudaMemcpyHostToDevice);
err = cudaGetLastError();
if(cudaSuccess != err)
{
fprintf( stderr, "cudaCheckError() failed compute label index: %s\n", cudaGetErrorString( err ) );
exit( -1 );
}
// step 2: compute the hough space
auto hough_space = at::zeros({count, height, width}, bottom_vertex.options());
auto hough_data = at::zeros({count, height, width, 3}, bottom_vertex.options());
output_size = count * height * width;
compute_hough_kernel<<<(output_size + kThreadsPerBlock - 1) / kThreadsPerBlock, kThreadsPerBlock>>>(
output_size, hough_space.data<float>(), hough_data.data<float>(), labelmap, vertmap, extents.data<float>(), meta_data,
arrays.data<int>(), array_sizes.data<int>(), class_indexes.data<int>(), height, width, num_classes, count, inlierThreshold, skip_pixels);
cudaDeviceSynchronize();
err = cudaGetLastError();
if(cudaSuccess != err)
{
fprintf( stderr, "cudaCheckError() failed compute hough space: %s\n", cudaGetErrorString( err ) );
exit( -1 );
}
// step 3: find the maximum in hough space
auto num_max = at::zeros({1}, bottom_label.options());
auto max_indexes = at::zeros({index_size}, bottom_label.options());
if (votingThreshold > 0)
{
output_size = count * height * width;
compute_max_indexes_kernel<<<(output_size + kThreadsPerBlock - 1) / kThreadsPerBlock, kThreadsPerBlock>>>(
output_size, max_indexes.data<int>(), index_size, num_max.data<int>(),
hough_space.data<float>(), hough_data.data<float>(), height, width, votingThreshold, perThreshold, is_train);
cudaDeviceSynchronize();
}
else
{
int* max_indexes_host = (int*)malloc(count * sizeof(int));
memset(max_indexes_host, 0, count * sizeof(int));
for (int i = 0; i < count; i++)
{
float *hmax = thrust::max_element(thrust::device, hough_space.data<float>() + i * height * width,
hough_space.data<float>() + (i+1) * height * width);
max_indexes_host[i] = hmax - hough_space.data<float>();
}
cudaMemcpy(num_max.data<int>(), &count, sizeof(int), cudaMemcpyHostToDevice);
cudaMemcpy(max_indexes.data<int>(), max_indexes_host, count * sizeof(int), cudaMemcpyHostToDevice);
free(max_indexes_host);
}
err = cudaGetLastError();
if(cudaSuccess != err)
{
fprintf( stderr, "cudaCheckError() failed compute maximum: %s\n", cudaGetErrorString( err ) );
exit( -1 );
}
// step 4: compute outputs
int num_max_host;
cudaMemcpy(&num_max_host, num_max.data<int>(), sizeof(int), cudaMemcpyDeviceToHost);
if (num_max_host >= index_size)
{
printf("hough voting num_max: %d exceeds capacity %d\n", num_max_host, index_size);
num_max_host = index_size;
}
if (num_max_host > 0)
{
output_size = num_max_host;
compute_rois_kernel<<<(output_size + kThreadsPerBlock - 1) / kThreadsPerBlock, kThreadsPerBlock>>>(
output_size, top_box.data<float>(), top_pose.data<float>(), meta_data, hough_space.data<float>(),
hough_data.data<float>(), max_indexes.data<int>(), class_indexes.data<int>(),
batch_index, height, width, num_classes, num_rois.data<int>(), is_train);
cudaDeviceSynchronize();
}
// clean up
free(array_sizes_host);
free(class_indexes_host);
err = cudaGetLastError();
if(cudaSuccess != err)
{
fprintf( stderr, "cudaCheckError() failed compute outputs: %s\n", cudaGetErrorString( err ) );
exit( -1 );
}
}
// copy outputs
int num_rois_host;
cudaMemcpy(&num_rois_host, num_rois.data<int>(), sizeof(int), cudaMemcpyDeviceToHost);
if (num_rois_host == 0)
num_rois_host = 1;
auto top_box_final = at::zeros({num_rois_host, 7}, bottom_vertex.options());
auto top_pose_final = at::zeros({num_rois_host, 7}, bottom_vertex.options());
cudaMemcpy(top_box_final.data<float>(), top_box.data<float>(), num_rois_host * 7 * sizeof(float), cudaMemcpyDeviceToDevice);
cudaMemcpy(top_pose_final.data<float>(), top_pose.data<float>(), num_rois_host * 7 * sizeof(float), cudaMemcpyDeviceToDevice);
return {top_box_final, top_pose_final};
}
================================================
FILE: lib/layers/point_matching_loss.py
================================================
# Copyright (c) 2020 NVIDIA Corporation. All rights reserved.
# This work is licensed under the NVIDIA Source Code License - Non-commercial. Full
# text can be found in LICENSE.md
import math
from torch import nn
from torch.autograd import Function
import torch
import posecnn_cuda
class PMLossFunction(Function):
@staticmethod
def forward(ctx, prediction, target, weight, points, symmetry, hard_angle):
outputs = posecnn_cuda.pml_forward(prediction, target, weight, points, symmetry, hard_angle)
loss = outputs[0]
variables = outputs[1:]
ctx.save_for_backward(*variables)
return loss
@staticmethod
def backward(ctx, grad_loss):
outputs = posecnn_cuda.pml_backward(grad_loss, *ctx.saved_variables)
d_rotation = outputs[0]
return d_rotation, None, None, None, None, None
class PMLoss(nn.Module):
def __init__(self, hard_angle=15):
super(PMLoss, self).__init__()
self.hard_angle = hard_angle
def forward(self, prediction, target, weight, points, symmetry):
return PMLossFunction.apply(prediction, target, weight, points, symmetry, self.hard_angle)
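# Usage sketch (illustrative, not part of the original file): the CUDA entry
# point derives num_classes and num_points from `points`, so points is expected
# as (1, num_classes, num_points, 3), while prediction/target/weight hold one
# quaternion slot per class, i.e. (batch, 4 * num_classes). Requires the
# compiled posecnn_cuda extension and a GPU; the sizes below are illustrative.
if __name__ == '__main__':
    batch, num_classes, num_points = 2, 22, 512
    pml = PMLoss(hard_angle=15)
    prediction = torch.zeros(batch, 4 * num_classes, device='cuda')
    target = torch.zeros(batch, 4 * num_classes, device='cuda')
    prediction[:, 4] = 1.0  # identity quaternion in the slot of class 1
    target[:, 4] = 1.0
    weight = torch.zeros(batch, 4 * num_classes, device='cuda')
    weight[:, 4:8] = 1.0    # supervise class 1 only
    points = torch.rand(1, num_classes, num_points, 3, device='cuda')
    symmetry = torch.zeros(num_classes, device='cuda')
    loss = pml(prediction, target, weight, points, symmetry)
    print(loss.item())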
================================================
FILE: lib/layers/point_matching_loss_kernel.cu
================================================
// Copyright (c) 2020 NVIDIA Corporation. All rights reserved.
// This work is licensed under the NVIDIA Source Code License - Non-commercial. Full
// text can be found in LICENSE.md
#include <ATen/ATen.h>
#include <ATen/cuda/CUDAContext.h>
#include <cfloat>
#include <thrust/device_ptr.h>
#include <thrust/reduce.h>
#define CUDA_1D_KERNEL_LOOP(i, n) \
for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; \
i += blockDim.x * gridDim.x)
#define POSE_CHANNELS 4
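// Scratch layout of `rotations` per (batch n, point p): six consecutive 3x3
// matrices. Offset 0 holds the ground-truth rotation R(q_gt), offset 9 the
// predicted rotation R(q_pred), and offsets 18, 27, 36, 45 the partial
// derivatives of R(q_pred) with respect to the quaternion components
// (s, u, v, w), used below to accumulate the gradient.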
template <typename Dtype>
__global__ void AveragedistanceForward(const int nthreads, const Dtype* prediction, const Dtype* target,
const Dtype* weight, const Dtype* point, const Dtype* symmetry, const int batch_size, const int num_classes,
const int num_points, const float hard_angle, Dtype* rotations, Dtype* losses, Dtype* diffs, Dtype* angles_batch)
{
CUDA_1D_KERNEL_LOOP(index_thread, nthreads)
{
// batch index
int n = index_thread / num_points;
int p = index_thread % num_points;
// find the class label and pose of this object
int index_cls = -1, ind;
Dtype s, u, v, w;
for (int i = 0; i < POSE_CHANNELS * num_classes; i += POSE_CHANNELS)
{
int index = n * POSE_CHANNELS * num_classes + i;
if (weight[index] > 0)
{
index_cls = i / POSE_CHANNELS;
// gt quaternion
s = target[index + 0];
u = target[index + 1];
v = target[index + 2];
w = target[index + 3];
// gt rotation matrix
ind = n * num_points * 6 * 9 + p * 6 * 9;
rotations[ind + 0] = s * s + u * u - v * v - w * w;
rotations[ind + 1] = 2 * (u * v - s * w);
rotations[ind + 2] = 2 * (u * w + s * v);
rotations[ind + 3] = 2 * (u * v + s * w);
rotations[ind + 4] = s * s - u * u + v * v - w * w;
rotations[ind + 5] = 2 * (v * w - s * u);
rotations[ind + 6] = 2 * (u * w - s * v);
rotations[ind + 7] = 2 * (v * w + s * u);
rotations[ind + 8] = s * s - u * u - v * v + w * w;
// predicted quaternion
s = prediction[index + 0];
u = prediction[index + 1];
v = prediction[index + 2];
w = prediction[index + 3];
// predicted rotation matrix
ind = n * num_points * 6 * 9 + p * 6 * 9 + 9;
rotations[ind + 0] = s * s + u * u - v * v - w * w;
rotations[ind + 1] = 2 * (u * v - s * w);
rotations[ind + 2] = 2 * (u * w + s * v);
rotations[ind + 3] = 2 * (u * v + s * w);
rotations[ind + 4] = s * s - u * u + v * v - w * w;
rotations[ind + 5] = 2 * (v * w - s * u);
rotations[ind + 6] = 2 * (u * w - s * v);
rotations[ind + 7] = 2 * (v * w + s * u);
rotations[ind + 8] = s * s - u * u - v * v + w * w;
// compute the angular distance between quaternions
if (p == 0)
{
Dtype d = target[index + 0] * prediction[index + 0] + target[index + 1] * prediction[index + 1]
+ target[index + 2] * prediction[index + 2] + target[index + 3] * prediction[index + 3];
Dtype angle = acos(2 * d * d - 1) * 180.0 / 3.14159265;
if (angle > hard_angle)
angles_batch[n] = 1.0;
}
break;
}
}
if (index_cls == -1)
continue;
// derivatives of Ru to quaternion
ind = n * num_points * 6 * 9 + p * 6 * 9 + 18;
rotations[ind + 0] = 2 * s;
rotations[ind + 1] = -2 * w;
rotations[ind + 2] = 2 * v;
rotations[ind + 3] = 2 * w;
rotations[ind + 4] = 2 * s;
rotations[ind + 5] = -2 * u;
rotations[ind + 6] = -2 * v;
rotations[ind + 7] = 2 * u;
rotations[ind + 8] = 2 * s;
ind = n * num_points * 6 * 9 + p * 6 * 9 + 27;
rotations[ind + 0] = 2 * u;
rotations[ind + 1] = 2 * v;
rotations[ind + 2] = 2 * w;
rotations[ind + 3] = 2 * v;
rotations[ind + 4] = -2 * u;
rotations[ind + 5] = -2 * s;
rotations[ind + 6] = 2 * w;
rotations[ind + 7] = 2 * s;
rotations[ind + 8] = -2 * u;
ind = n * num_points * 6 * 9 + p * 6 * 9 + 36;
rotations[ind + 0] = -2 * v;
rotations[ind + 1] = 2 * u;
rotations[ind + 2] = 2 * s;
rotations[ind + 3] = 2 * u;
rotations[ind + 4] = 2 * v;
rotations[ind + 5] = 2 * w;
rotations[ind + 6] = -2 * s;
rotations[ind + 7] = 2 * w;
rotations[ind + 8] = -2 * v;
ind = n * num_points * 6 * 9 + p * 6 * 9 + 45;
rotations[ind + 0] = -2 * w;
rotations[ind + 1] = -2 * s;
rotations[ind + 2] = 2 * u;
rotations[ind + 3] = 2 * s;
rotations[ind + 4] = -2 * w;
rotations[ind + 5] = 2 * v;
rotations[ind + 6] = 2 * u;
rotations[ind + 7] = 2 * v;
rotations[ind + 8] = 2 * w;
// for the point
int index = index_cls * num_points * 3 + p * 3;
ind = n * num_points * 6 * 9 + p * 6 * 9;
// rotate the first point
Dtype x1 = rotations[ind + 9 + 0] * point[index + 0] + rotations[ind + 9 + 1] * point[index + 1] + rotations[ind + 9 + 2] * point[index + 2];
Dtype y1 = rotations[ind + 9 + 3] * point[index + 0] + rotations[ind + 9 + 4] * point[index + 1] + rotations[ind + 9 + 5] * point[index + 2];
Dtype z1 = rotations[ind + 9 + 6] * point[index + 0] + rotations[ind + 9 + 7] * point[index + 1] + rotations[ind + 9 + 8] * point[index + 2];
int index_min;
Dtype x2, y2, z2;
if (symmetry[index_cls] > 0)
{
// find the closest point for a symmetric object
Dtype dmin = FLT_MAX;
for (int i = 0; i < num_points; i++)
{
int index2 = index_cls * num_points * 3 + i * 3;
x2 = rotations[ind + 0] * point[index2 + 0] + rotations[ind + 1] * point[index2 + 1] + rotations[ind + 2] * point[index2 + 2];
y2 = rotations[ind + 3] * point[index2 + 0] + rotations[ind + 4] * point[index2 + 1] + rotations[ind + 5] * point[index2 + 2];
z2 = rotations[ind + 6] * point[index2 + 0] + rotations[ind + 7] * point[index2 + 1] + rotations[ind + 8] * point[index2 + 2];
Dtype distance = (x1 - x2) * (x1 - x2) + (y1 - y2) * (y1 - y2) + (z1 - z2) * (z1 - z2);
if (distance < dmin)
{
dmin = distance;
index_min = index2;
}
}
}
else
index_min = index;
x2 = rotations[ind + 0] * point[index_min + 0] + rotations[ind + 1] * point[index_min + 1] + rotations[ind + 2] * point[index_min + 2];
y2 = rotations[ind + 3] * point[index_min + 0] + rotations[ind + 4] * point[index_min + 1] + rotations[ind + 5] * point[index_min + 2];
z2 = rotations[ind + 6] * point[index_min + 0] + rotations[ind + 7] * point[index_min + 1] + rotations[ind + 8] * point[index_min + 2];
// smooth l1 loss
Dtype distance = 0;
int index_diff = n * num_points * POSE_CHANNELS * num_classes + p * POSE_CHANNELS * num_classes + POSE_CHANNELS * index_cls;
for (int j = 0; j < 3; j++)
{
Dtype diff, df;
if (j == 0)
diff = x1 - x2;
else if (j == 1)
diff = y1 - y2;
else
diff = z1 - z2;
if (fabs(diff) < 1)
{
distance += 0.5 * diff * diff;
df = diff;
}
else
{
distance += fabs(diff) - 0.5;
if (diff > 0)
df = 1.0;
else
df = -1.0;
}
for (int k = 0; k < 3; k++)
{
ind = n * num_points * 6 * 9 + p * 6 * 9 + 18;
diffs[index_diff + 0] += df * point[index + k] * rotations[ind + j * 3 + k] / num_points;
ind = n * num_points * 6 * 9 + p * 6 * 9 + 27;
diffs[index_diff + 1] += df * point[index + k] * rotations[ind + j * 3 + k] / num_points;
ind = n * num_points * 6 * 9 + p * 6 * 9 + 36;
diffs[index_diff + 2] += df * point[index + k] * rotations[ind + j * 3 + k] / num_points;
ind = n * num_points * 6 * 9 + p * 6 * 9 + 45;
diffs[index_diff + 3] += df * point[index + k] * rotations[ind + j * 3 + k] / num_points;
}
}
losses[index_thread] = distance / num_points;
}
}
template <typename Dtype>
__global__ void sum_losses_gradients(const int nthreads, const Dtype* losses, const Dtype* diffs,
const int num_classes, const int num_points, const float batch_hard, Dtype* angles, Dtype* loss_batch, Dtype* bottom_diff)
{
CUDA_1D_KERNEL_LOOP(index, nthreads)
{
int n = index / (POSE_CHANNELS * num_classes);
int c = index % (POSE_CHANNELS * num_classes);
bottom_diff[index] = 0;
if (angles[n] > 0)
{
for (int p = 0; p < num_points; p++)
{
int index_diff = n * num_points * POSE_CHANNELS * num_classes + p * POSE_CHANNELS * num_classes + c;
bottom_diff[index] += diffs[index_diff] / batch_hard;
}
}
if (c == 0)
{
loss_batch[n] = 0;
if (angles[n] > 0)
{
for (int p = 0; p < num_points; p++)
loss_batch[n] += losses[n * num_points + p] / batch_hard;
}
}
}
}
std::vector<at::Tensor> pml_cuda_forward(
at::Tensor bottom_prediction,
at::Tensor bottom_target,
at::Tensor bottom_weight,
at::Tensor points,
at::Tensor symmetry,
float hard_angle)
{
// run kernels
cudaError_t err;
const int kThreadsPerBlock = 512;
int output_size;
// temp losses
const int batch_size = bottom_prediction.size(0);
const int num_classes = points.size(1);
const int num_points = points.size(2);
auto losses = at::zeros({batch_size, num_points}, points.options());
auto losses_batch = at::zeros({batch_size}, points.options());
auto angles_batch = at::zeros({batch_size}, points.options());
auto top_data = at::zeros({1}, points.options());
// temp diffs
auto diffs = at::zeros({batch_size, num_points, POSE_CHANNELS * num_classes}, points.options());
auto bottom_diff = at::zeros({batch_size, POSE_CHANNELS * num_classes}, points.options());
// temp rotations
auto rotations = at::zeros({batch_size, num_points, 6 * 9}, points.options());
// compute the losses and gradients
output_size = batch_size * num_points;
AveragedistanceForward<float><<<(output_size + kThreadsPerBlock - 1) / kThreadsPerBlock, kThreadsPerBlock>>>(
output_size, bottom_prediction.data<float>(), bottom_target.data<float>(), bottom_weight.data<float>(),
points.data<float>(), symmetry.data<float>(),
batch_size, num_classes, num_points, hard_angle, rotations.data<float>(), losses.data<float>(), diffs.data<float>(), angles_batch.data<float>());
cudaDeviceSynchronize();
// sum the angle flags
thrust::device_ptr<float> angles_ptr(angles_batch.data<float>());
float batch_hard = thrust::reduce(angles_ptr, angles_ptr + batch_size);
err = cudaGetLastError();
if(cudaSuccess != err)
{
fprintf( stderr, "cudaCheckError() failed: %s\n", cudaGetErrorString( err ) );
exit( -1 );
}
// sum the diffs
output_size = batch_size * POSE_CHANNELS * num_classes;
sum_losses_gradients<float><<<(output_size + kThreadsPerBlock - 1) / kThreadsPerBlock, kThreadsPerBlock>>>(
output_size, losses.data<float>(), diffs.data<float>(), num_classes,
num_points, batch_hard, angles_batch.data<float>(), losses_batch.data<float>(), bottom_diff.data<float>());
cudaDeviceSynchronize();
// sum the loss
thrust::device_ptr<float> losses_ptr(losses_batch.data<float>());
float loss = thrust::reduce(losses_ptr, losses_ptr + batch_size);
cudaMemcpy(top_data.data<float>(), &loss, sizeof(float), cudaMemcpyHostToDevice);
err = cudaGetLastError();
if(cudaSuccess != err)
{
fprintf( stderr, "cudaCheckError() failed: %s\n", cudaGetErrorString( err ) );
exit( -1 );
}
return {top_data, bottom_diff};
}
template <typename Dtype>
__global__ void AveragedistanceBackward(const int nthreads, const Dtype* top_diff,
const Dtype* bottom_diff, Dtype* output)
{
CUDA_1D_KERNEL_LOOP(index, nthreads)
{
output[index] = top_diff[0] * bottom_diff[index];
}
}
std::vector<at::Tensor> pml_cuda_backward(
at::Tensor grad_loss,
at::Tensor bottom_diff)
{
cudaError_t err;
const int kThreadsPerBlock = 512;
int output_size;
const int batch_size = bottom_diff.size(0);
const int num_classes = bottom_diff.size(1) / POSE_CHANNELS;
auto grad_rotation = at::zeros({batch_size, POSE_CHANNELS * num_classes}, bottom_diff.options());
output_size = batch_size * POSE_CHANNELS * num_classes;
AveragedistanceBackward<float><<<(output_size + kThreadsPerBlock - 1) / kThreadsPerBlock, kThreadsPerBlock>>>(
output_size, grad_loss.data<float>(), bottom_diff.data<float>(), grad_rotation.data<float>());
cudaDeviceSynchronize();
err = cudaGetLastError();
if(cudaSuccess != err)
{
fprintf( stderr, "cudaCheckError() failed : %s\n", cudaGetErrorString( err ) );
exit( -1 );
}
return {grad_rotation};
}
================================================
FILE: lib/layers/pose_target_layer.py
================================================
# Copyright (c) 2020 NVIDIA Corporation. All rights reserved.
# This work is licensed under the NVIDIA Source Code License - Non-commercial. Full
# text can be found in LICENSE.md
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import torch
import numpy as np
from fcn.config import cfg
from utils.bbox_transform import bbox_transform_inv
from utils.cython_bbox import bbox_overlaps
# rpn_rois: (batch_ids, cls, x1, y1, x2, y2, scores)
# gt_boxes: batch * num_classes * (x1, y1, x2, y2, cls)
def pose_target_layer(rois, bbox_prob, bbox_pred, gt_boxes, poses, is_training):
rois = rois.detach().cpu().numpy()
bbox_prob = bbox_prob.detach().cpu().numpy()
bbox_pred = bbox_pred.detach().cpu().numpy()
gt_boxes = gt_boxes.detach().cpu().numpy()
num_classes = bbox_prob.shape[1]
# process boxes
if cfg.TRAIN.BBOX_NORMALIZE_TARGETS_PRECOMPUTED:
stds = np.tile(np.array(cfg.TRAIN.BBOX_NORMALIZE_STDS), (num_classes))
means = np.tile(np.array(cfg.TRAIN.BBOX_NORMALIZE_MEANS), (num_classes))
bbox_pred *= stds
bbox_pred += means
boxes = rois[:, 2:6].copy()
pred_boxes = bbox_transform_inv(boxes, bbox_pred)
# assign boxes
for i in range(rois.shape[0]):
cls = int(rois[i, 1])
rois[i, 2:6] = pred_boxes[i, cls*4:cls*4+4]
rois[i, 6] = bbox_prob[i, cls]
# convert boxes to (batch_ids, x1, y1, x2, y2, cls)
roi_blob = rois[:, (0, 2, 3, 4, 5, 1)]
gt_box_blob = np.zeros((0, 6), dtype=np.float32)
pose_blob = np.zeros((0, 9), dtype=np.float32)
for i in range(gt_boxes.shape[0]):
for j in range(gt_boxes.shape[1]):
if gt_boxes[i, j, -1] > 0:
gt_box = np.zeros((1, 6), dtype=np.float32)
gt_box[0, 0] = i
gt_box[0, 1:5] = gt_boxes[i, j, :4]
gt_box[0, 5] = gt_boxes[i, j, 4]
gt_box_blob = np.concatenate((gt_box_blob, gt_box), axis=0)
poses[i, j, 0] = i
pose_blob = np.concatenate((pose_blob, poses[i, j, :].cpu().numpy().reshape(1, 9)), axis=0)
if gt_box_blob.shape[0] == 0:
num = rois.shape[0]
poses_target = np.zeros((num, 4 * num_classes), dtype=np.float32)
poses_weight = np.zeros((num, 4 * num_classes), dtype=np.float32)
else:
# overlaps: (rois x gt_boxes)
overlaps = bbox_overlaps(
np.ascontiguousarray(roi_blob[:, :5], dtype=np.float64),
np.ascontiguousarray(gt_box_blob[:, :5], dtype=np.float64))
gt_assignment = overlaps.argmax(axis=1)
max_overlaps = overlaps.max(axis=1)
labels = gt_box_blob[gt_assignment, 5]
quaternions = pose_blob[gt_assignment, 2:6]
# Select foreground RoIs as those with >= FG_THRESH overlap
bg_inds = np.where(max_overlaps < cfg.TRAIN.FG_THRESH_POSE)[0]
labels[bg_inds] = 0
bg_inds = np.where(roi_blob[:, -1] != labels)[0]
labels[bg_inds] = 0
# in training, only use the positive boxes for pose regression
if is_training:
fg_inds = np.where(labels > 0)[0]
if len(fg_inds) > 0:
rois = rois[fg_inds, :]
quaternions = quaternions[fg_inds, :]
labels = labels[fg_inds]
# pose regression targets and weights
poses_target, poses_weight = _compute_pose_targets(quaternions, labels, num_classes)
return torch.from_numpy(rois).cuda(), torch.from_numpy(poses_target).cuda(), torch.from_numpy(poses_weight).cuda()
def _compute_pose_targets(quaternions, labels, num_classes):
"""Compute pose regression targets for an image."""
num = quaternions.shape[0]
poses_target = np.zeros((num, 4 * num_classes), dtype=np.float32)
poses_weight = np.zeros((num, 4 * num_classes), dtype=np.float32)
for i in range(num):
cls = labels[i]
if cls > 0 and np.linalg.norm(quaternions[i, :]) > 0:
start = int(4 * cls)
end = start + 4
poses_target[i, start:end] = quaternions[i, :]
poses_weight[i, start:end] = 1.0
return poses_target, poses_weight
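# Usage sketch (illustrative, not part of the original file): a tiny CPU-only
# example of the target/weight layout produced by _compute_pose_targets. Each
# row reserves four slots per class; only the slots of the assigned class
# receive the ground-truth quaternion and a weight of 1.
if __name__ == '__main__':
    quaternions = np.array([[1.0, 0.0, 0.0, 0.0]], dtype=np.float32)
    labels = np.array([2])
    targets, weights = _compute_pose_targets(quaternions, labels, num_classes=4)
    # targets[0, 8:12] == [1, 0, 0, 0] and weights[0, 8:12] == 1, all else 0
    print(targets[0], weights[0])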
================================================
FILE: lib/layers/posecnn_layers.cpp
================================================
// Copyright (c) 2020 NVIDIA Corporation. All rights reserved.
// This work is licensed under the NVIDIA Source Code License - Non-commercial. Full
// text can be found in LICENSE.md
#include <torch/extension.h>
#include <torch/torch.h>
#include <ATen/ATen.h>
#include <vector>
#include <iostream>
#define CHECK_CUDA(x) AT_ASSERT(x.type().is_cuda())
#define CHECK_CONTIGUOUS(x) AT_ASSERT(x.is_contiguous())
#define CHECK_INPUT(x) CHECK_CUDA(x); CHECK_CONTIGUOUS(x)
/************************************************************
backproject depth to 3D points
*************************************************************/
std::vector<at::Tensor> backproject_cuda_forward(
float fx, float fy, float px, float py,
at::Tensor depth);
std::vector<at::Tensor> backproject_forward(
float fx, float fy, float px, float py,
at::Tensor depth)
{
CHECK_INPUT(depth);
return backproject_cuda_forward(fx, fy, px, py, depth);
}
/************************************************************
hard label layer
*************************************************************/
std::vector<at::Tensor> hard_label_cuda_forward(
float threshold,
float sample_percentage,
at::Tensor bottom_prob,
at::Tensor bottom_label,
at::Tensor bottom_rand);
std::vector<at::Tensor> hard_label_cuda_backward(
at::Tensor top_diff);
std::vector<at::Tensor> hard_label_forward(
float threshold,
float sample_percentage,
at::Tensor bottom_prob,
at::Tensor bottom_label,
at::Tensor bottom_rand)
{
CHECK_INPUT(bottom_prob);
CHECK_INPUT(bottom_label);
CHECK_INPUT(bottom_rand);
return hard_label_cuda_forward(threshold, sample_percentage, bottom_prob, bottom_label, bottom_rand);
}
std::vector<at::Tensor> hard_label_backward(
at::Tensor top_diff) {
CHECK_INPUT(top_diff);
return hard_label_cuda_backward(top_diff);
}
/************************************************************
hough voting layer
*************************************************************/
std::vector<at::Tensor> hough_voting_cuda_forward(
at::Tensor bottom_label,
at::Tensor bottom_vertex,
at::Tensor meta_data,
at::Tensor extents,
int is_train,
int skip_pixels,
int labelThreshold,
float inlierThreshold,
float votingThreshold,
float perThreshold);
std::vector<at::Tensor> hough_voting_forward(
at::Tensor bottom_label,
at::Tensor bottom_vertex,
at::Tensor meta_data,
at::Tensor extents,
int is_train,
int skip_pixels,
int labelThreshold,
float inlierThreshold,
float votingThreshold,
float perThreshold)
{
CHECK_INPUT(bottom_label);
CHECK_INPUT(bottom_vertex);
CHECK_INPUT(extents);
CHECK_INPUT(meta_data);
return hough_voting_cuda_forward(bottom_label, bottom_vertex, meta_data, extents,
is_train, skip_pixels, labelThreshold, inlierThreshold, votingThreshold, perThreshold);
}
/************************************************************
roi pool layer
*************************************************************/
std::vector<at::Tensor> roi_pool_cuda_forward(
int pooled_height,
int pooled_width,
float spatial_scale,
at::Tensor bottom_features,
at::Tensor bottom_rois);
std::vector<at::Tensor> roi_pool_cuda_backward(
int batch_size,
int height,
int width,
float spatial_scale,
at::Tensor top_diff,
at::Tensor bottom_rois,
at::Tensor argmax_data);
std::vector<at::Tensor> roi_pool_forward(
int pooled_height,
int pooled_width,
float spatial_scale,
at::Tensor bottom_features,
at::Tensor bottom_rois)
{
CHECK_INPUT(bottom_features);
CHECK_INPUT(bottom_rois);
return roi_pool_cuda_forward(pooled_height, pooled_width, spatial_scale, bottom_features, bottom_rois);
}
std::vector<at::Tensor> roi_pool_backward(
int batch_size,
int height,
int width,
float spatial_scale,
at::Tensor top_diff,
at::Tensor bottom_rois,
at::Tensor argmax_data)
{
CHECK_INPUT(top_diff);
CHECK_INPUT(bottom_rois);
CHECK_INPUT(argmax_data);
return roi_pool_cuda_backward(batch_size, height, width, spatial_scale, top_diff, bottom_rois, argmax_data);
}
/************************************************************
roi align layer
*************************************************************/
at::Tensor ROIAlign_forward_cuda(const at::Tensor& input,
const at::Tensor& rois,
const float spatial_scale,
const int pooled_height,
const int pooled_width,
const int sampling_ratio);
at::Tensor ROIAlign_backward_cuda(const at::Tensor& grad,
const at::Tensor& rois,
const float spatial_scale,
const int pooled_height,
const int pooled_width,
const int batch_size,
const int channels,
const int height,
const int width,
const int sampling_ratio);
// Interface for Python
at::Tensor ROIAlign_forward(const at::Tensor& input,
const at::Tensor& rois,
const float spatial_scale,
const int pooled_height,
const int pooled_width,
const int sampling_ratio)
{
return ROIAlign_forward_cuda(input, rois, spatial_scale, pooled_height, pooled_width, sampling_ratio);
}
at::Tensor ROIAlign_backward(const at::Tensor& grad,
const at::Tensor& rois,
const float spatial_scale,
const int pooled_height,
const int pooled_width,
const int batch_size,
const int channels,
const int height,
const int width,
const int sampling_ratio)
{
return ROIAlign_backward_cuda(grad, rois, spatial_scale, pooled_height, pooled_width, batch_size, channels, height, width, sampling_ratio);
}
/************************************************************
point matching loss layer
*************************************************************/
std::vector<at::Tensor> pml_cuda_forward(
at::Tensor bottom_prediction,
at::Tensor bottom_target,
at::Tensor bottom_weight,
at::Tensor points,
at::Tensor symmetry,
float hard_angle);
std::vector<at::Tensor> pml_cuda_backward(
at::Tensor grad_loss,
at::Tensor bottom_diff);
std::vector<at::Tensor> pml_forward(
at::Tensor bottom_prediction,
at::Tensor bottom_target,
at::Tensor bottom_weight,
at::Tensor points,
at::Tensor symmetry,
float hard_angle)
{
CHECK_INPUT(bottom_prediction);
CHECK_INPUT(bottom_target);
CHECK_INPUT(bottom_weight);
CHECK_INPUT(points);
CHECK_INPUT(symmetry);
return pml_cuda_forward(bottom_prediction, bottom_target, bottom_weight, points, symmetry, hard_angle);
}
std::vector<at::Tensor> pml_backward(
at::Tensor grad_loss,
at::Tensor bottom_diff)
{
CHECK_INPUT(grad_loss);
CHECK_INPUT(bottom_diff);
return pml_cuda_backward(grad_loss, bottom_diff);
}
/************************************************************
sdf matching loss layer
*************************************************************/
std::vector<at::Tensor> sdf_loss_cuda_forward(
at::Tensor pose_delta,
at::Tensor pose_init,
at::Tensor sdf_grids,
at::Tensor sdf_limits,
at::Tensor points,
at::Tensor regularization);
std::vector<at::Tensor> sdf_loss_cuda_backward(
at::Tensor grad_loss,
at::Tensor bottom_diff);
std::vector<at::Tensor> sdf_loss_forward(
at::Tensor pose_delta,
at::Tensor pose_init,
at::Tensor sdf_grids,
at::Tensor sdf_limits,
at::Tensor points,
at::Tensor regularization)
{
CHECK_INPUT(pose_delta);
CHECK_INPUT(pose_init);
CHECK_INPUT(sdf_grids);
CHECK_INPUT(sdf_limits);
CHECK_INPUT(points);
CHECK_INPUT(regularization);
return sdf_loss_cuda_forward(pose_delta, pose_init, sdf_grids, sdf_limits, points, regularization);
}
std::vector<at::Tensor> sdf_loss_backward(
at::Tensor grad_loss,
at::Tensor bottom_diff)
{
CHECK_INPUT(grad_loss);
CHECK_INPUT(bottom_diff);
return sdf_loss_cuda_backward(grad_loss, bottom_diff);
}
/********* python interface ***********/
PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
m.def("backproject_forward", &backproject_forward, "backproject forward (CUDA)");
m.def("hard_label_forward", &hard_label_forward, "hard_label forward (CUDA)");
m.def("hard_label_backward", &hard_label_backward, "hard_label backward (CUDA)");
m.def("hough_voting_forward", &hough_voting_forward, "hough_voting forward (CUDA)");
m.def("roi_pool_forward", &roi_pool_forward, "roi_pool forward (CUDA)");
m.def("roi_pool_backward", &roi_pool_backward, "roi_pool backward (CUDA)");
m.def("roi_align_forward", &ROIAlign_forward, "ROIAlign_forward");
m.def("roi_align_backward", &ROIAlign_backward, "ROIAlign_backward");
m.def("pml_forward", &pml_forward, "pml forward (CUDA)");
m.def("pml_backward", &pml_backward, "pml backward (CUDA)");
m.def("sdf_loss_forward", &sdf_loss_forward, "SDF loss forward (CUDA)");
m.def("sdf_loss_backward", &sdf_loss_backward, "SDF loss backward (CUDA)");
}
================================================
FILE: lib/layers/roi_align.py
================================================
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved.
import torch
from torch import nn
from torch.autograd import Function
from torch.autograd.function import once_differentiable
from torch.nn.modules.utils import _pair
import posecnn_cuda
class _ROIAlign(Function):
@staticmethod
def forward(ctx, input, roi, output_size, spatial_scale, sampling_ratio):
ctx.save_for_backward(roi)
ctx.output_size = _pair(output_size)
ctx.spatial_scale = spatial_scale
ctx.sampling_ratio = sampling_ratio
ctx.input_shape = input.size()
output = posecnn_cuda.roi_align_forward(input, roi, spatial_scale,
output_size[0], output_size[1],
sampling_ratio)
return output
@staticmethod
@once_differentiable
def backward(ctx, grad_output):
rois, = ctx.saved_tensors
output_size = ctx.output_size
spatial_scale = ctx.spatial_scale
sampling_ratio = ctx.sampling_ratio
bs, ch, h, w = ctx.input_shape
grad_input = posecnn_cuda.roi_align_backward(
grad_output,
rois,
spatial_scale,
output_size[0],
output_size[1],
bs,
ch,
h,
w,
sampling_ratio,
)
return grad_input, None, None, None, None
roi_align = _ROIAlign.apply
class ROIAlign(nn.Module):
def __init__(self, output_size, spatial_scale, sampling_ratio):
super(ROIAlign, self).__init__()
self.output_size = output_size
self.spatial_scale = spatial_scale
self.sampling_ratio = sampling_ratio
def forward(self, input, rois):
return roi_align(input, rois, self.output_size, self.spatial_scale,
self.sampling_ratio)
def __repr__(self):
tmpstr = self.__class__.__name__ + "("
tmpstr += "output_size=" + str(self.output_size)
tmpstr += ", spatial_scale=" + str(self.spatial_scale)
tmpstr += ", sampling_ratio=" + str(self.sampling_ratio)
tmpstr += ")"
return tmpstr
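# Usage sketch (illustrative, not part of the original file): rois follow the
# (batch_index, x1, y1, x2, y2) layout read by the CUDA kernel, in input-image
# coordinates; spatial_scale maps them onto the feature map. Requires the
# compiled posecnn_cuda extension and a GPU; sizes below are illustrative.
if __name__ == '__main__':
    align = ROIAlign(output_size=(7, 7), spatial_scale=1.0 / 16, sampling_ratio=2)
    features = torch.rand(1, 256, 32, 32, device='cuda')
    rois = torch.tensor([[0.0, 16.0, 16.0, 256.0, 256.0]], device='cuda')
    crops = align(features, rois)  # (num_rois, 256, 7, 7)
    print(crops.shape)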
================================================
FILE: lib/layers/roi_pooling.py
================================================
# Copyright (c) 2020 NVIDIA Corporation. All rights reserved.
# This work is licensed under the NVIDIA Source Code License - Non-commercial. Full
# text can be found in LICENSE.md
import math
from torch import nn
from torch.autograd import Function
import torch
import posecnn_cuda
class RoIPoolFunction(Function):
@staticmethod
def forward(ctx, features, rois, pool_height, pool_width, spatial_scale):
outputs = posecnn_cuda.roi_pool_forward(pool_height, pool_width, spatial_scale, features, rois)
top_data = outputs[0]
variables = outputs[1:]
variables.append(rois)
ctx.feature_size = features.size()
ctx.spatial_scale = spatial_scale
ctx.save_for_backward(*variables)
return top_data
@staticmethod
def backward(ctx, top_diff):
argmax_data = ctx.saved_variables[0]
rois = ctx.saved_variables[1]
batch_size, num_channels, data_height, data_width = ctx.feature_size
spatial_scale = ctx.spatial_scale
outputs = posecnn_cuda.roi_pool_backward(batch_size, data_height, data_width, spatial_scale, top_diff, rois, argmax_data)
d_features = outputs[0]
return d_features, None, None, None, None
class RoIPool(nn.Module):
def __init__(self, pool_height, pool_width, spatial_scale):
super(RoIPool, self).__init__()
self.pool_width = int(pool_width)
self.pool_height = int(pool_height)
self.spatial_scale = float(spatial_scale)
def forward(self, features, rois):
return RoIPoolFunction.apply(features, rois, self.pool_height, self.pool_width, self.spatial_scale)
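# --- usage sketch (editor's addition, not part of the original file) ---
# The kernel in roi_pooling_kernel.cu reads column 0 of each roi as the batch
# index and columns 2-5 as (x1, y1, x2, y2); column 1 carries the class id
# produced by hough voting and is ignored by the pooling itself. A minimal
# call, assuming the compiled posecnn_cuda extension and a CUDA device:
if __name__ == '__main__':
    pool = RoIPool(pool_height=7, pool_width=7, spatial_scale=1.0 / 16.0)
    features = torch.randn(1, 64, 60, 80, device='cuda')
    # roi layout: (batch_idx, cls, x1, y1, x2, y2, score)
    rois = torch.tensor([[0., 1., 32., 32., 320., 240., 0.9]], device='cuda')
    pooled = pool(features, rois)
    print(pooled.shape)  # expected: torch.Size([1, 64, 7, 7])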
================================================
FILE: lib/layers/roi_pooling_kernel.cu
================================================
// Copyright (c) 2020 NVIDIA Corporation. All rights reserved.
// This work is licensed under the NVIDIA Source Code License - Non-commercial. Full
// text can be found in LICENSE.md
#include <ATen/ATen.h>
#include <cuda.h>
#include <cuda_runtime.h>
#include <cfloat>
#include <vector>
#define CUDA_1D_KERNEL_LOOP(i, n) \
for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; \
i += blockDim.x * gridDim.x)
__global__ void ROIPoolForward(const int nthreads, const float* bottom_data, const float spatial_scale, const int height, const int width,
const int channels, const int channel_rois, const int pool_height, const int pool_width,
const float* bottom_rois, float* top_data, float* argmax_data)
{
CUDA_1D_KERNEL_LOOP(index, nthreads)
{
// (n, c, ph, pw) is an element in the pool output
// int n = index;
// int pw = n % pool_width;
// n /= pool_width;
// int ph = n % pool_height;
// n /= pool_height;
// int c = n % channels;
// n /= channels;
// (n, c, ph, pw) is an element in the pooled output
int pw = index % pool_width;
int ph = (index / pool_width) % pool_height;
int c = (index / pool_width / pool_height) % channels;
int n = index / pool_width / pool_height / channels;
float roi_batch_ind = bottom_rois[n * channel_rois + 0];
float roi_start_w = bottom_rois[n * channel_rois + 2] * spatial_scale;
float roi_start_h = bottom_rois[n * channel_rois + 3] * spatial_scale;
float roi_end_w = bottom_rois[n * channel_rois + 4] * spatial_scale;
float roi_end_h = bottom_rois[n * channel_rois + 5] * spatial_scale;
// Force malformed ROIs to be 1x1
int roi_width = max(roi_end_w - roi_start_w + 1, 1.0);
int roi_height = max(roi_end_h - roi_start_h + 1, 1.0);
float bin_size_h = static_cast<float>(roi_height) / static_cast<float>(pool_height);
float bin_size_w = static_cast<float>(roi_width) / static_cast<float>(pool_width);
int hstart = static_cast<int>(floor(static_cast<float>(ph) * bin_size_h));
int wstart = static_cast<int>(floor(static_cast<float>(pw) * bin_size_w));
int hend = static_cast<int>(ceil(static_cast<float>(ph + 1) * bin_size_h));
int wend = static_cast<int>(ceil(static_cast<float>(pw + 1) * bin_size_w));
// Add roi offsets and clip to input boundaries
hstart = min(max(int(hstart + roi_start_h), 0), height);
hend = min(max(int(hend + roi_start_h), 0), height);
wstart = min(max(int(wstart + roi_start_w), 0), width);
wend = min(max(int(wend + roi_start_w), 0), width);
bool is_empty = (hend <= hstart) || (wend <= wstart);
// Define an empty pooling region to be zero
float maxval = is_empty ? 0 : -FLT_MAX;
// If nothing is pooled, argmax = -1 causes nothing to be backprop'd
int maxidx = -1;
int offset = roi_batch_ind * channels * height * width;
const float* offset_bottom_data = bottom_data + offset;
for (int h = hstart; h < hend; ++h)
{
for (int w = wstart; w < wend; ++w)
{
int bottom_index = c * height * width + h * width + w;
if (offset_bottom_data[bottom_index] > maxval) {
maxval = offset_bottom_data[bottom_index];
maxidx = bottom_index;
}
}
}
top_data[index] = maxval;
argmax_data[index] = maxidx;
}
}
std::vector<at::Tensor> roi_pool_cuda_forward(
int pool_height,
int pool_width,
float spatial_scale,
at::Tensor bottom_features,
at::Tensor bottom_rois)
{
// run kernels
const int kThreadsPerBlock = 1024;
cudaError_t err;
const int batch_size = bottom_features.size(0);
const int num_channels = bottom_features.size(1);
const int height = bottom_features.size(2);
const int width = bottom_features.size(3);
const int num_rois = bottom_rois.size(0);
const int channel_rois = bottom_rois.size(1);
auto top_data = at::zeros({num_rois, num_channels, pool_height, pool_width}, bottom_features.options());
auto top_argmax = at::zeros({num_rois, num_channels, pool_height, pool_width}, bottom_features.options());
const int output_size = num_rois * num_channels * pool_height * pool_width;
ROIPoolForward<<<(output_size + kThreadsPerBlock - 1) / kThreadsPerBlock, kThreadsPerBlock>>>(
output_size, bottom_features.data<float>(), spatial_scale, height, width, num_channels, channel_rois,
pool_height, pool_width, bottom_rois.data<float>(), top_data.data<float>(), top_argmax.data<float>());
err = cudaGetLastError();
if(cudaSuccess != err)
{
fprintf( stderr, "cudaCheckError() failed : %s\n", cudaGetErrorString( err ) );
exit( -1 );
}
return {top_data, top_argmax};
}
__global__ void ROIPoolBackward(const int nthreads, const float* top_diff, const float spatial_scale, const int height, const int width,
const int num_rois, const int channels, const int channel_rois,
const int pool_height, const int pool_width, float* bottom_diff,
const float* bottom_rois, const float* argmax_data)
{
CUDA_1D_KERNEL_LOOP(index, nthreads)
{
// (n, c, h, w) coords in bottom data
int w = index % width;
int h = (index / width) % height;
int c = (index / width / height) % channels;
int n = index / width / height / channels;
float gradient = 0;
// Accumulate gradient over all ROIs that pooled this element
for (int roi_n = 0; roi_n < num_rois; roi_n++)
{
const float* offset_bottom_rois = bottom_rois + roi_n * channel_rois;
int roi_batch_ind = int(offset_bottom_rois[0]);
// Skip if ROI's batch index doesn't match n
if (n != roi_batch_ind)
continue;
int roi_start_w = round(offset_bottom_rois[2] * spatial_scale);
int roi_start_h = round(offset_bottom_rois[3] * spatial_scale);
int roi_end_w = round(offset_bottom_rois[4] * spatial_scale);
int roi_end_h = round(offset_bottom_rois[5] * spatial_scale);
// Skip if ROI doesn't include (h, w)
const bool in_roi = (w >= roi_start_w && w <= roi_end_w &&
h >= roi_start_h && h <= roi_end_h);
if (!in_roi)
continue;
int offset = roi_n * channels * pool_height * pool_width;
const float* offset_top_diff = top_diff + offset;
const float* offset_argmax_data = argmax_data + offset;
// Compute feasible set of pooled units that could have pooled
// this bottom unit
// Force malformed ROIs to be 1x1
int roi_width = max(roi_end_w - roi_start_w + 1, 1);
int roi_height = max(roi_end_h - roi_start_h + 1, 1);
float bin_size_h = static_cast<float>(roi_height) / static_cast<float>(pool_height);
float bin_size_w = static_cast<float>(roi_width) / static_cast<float>(pool_width);
int phstart = floor(static_cast<float>(h - roi_start_h) / bin_size_h);
int phend = ceil(static_cast<float>(h - roi_start_h + 1) / bin_size_h);
int pwstart = floor(static_cast<float>(w - roi_start_w) / bin_size_w);
int pwend = ceil(static_cast<float>(w - roi_start_w + 1) / bin_size_w);
phstart = min(max(phstart, 0), pool_height);
phend = min(max(phend, 0), pool_height);
pwstart = min(max(pwstart, 0), pool_width);
pwend = min(max(pwend, 0), pool_width);
for (int ph = phstart; ph < phend; ++ph)
{
for (int pw = pwstart; pw < pwend; ++pw)
{
if (int(offset_argmax_data[c * pool_height * pool_width + ph * pool_width + pw]) == c * height * width + h * width + w)
gradient += offset_top_diff[c * pool_height * pool_width + ph * pool_width + pw];
}
}
}
bottom_diff[index] = gradient;
}
}
std::vector<at::Tensor> roi_pool_cuda_backward(
int batch_size,
int height,
int width,
float spatial_scale,
at::Tensor top_diff,
at::Tensor bottom_rois,
at::Tensor argmax_data)
{
const int kThreadsPerBlock = 1024;
cudaError_t err;
const int num_rois = top_diff.size(0);
const int num_channels = top_diff.size(1);
const int pool_height = top_diff.size(2);
const int pool_width = top_diff.size(3);
const int channel_rois = bottom_rois.size(1);
auto bottom_diff = at::zeros({batch_size, num_channels, height, width}, top_diff.options());
const int output_size = batch_size * num_channels * height * width;
ROIPoolBackward<<<(output_size + kThreadsPerBlock - 1) / kThreadsPerBlock, kThreadsPerBlock>>>(
output_size, top_diff.data<float>(), spatial_scale, height, width, num_rois, num_channels, channel_rois,
pool_height, pool_width, bottom_diff.data<float>(), bottom_rois.data<float>(), argmax_data.data<float>());
err = cudaGetLastError();
if(cudaSuccess != err)
{
fprintf( stderr, "cudaCheckError() failed : %s\n", cudaGetErrorString( err ) );
exit(-1);
}
return {bottom_diff};
}
================================================
FILE: lib/layers/roi_target_layer.py
================================================
# Copyright (c) 2020 NVIDIA Corporation. All rights reserved.
# This work is licensed under the NVIDIA Source Code License - Non-commercial. Full
# text can be found in LICENSE.md
from __future__ import absolute_import
from __future__ import division
import torch
import numpy as np
import numpy.random as npr
from fcn.config import cfg
from utils.bbox_transform import bbox_transform
from utils.cython_bbox import bbox_overlaps
# rpn_rois: (batch_ids, cls, x1, y1, x2, y2, scores)
# gt_boxes: batch * num_classes * (x1, y1, x2, y2, cls)
def roi_target_layer(rpn_rois, gt_boxes):
"""
Assign object detection proposals to ground-truth targets. Produces proposal
classification labels and bounding-box regression targets.
"""
rpn_rois = rpn_rois.detach().cpu().numpy()
gt_boxes = gt_boxes.detach().cpu().numpy()
num_classes = gt_boxes.shape[1]
# convert boxes to (batch_ids, x1, y1, x2, y2, cls)
roi_blob = rpn_rois[:, (0, 2, 3, 4, 5, 1)]
gt_box_blob = np.zeros((0, 6), dtype=np.float32)
for i in range(gt_boxes.shape[0]):
for j in range(gt_boxes.shape[1]):
if gt_boxes[i, j, -1] > 0:
gt_box = np.zeros((1, 6), dtype=np.float32)
gt_box[0, 0] = i
gt_box[0, 1:5] = gt_boxes[i, j, :4]
gt_box[0, 5] = gt_boxes[i, j, 4]
gt_box_blob = np.concatenate((gt_box_blob, gt_box), axis=0)
# sample rois with classification labels and bounding box regression targets
labels, bbox_targets, bbox_inside_weights = _sample_rois(roi_blob, gt_box_blob, num_classes)
bbox_outside_weights = np.array(bbox_inside_weights > 0).astype(np.float32)
# convert labels
num = labels.shape[0]
label_blob = np.zeros((num, num_classes), dtype=np.float32)
if np.any(roi_blob[:, -1] > 0):
for i in range(num):
label_blob[i, int(labels[i])] = 1.0
return torch.from_numpy(label_blob).cuda(), torch.from_numpy(bbox_targets).cuda(), \
torch.from_numpy(bbox_inside_weights).cuda(), torch.from_numpy(bbox_outside_weights).cuda()
def _get_bbox_regression_labels(bbox_target_data, num_classes):
"""Bounding-box regression targets (bbox_target_data) are stored in a
compact form N x (class, tx, ty, tw, th)
This function expands those targets into the 4-of-4*K representation used
by the network (i.e. only one class has non-zero targets).
Returns:
bbox_target (ndarray): N x 4K blob of regression targets
bbox_inside_weights (ndarray): N x 4K blob of loss weights
"""
clss = bbox_target_data[:, 0]
bbox_targets = np.zeros((clss.size, 4 * num_classes), dtype=np.float32)
bbox_inside_weights = np.zeros(bbox_targets.shape, dtype=np.float32)
inds = np.where(clss > 0)[0]
for ind in inds:
cls = clss[ind]
start = int(4 * cls)
end = start + 4
bbox_targets[ind, start:end] = bbox_target_data[ind, 1:]
bbox_inside_weights[ind, start:end] = cfg.TRAIN.BBOX_INSIDE_WEIGHTS
return bbox_targets, bbox_inside_weights
def _compute_targets(ex_rois, gt_rois, labels):
"""Compute bounding-box regression targets for an image."""
assert ex_rois.shape[0] == gt_rois.shape[0]
assert ex_rois.shape[1] == 4
assert gt_rois.shape[1] == 4
targets = bbox_transform(ex_rois, gt_rois)
if cfg.TRAIN.BBOX_NORMALIZE_TARGETS_PRECOMPUTED:
# Optionally normalize targets by a precomputed mean and stdev
targets = ((targets - np.array(cfg.TRAIN.BBOX_NORMALIZE_MEANS))
/ np.array(cfg.TRAIN.BBOX_NORMALIZE_STDS))
return np.hstack(
(labels[:, np.newaxis], targets)).astype(np.float32, copy=False)
def _sample_rois(all_rois, gt_boxes, num_classes):
"""Generate a random sample of RoIs comprising foreground and background
examples.
"""
# all_rois (batch_ids, x1, y1, x2, y2, cls)
# gt_boxes (batch_ids, x1, y1, x2, y2, cls)
# overlaps: (rois x gt_boxes)
if gt_boxes.shape[0] == 0:
num = all_rois.shape[0]
labels = np.zeros((num, 1), dtype=np.float32)
bbox_targets = np.zeros((num, 4 * num_classes), dtype=np.float32)
bbox_inside_weights = np.zeros(bbox_targets.shape, dtype=np.float32)
else:
overlaps = bbox_overlaps(
np.ascontiguousarray(all_rois[:, :5], dtype=np.float64),
np.ascontiguousarray(gt_boxes[:, :5], dtype=np.float64))
gt_assignment = overlaps.argmax(axis=1)
max_overlaps = overlaps.max(axis=1)
labels = gt_boxes[gt_assignment, 5]
# Select foreground RoIs as those with >= FG_THRESH overlap
# fg_inds = np.where(max_overlaps >= cfg.TRAIN.FG_THRESH)[0]
bg_inds = np.where(max_overlaps < cfg.TRAIN.FG_THRESH)[0]
labels[bg_inds] = 0
# print '{:d} rois, {:d} fg, {:d} bg'.format(all_rois.shape[0], all_rois.shape[0]-len(bg_inds), len(bg_inds))
# print all_rois
bbox_target_data = _compute_targets(
all_rois[:, 1:5], gt_boxes[gt_assignment, 1:5], labels)
bbox_targets, bbox_inside_weights = \
_get_bbox_regression_labels(bbox_target_data, num_classes)
return labels, bbox_targets, bbox_inside_weights
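# --- worked example (editor's addition, not part of the original file) ---
# _get_bbox_regression_labels scatters each roi's four regression targets into
# a 4*K vector at its class slot, so only the assigned class receives gradient.
# The same expansion in isolation, with K = 3 hypothetical classes:
if __name__ == '__main__':
    K = 3
    # one roi assigned to class 2, targets (tx, ty, tw, th)
    bbox_target_data = np.array([[2.0, 0.1, -0.2, 0.3, 0.05]], dtype=np.float32)
    expanded = np.zeros((1, 4 * K), dtype=np.float32)
    cls = int(bbox_target_data[0, 0])
    expanded[0, 4 * cls:4 * cls + 4] = bbox_target_data[0, 1:]
    print(expanded)  # non-zero only in columns 8-11, the slot of class 2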
================================================
FILE: lib/layers/sdf_matching_loss.py
================================================
# Copyright (c) 2020 NVIDIA Corporation. All rights reserved.
# This work is licensed under the NVIDIA Source Code License - Non-commercial. Full
# text can be found in LICENSE.md
import math
from torch import nn
from torch.autograd import Function
import torch
import posecnn_cuda
class SDFLossFunction(Function):
@staticmethod
def forward(ctx, pose_delta, pose_init, sdf_grids, sdf_limits, points, regularization):
outputs = posecnn_cuda.sdf_loss_forward(pose_delta, pose_init, sdf_grids, sdf_limits, points, regularization)
loss = outputs[0]
sdf_values = outputs[1]
se3 = outputs[2]
dalpha = outputs[3]
J = outputs[4]
variables = outputs[4:]
ctx.save_for_backward(*variables)
return loss, sdf_values, se3, dalpha, J
@staticmethod
def backward(ctx, grad_loss, grad_sdf_values, grad_se3, grad_JTJ, grad_J):
outputs = posecnn_cuda.sdf_loss_backward(grad_loss, *ctx.saved_variables)
d_delta = outputs[0]
return d_delta, None, None, None, None
class SDFLoss(nn.Module):
def __init__(self):
super(SDFLoss, self).__init__()
def forward(self, pose_delta, pose_init, sdf_grids, sdf_limits, points, regularization):
return SDFLossFunction.apply(pose_delta, pose_init, sdf_grids, sdf_limits, points, regularization)
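# --- numerical sketch (editor's addition, not part of the original file) ---
# The CUDA forward (sdf_matching_loss_kernel.cu) returns more than the loss:
# per object it accumulates the 6-vector J (sum over points of sdf_value times
# the twist Jacobian) and the 6x6 JTJ, then solves the damped Gauss-Newton
# system (JTJ + diag(regularization)) * dalpha = J by LDLT; the caller applies
# dpose = dpose - dalpha. The same step with plain numpy and made-up values:
if __name__ == '__main__':
    import numpy as np
    rng = np.random.default_rng(0)
    J_pts = rng.standard_normal((100, 6)).astype(np.float32)  # per-point twist Jacobians
    r = rng.standard_normal(100).astype(np.float32)           # per-point sdf residuals
    J = J_pts.T @ r                                           # accumulated gradient (J^T r)
    JTJ = J_pts.T @ J_pts                                     # accumulated Gauss-Newton Hessian
    reg = np.array([1., 1., 1., 0.1, 0.1, 0.1], np.float32)   # damping per twist channel
    dalpha = np.linalg.solve(JTJ + np.diag(reg), J)
    print(dalpha)  # the update subtracted from dpose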
================================================
FILE: lib/layers/sdf_matching_loss_kernel.cu
================================================
// Copyright (c) 2020 NVIDIA Corporation. All rights reserved.
// This work is licensed under the NVIDIA Source Code License - Non-commercial. Full
// text can be found in LICENSE.md
#include <ATen/ATen.h>
#include <cuda.h>
#include <cuda_runtime.h>
#include <vector>
#include <Eigen/Dense>
#include <sophus/se3.hpp>
#include <thrust/device_ptr.h>
#include <thrust/reduce.h>
#define CUDA_1D_KERNEL_LOOP(i, n) \
for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; \
i += blockDim.x * gridDim.x)
inline __device__ __host__ float lerp(float a, float b, float t)
{
return a + t*(b-a);
}
__device__ __host__ float3 operator+(const float3 &a, const float3 &b)
{
return make_float3(a.x+b.x, a.y+b.y, a.z+b.z);
}
__device__ __host__ float3 operator-(const float3 &a, const float3 &b)
{
return make_float3(a.x-b.x, a.y-b.y, a.z-b.z);
}
template <typename Dtype>
inline __device__ __host__ const Dtype & getValue(const int3 & v, const int3 & dim, const Dtype* sdf_grids)
{
return sdf_grids[v.x * dim.y * dim.z + v.y * dim.z + v.z];
}
template <typename Dtype>
inline __device__ __host__ Dtype getValueInterpolated(const float3 & pGrid, const int3 & dim, const Dtype* sdf_grids)
{
const int x0 = (int)(pGrid.x - 0.5); const float fx = (pGrid.x - 0.5) - x0;
const int y0 = (int)(pGrid.y - 0.5); const float fy = (pGrid.y - 0.5) - y0;
const int z0 = (int)(pGrid.z - 0.5); const float fz = (pGrid.z - 0.5) - z0;
const int x1 = x0 + 1;
const int y1 = y0 + 1;
const int z1 = z0 + 1;
if ( !(x0 >= 0 && x1 < dim.x && y0 >= 0 && y1 < dim.y && z0 >=0 && z1 < dim.z) )
return 0.1;
const float dx00 = lerp( getValue(make_int3(x0,y0,z0), dim, sdf_grids), getValue(make_int3(x1,y0,z0), dim, sdf_grids), fx);
const float dx01 = lerp( getValue(make_int3(x0,y0,z1), dim, sdf_grids), getValue(make_int3(x1,y0,z1), dim, sdf_grids), fx);
const float dx10 = lerp( getValue(make_int3(x0,y1,z0), dim, sdf_grids), getValue(make_int3(x1,y1,z0), dim, sdf_grids), fx);
const float dx11 = lerp( getValue(make_int3(x0,y1,z1), dim, sdf_grids), getValue(make_int3(x1,y1,z1), dim, sdf_grids), fx);
const float dxy0 = lerp( dx00, dx10, fy );
const float dxy1 = lerp( dx01, dx11, fy );
float dxyz = lerp( dxy0, dxy1, fz );
// penalize inside objects
// if (dxyz < 0)
// dxyz *= 10;
return dxyz;
}
template <typename Dtype>
inline __device__ __host__ float3 getGradientInterpolated(const float3 & pGrid, const int3 & dim, const Dtype* sdf_grids)
{
const float3 delta_x = make_float3(1,0,0);
const float3 delta_y = make_float3(0,1,0);
const float3 delta_z = make_float3(0,0,1);
Dtype f_px = getValueInterpolated(pGrid + delta_x, dim, sdf_grids);
Dtype f_py = getValueInterpolated(pGrid + delta_y, dim, sdf_grids);
Dtype f_pz = getValueInterpolated(pGrid + delta_z, dim, sdf_grids);
Dtype f_mx = getValueInterpolated(pGrid - delta_x, dim, sdf_grids);
Dtype f_my = getValueInterpolated(pGrid - delta_y, dim, sdf_grids);
Dtype f_mz = getValueInterpolated(pGrid - delta_z, dim, sdf_grids);
float3 grad;
grad.x = 0.5*(f_px - f_mx);
grad.y = 0.5*(f_py - f_my);
grad.z = 0.5*(f_pz - f_mz);
return grad;
}
/*******************************************/
/* pose_delta: num_objects x 6 */
/* pose_init: num_objects x 4 x 4 */
/* sdf_grid: num_classes x c x h x w */
/* sdf_limits: num_classes x 9 */
/* points: num_points x 5 */
/*******************************************/
template <typename Dtype>
__global__ void SDFdistanceForward(const int nthreads, const Dtype* pose_delta, const Dtype* pose_init,
const Dtype* sdf_grids, const Dtype* sdf_limits, const Dtype* points,
const int num_points, Dtype* losses, Dtype* top_values,
Dtype* diffs, Dtype* JTJ, Dtype* top_se3)
{
typedef Sophus::SE3<Dtype> SE3;
typedef Eigen::Matrix<Dtype,3,1> Vec3;
// index is the index of point
CUDA_1D_KERNEL_LOOP(index, nthreads)
{
int cls_index = int(points[5 * index + 3]);
int obj_index = int(points[5 * index + 4]);
int start_index;
// convert delta pose
Eigen::Matrix<Dtype,6,1> deltaPose;
start_index = 6 * obj_index;
deltaPose << pose_delta[start_index + 0], pose_delta[start_index + 1], pose_delta[start_index + 2],
pose_delta[start_index + 3], pose_delta[start_index + 4], pose_delta[start_index + 5];
SE3 deltaPoseMatrix = SE3::exp(deltaPose);
// convert initial pose
Eigen::Matrix<Dtype,4,4> initialPose;
start_index = 16 * obj_index;
initialPose << pose_init[start_index + 0], pose_init[start_index + 1], pose_init[start_index + 2], pose_init[start_index + 3],
pose_init[start_index + 4], pose_init[start_index + 5], pose_init[start_index + 6], pose_init[start_index + 7],
pose_init[start_index + 8], pose_init[start_index + 9], pose_init[start_index + 10], pose_init[start_index + 11],
pose_init[start_index + 12], pose_init[start_index + 13], pose_init[start_index + 14], pose_init[start_index + 15];
SE3 initialPoseMatrix = SE3(initialPose);
// start point of a new object
if (index == 0 || int(points[5 * (index-1) + 4]) != obj_index)
{
SE3 pose = deltaPoseMatrix * initialPoseMatrix;
Eigen::Matrix<Dtype,3,4> matrix = pose.matrix3x4();
int count = 0;
start_index = 16 * obj_index;
for (int i = 0; i < 3; i++)
{
for (int j = 0; j < 4; j++)
top_se3[start_index + count++] = matrix(i, j);
}
top_se3[start_index + 15] = 1.0;
}
// convert point
Vec3 point;
point << points[5 * index], points[5 * index + 1], points[5 * index + 2];
// transform the point
const Vec3 updatedPoint = deltaPoseMatrix * initialPoseMatrix * point;
// obtain sdf value
start_index = 9 * cls_index;
int d0 = int(sdf_limits[start_index + 6]);
int d1 = int(sdf_limits[start_index + 7]);
int d2 = int(sdf_limits[start_index + 8]);
float px = (updatedPoint(0) - sdf_limits[start_index + 0]) / (sdf_limits[start_index + 3] - sdf_limits[start_index + 0]) * d0;
float py = (updatedPoint(1) - sdf_limits[start_index + 1]) / (sdf_limits[start_index + 4] - sdf_limits[start_index + 1]) * d1;
float pz = (updatedPoint(2) - sdf_limits[start_index + 2]) / (sdf_limits[start_index + 5] - sdf_limits[start_index + 2]) * d2;
float3 pGrid = make_float3(px, py, pz);
int3 dim = make_int3(d0, d1, d2);
Dtype value = getValueInterpolated(pGrid, dim, sdf_grids + cls_index * d0 * d1 * d2);
// L2 loss
int flag = 1;
if (value < 0)
flag = -1;
value *= flag;
losses[index] = 0.5 * value * value;
top_values[index] = losses[index];
// L2 penalty on translation
// float lambda = 0.1;
// losses[index] += 0.5 * lambda * (pose_delta[0] * pose_delta[0] + pose_delta[1] * pose_delta[1] + pose_delta[2] * pose_delta[2]);
// compute gradient
float3 grad = getGradientInterpolated(pGrid, dim, sdf_grids + cls_index * d0 * d1 * d2);
Vec3 sdfUpdate;
sdfUpdate << grad.x, grad.y, grad.z;
Eigen::Matrix<Dtype,3,6> dUpdate;
dUpdate << 1, 0, 0, 0, updatedPoint(2), -updatedPoint(1),
0, 1, 0, -updatedPoint(2), 0, updatedPoint(0),
0, 0, 1, updatedPoint(1), -updatedPoint(0), 0;
Eigen::Matrix<Dtype,1,6> J = flag * sdfUpdate.transpose() * dUpdate;
// assign gradient
for (int i = 0; i < 6; i++)
diffs[6 * index + i] = value * J(i);
// L2 penalty on translation
// diffs[6 * index + 0] += lambda * pose_delta[0];
// diffs[6 * index + 1] += lambda * pose_delta[1];
// diffs[6 * index + 2] += lambda * pose_delta[2];
// compute JTJ
Eigen::Matrix<Dtype,6,6> result = J.transpose() * J;
for (int i = 0; i < 6; i++)
{
for (int j = 0; j < 6; j++)
JTJ[36 * index + i * 6 + j] = result(i, j);
}
}
}
/* diffs: num_points x num_channels */
/* bottom_diff: num_objects x num_channels */
template <typename Dtype>
__global__ void sum_gradients(const int nthreads, const Dtype* diffs, const int num_channels, const Dtype* points, Dtype* bottom_diff)
{
CUDA_1D_KERNEL_LOOP(index, nthreads)
{
int p = index / num_channels;
int c = index % num_channels;
int obj_index = int(points[5 * p + 4]);
atomicAdd(bottom_diff + obj_index * num_channels + c, diffs[index]);
}
}
/*******************************************/
/* pose_delta: num_objects x 6 */
/* pose_init: num_objects x 4 x 4 */
/* sdf_grid: num_classes x c x h x w */
/* sdf_limits: num_classes x 9 */
/* points: num_points x 5 */
/*******************************************/
std::vector<at::Tensor> sdf_loss_cuda_forward(
at::Tensor pose_delta,
at::Tensor pose_init,
at::Tensor sdf_grids,
at::Tensor sdf_limits,
at::Tensor points,
at::Tensor regularization)
{
// run kernels
cudaError_t err;
const int kThreadsPerBlock = 512;
const int num_channels = 6;
int output_size;
// sizes
const int num_objects = pose_delta.size(0);
const int num_classes = sdf_grids.size(0);
const int num_points = points.size(0);
// temp losses
auto losses = at::zeros({num_points}, points.options());
auto top_values = at::zeros({num_points}, points.options());
auto top_data = at::zeros({1}, points.options());
auto top_se3 = at::zeros({num_objects, 4, 4}, points.options());
// temp diffs
auto diffs = at::zeros({num_points, num_channels}, points.options());
auto JTJ = at::zeros({num_points, num_channels, num_channels}, points.options());
auto bottom_diff = at::zeros({num_objects, num_channels}, points.options());
auto bottom_JTJ = at::zeros({num_objects, num_channels, num_channels}, points.options());
// compute the losses and gradients
output_size = num_points;
SDFdistanceForward<<<(output_size + kThreadsPerBlock - 1) / kThreadsPerBlock, kThreadsPerBlock>>>(
output_size, pose_delta.data<float>(), pose_init.data<float>(), sdf_grids.data<float>(), sdf_limits.data<float>(),
points.data<float>(), num_points, losses.data<float>(), top_values.data<float>(),
diffs.data<float>(), JTJ.data<float>(), top_se3.data<float>());
cudaDeviceSynchronize();
err = cudaGetLastError();
if(cudaSuccess != err)
{
fprintf( stderr, "cudaCheckError() failed: %s\n", cudaGetErrorString( err ) );
exit( -1 );
}
// sum the diffs
output_size = num_points * num_channels;
sum_gradients<<<(output_size + kThreadsPerBlock - 1) / kThreadsPerBlock, kThreadsPerBlock>>>(
output_size, diffs.data<float>(), num_channels, points.data<float>(), bottom_diff.data<float>());
output_size = num_points * num_channels * num_channels;
sum_gradients<<<(output_size + kThreadsPerBlock - 1) / kThreadsPerBlock, kThreadsPerBlock>>>(
output_size, JTJ.data<float>(), num_channels * num_channels, points.data<float>(), bottom_JTJ.data<float>());
cudaDeviceSynchronize();
// sum the loss
thrust::device_ptr<float> losses_ptr(losses.data<float>());
float loss = thrust::reduce(losses_ptr, losses_ptr + num_points) / num_points;
cudaMemcpy(top_data.data<float>(), &loss, sizeof(float), cudaMemcpyHostToDevice);
err = cudaGetLastError();
if(cudaSuccess != err)
{
fprintf( stderr, "cudaCheckError() failed: %s\n", cudaGetErrorString( err ) );
exit( -1 );
}
// compute Gauss Newton update
float* bottom_diff_host = (float*)malloc(num_objects * num_channels * sizeof(float));
float* regularization_host = (float*)malloc(num_channels * sizeof(float));
float* bottom_JTJ_host = (float*)malloc(num_objects * num_channels * num_channels * sizeof(float));
cudaMemcpy(bottom_diff_host, bottom_diff.data<float>(), num_objects * num_channels * sizeof(float), cudaMemcpyDeviceToHost);
cudaMemcpy(regularization_host, regularization.data<float>(), num_channels * sizeof(float), cudaMemcpyDeviceToHost);
cudaMemcpy(bottom_JTJ_host, bottom_JTJ.data<float>(), num_objects * num_channels * num_channels * sizeof(float), cudaMemcpyDeviceToHost);
Eigen::Matrix<float,6,1> J_eigen;
Eigen::Matrix<float,6,6> JTJ_eigen;
float* dalpha_all = (float*)malloc(num_objects * num_channels * sizeof(float));
for (int k = 0; k < num_objects; k++)
{
for (int i = 0; i < num_channels; i++)
{
J_eigen(i) = bottom_diff_host[k * num_channels + i];
for (int j = 0; j < num_channels; j++)
JTJ_eigen(i, j) = bottom_JTJ_host[k * num_channels * num_channels + i * num_channels + j];
JTJ_eigen(i, i) += regularization_host[i];
}
Eigen::Matrix<float,6,1> dalpha = JTJ_eigen.ldlt().solve(J_eigen);
for (int i = 0; i < num_channels; i++)
dalpha_all[k * num_channels + i] = dalpha(i);
}
auto bottom_delta = at::zeros({num_objects, num_channels}, points.options());
cudaMemcpy(bottom_delta.data<float>(), dalpha_all, num_objects * num_channels * sizeof(float), cudaMemcpyHostToDevice);
free(bottom_diff_host);
free(regularization_host);
free(bottom_JTJ_host);
free(dalpha_all);
return {top_data, top_values, top_se3, bottom_delta, bottom_diff};
}
template <typename Dtype>
__global__ void SDFdistanceBackward(const int nthreads, const Dtype* top_diff,
const Dtype* bottom_diff, Dtype* output)
{
CUDA_1D_KERNEL_LOOP(index, nthreads)
{
output[index] = top_diff[0] * bottom_diff[index];
}
}
std::vector<at::Tensor> sdf_loss_cuda_backward(
at::Tensor grad_loss,
at::Tensor bottom_diff)
{
cudaError_t err;
const int kThreadsPerBlock = 512;
int output_size;
const int batch_size = bottom_diff.size(0);
const int num_channels = bottom_diff.size(1);
auto grad_pose = at::zeros({batch_size, num_channels}, bottom_diff.options());
output_size = batch_size * num_channels;
SDFdistanceBackward<<<(output_size + kThreadsPerBlock - 1) / kThreadsPerBlock, kThreadsPerBlock>>>(
output_size, grad_loss.data<float>(), bottom_diff.data<float>(), grad_pose.data<float>());
cudaDeviceSynchronize();
err = cudaGetLastError();
if(cudaSuccess != err)
{
fprintf( stderr, "cudaCheckError() failed : %s\n", cudaGetErrorString( err ) );
exit( -1 );
}
return {grad_pose};
}
================================================
FILE: lib/layers/setup.py
================================================
# Copyright (c) 2020 NVIDIA Corporation. All rights reserved.
# This work is licensed under the NVIDIA Source Code License - Non-commercial. Full
# text can be found in LICENSE.md
from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CUDAExtension
setup(
name='posecnn',
ext_modules=[
CUDAExtension(
name='posecnn_cuda',
sources = [
'backproject_kernel.cu',
'sdf_matching_loss_kernel.cu',
'posecnn_layers.cpp',
'hard_label_kernel.cu',
'hough_voting_kernel.cu',
'roi_pooling_kernel.cu',
'ROIAlign_cuda.cu',
'point_matching_loss_kernel.cu'],
include_dirs = ['/usr/local/include/eigen3', '/usr/local/include'])
],
cmdclass={
'build_ext': BuildExtension
})
================================================
FILE: lib/networks/PoseCNN.py
================================================
# Copyright (c) 2020 NVIDIA Corporation. All rights reserved.
# This work is licensed under the NVIDIA Source Code License - Non-commercial. Full
# text can be found in LICENSE.md
import torch
import torch.nn as nn
import torchvision.models as models
import math
import sys
import copy
from torch.nn.init import kaiming_normal_
from layers.hard_label import HardLabel
from layers.hough_voting import HoughVoting
from layers.roi_pooling import RoIPool
from layers.point_matching_loss import PMLoss
from layers.roi_target_layer import roi_target_layer
from layers.pose_target_layer import pose_target_layer
from fcn.config import cfg
__all__ = [
'posecnn',
]
vgg16 = models.vgg16(pretrained=False)
def log_softmax_high_dimension(input):
num_classes = input.size()[1]
m = torch.max(input, dim=1, keepdim=True)[0]
if input.dim() == 4:
d = input - m.repeat(1, num_classes, 1, 1)
else:
d = input - m.repeat(1, num_classes)
e = torch.exp(d)
s = torch.sum(e, dim=1, keepdim=True)
if input.dim() == 4:
output = d - torch.log(s.repeat(1, num_classes, 1, 1))
else:
output = d - torch.log(s.repeat(1, num_classes))
return output
def softmax_high_dimension(input):
num_classes = input.size()[1]
m = torch.max(input, dim=1, keepdim=True)[0]
if input.dim() == 4:
e = torch.exp(input - m.repeat(1, num_classes, 1, 1))
else:
e = torch.exp(input - m.repeat(1, num_classes))
s = torch.sum(e, dim=1, keepdim=True)
if input.dim() == 4:
output = torch.div(e, s.repeat(1, num_classes, 1, 1))
else:
output = torch.div(e, s.repeat(1, num_classes))
return output
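# --- numerical note (editor's addition, not part of the original file) ---
# Both helpers above subtract the per-location channel maximum before
# exponentiating (the log-sum-exp trick), so they stay finite for large
# logits. For 4-D inputs they should agree with torch's built-ins:
def _check_softmax_helpers():
    x = torch.randn(2, 5, 4, 4) * 50  # large logits where a naive exp() can overflow
    assert torch.allclose(log_softmax_high_dimension(x),
                          nn.functional.log_softmax(x, dim=1), atol=1e-4)
    assert torch.allclose(softmax_high_dimension(x),
                          nn.functional.softmax(x, dim=1), atol=1e-4)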
def conv(in_planes, out_planes, kernel_size=3, stride=1, relu=True):
if relu:
return nn.Sequential(
nn.Conv2d(in_planes, out_planes, kernel_size=kernel_size, stride=stride, padding=(kernel_size-1)//2, bias=True),
nn.ReLU(inplace=True))
else:
return nn.Conv2d(in_planes, out_planes, kernel_size=kernel_size, stride=stride, padding=(kernel_size-1)//2, bias=True)
def fc(in_planes, out_planes, relu=True):
if relu:
return nn.Sequential(
nn.Linear(in_planes, out_planes),
nn.LeakyReLU(0.1, inplace=True))
else:
return nn.Linear(in_planes, out_planes)
def upsample(scale_factor):
return nn.Upsample(scale_factor=scale_factor, mode='bilinear')
class PoseCNN(nn.Module):
def __init__(self, num_classes, num_units):
super(PoseCNN, self).__init__()
self.num_classes = num_classes
# conv features
features = list(vgg16.features)[:30]
# change the first conv layer for RGBD
if cfg.INPUT == 'RGBD':
conv0 = conv(6, 64, kernel_size=3, relu=False)
conv0.weight.data[:, :3, :, :] = features[0].weight.data
conv0.weight.data[:, 3:, :, :] = features[0].weight.data
conv0.bias.data = features[0].bias.data
features[0] = conv0
self.features = nn.ModuleList(features)
self.classifier = vgg16.classifier[:-1]
if cfg.TRAIN.SLIM:
dim_fc = 256
self.classifier[0] = nn.Linear(512*7*7, 256)
self.classifier[3] = nn.Linear(256, 256)
else:
dim_fc = 4096
print(self.features)
print(self.classifier)
# freeze some layers
if cfg.TRAIN.FREEZE_LAYERS:
for i in [0, 2, 5, 7, 10, 12, 14]:
self.features[i].weight.requires_grad = False
self.features[i].bias.requires_grad = False
# semantic labeling branch
self.conv4_embed = conv(512, num_units, kernel_size=1)
self.conv5_embed = conv(512, num_units, kernel_size=1)
self.upsample_conv5_embed = upsample(2.0)
self.upsample_embed = upsample(8.0)
self.conv_score = conv(num_units, num_classes, kernel_size=1)
self.hard_label = HardLabel(threshold=cfg.TRAIN.HARD_LABEL_THRESHOLD, sample_percentage=cfg.TRAIN.HARD_LABEL_SAMPLING)
self.dropout = nn.Dropout()
if cfg.TRAIN.VERTEX_REG:
# center regression branch
self.conv4_vertex_embed = conv(512, 2*num_units, kernel_size=1, relu=False)
self.conv5_vertex_embed = conv(512, 2*num_units, kernel_size=1, relu=False)
self.upsample_conv5_vertex_embed = upsample(2.0)
self.upsample_vertex_embed = upsample(8.0)
self.conv_vertex_score = conv(2*num_units, 3*num_classes, kernel_size=1, relu=False)
# hough voting
self.hough_voting = HoughVoting(is_train=0, skip_pixels=10, label_threshold=100, \
inlier_threshold=0.9, voting_threshold=-1, per_threshold=0.01)
self.roi_pool_conv4 = RoIPool(pool_height=7, pool_width=7, spatial_scale=1.0 / 8.0)
self.roi_pool_conv5 = RoIPool(pool_height=7, pool_width=7, spatial_scale=1.0 / 16.0)
self.fc8 = fc(dim_fc, num_classes)
self.fc9 = fc(dim_fc, 4 * num_classes, relu=False)
if cfg.TRAIN.POSE_REG:
self.fc10 = fc(dim_fc, 4 * num_classes, relu=False)
self.pml = PMLoss(hard_angle=cfg.TRAIN.HARD_ANGLE)
for m in self.modules():
if isinstance(m, nn.Conv2d) or isinstance(m, nn.ConvTranspose2d):
kaiming_normal_(m.weight.data)
if m.bias is not None:
m.bias.data.zero_()
elif isinstance(m, nn.BatchNorm2d):
m.weight.data.fill_(1)
m.bias.data.zero_()
def forward(self, x, label_gt, meta_data, extents, gt_boxes, poses, points, symmetry):
# conv features
for i, model in enumerate(self.features):
x = model(x)
if i == 22:
out_conv4_3 = x
if i == 29:
out_conv5_3 = x
# semantic labeling branch
out_conv4_embed = self.conv4_embed(out_conv4_3)
out_conv5_embed = self.conv5_embed(out_conv5_3)
out_conv5_embed_up = self.upsample_conv5_embed(out_conv5_embed)
out_embed = self.dropout(out_conv4_embed + out_conv5_embed_up)
out_embed_up = self.upsample_embed(out_embed)
out_score = self.conv_score(out_embed_up)
out_logsoftmax = log_softmax_high_dimension(out_score)
out_prob = softmax_high_dimension(out_score)
out_label = torch.max(out_prob, dim=1)[1].type(torch.IntTensor).cuda()
out_weight = self.hard_label(out_prob, label_gt, torch.rand(out_prob.size()).cuda())
if cfg.TRAIN.VERTEX_REG:
# center regression branch
out_conv4_vertex_embed = self.conv4_vertex_embed(out_conv4_3)
out_conv5_vertex_embed = self.conv5_vertex_embed(out_conv5_3)
out_conv5_vertex_embed_up = self.upsample_conv5_vertex_embed(out_conv5_vertex_embed)
out_vertex_embed = self.dropout(out_conv4_vertex_embed + out_conv5_vertex_embed_up)
out_vertex_embed_up = self.upsample_vertex_embed(out_vertex_embed)
out_vertex = self.conv_vertex_score(out_vertex_embed_up)
# hough voting
if self.training:
self.hough_voting.is_train = 1
self.hough_voting.label_threshold = cfg.TRAIN.HOUGH_LABEL_THRESHOLD
self.hough_voting.voting_threshold = cfg.TRAIN.HOUGH_VOTING_THRESHOLD
self.hough_voting.skip_pixels = cfg.TRAIN.HOUGH_SKIP_PIXELS
self.hough_voting.inlier_threshold = cfg.TRAIN.HOUGH_INLIER_THRESHOLD
else:
self.hough_voting.is_train = 0
self.hough_voting.label_threshold = cfg.TEST.HOUGH_LABEL_THRESHOLD
self.hough_voting.voting_threshold = cfg.TEST.HOUGH_VOTING_THRESHOLD
self.hough_voting.skip_pixels = cfg.TEST.HOUGH_SKIP_PIXELS
self.hough_voting.inlier_threshold = cfg.TEST.HOUGH_INLIER_THRESHOLD
out_box, out_pose = self.hough_voting(out_label, out_vertex, meta_data, extents)
# bounding box classification and regression branch
bbox_labels, bbox_targets, bbox_inside_weights, bbox_outside_weights = roi_target_layer(out_box, gt_boxes)
out_roi_conv4 = self.roi_pool_conv4(out_conv4_3, out_box)
out_roi_conv5 = self.roi_pool_conv5(out_conv5_3, out_box)
out_roi = out_roi_conv4 + out_roi_conv5
out_roi_flatten = out_roi.view(out_roi.size(0), -1)
out_fc7 = self.classifier(out_roi_flatten)
out_fc8 = self.fc8(out_fc7)
out_logsoftmax_box = log_softmax_high_dimension(out_fc8)
bbox_prob = softmax_high_dimension(out_fc8)
bbox_label_weights = self.hard_label(bbox_prob, bbox_labels, torch.rand(bbox_prob.size()).cuda())
bbox_pred = self.fc9(out_fc7)
# rotation regression branch
rois, poses_target, poses_weight = pose_target_layer(out_box, bbox_prob, bbox_pred, gt_boxes, poses, self.training)
if cfg.TRAIN.POSE_REG:
out_qt_conv4 = self.roi_pool_conv4(out_conv4_3, rois)
out_qt_conv5 = self.roi_pool_conv5(out_conv5_3, rois)
out_qt = out_qt_conv4 + out_qt_conv5
out_qt_flatten = out_qt.view(out_qt.size(0), -1)
out_qt_fc7 = self.classifier(out_qt_flatten)
out_quaternion = self.fc10(out_qt_fc7)
# point matching loss
poses_pred = nn.functional.normalize(torch.mul(out_quaternion, poses_weight))
if self.training:
loss_pose = self.pml(poses_pred, poses_target, poses_weight, points, symmetry)
if self.training:
if cfg.TRAIN.VERTEX_REG:
if cfg.TRAIN.POSE_REG:
return out_logsoftmax, out_weight, out_vertex, out_logsoftmax_box, bbox_label_weights, \
bbox_pred, bbox_targets, bbox_inside_weights, loss_pose, poses_weight
else:
return out_logsoftmax, out_weight, out_vertex, out_logsoftmax_box, bbox_label_weights, \
bbox_pred, bbox_targets, bbox_inside_weights
else:
return out_logsoftmax, out_weight
else:
if cfg.TRAIN.VERTEX_REG:
if cfg.TRAIN.POSE_REG:
return out_label, out_vertex, rois, out_pose, out_quaternion
else:
return out_label, out_vertex, rois, out_pose
else:
return out_label
def weight_parameters(self):
return [param for name, param in self.named_parameters() if 'weight' in name]
def bias_parameters(self):
return [param for name, param in self.named_parameters() if 'bias' in name]
def posecnn(num_classes, num_units, data=None):
model = PoseCNN(num_classes, num_units)
if data is not None:
model_dict = model.state_dict()
print('model keys')
print('=================================================')
for k, v in model_dict.items():
print(k)
print('=================================================')
print('data keys')
print('=================================================')
for k, v in data.items():
print(k)
print('=================================================')
pretrained_dict = {k: v for k, v in data.items() if k in model_dict and v.size() == model_dict[k].size()}
print('load the following keys from the pretrained model')
print('=================================================')
for k, v in pretrained_dict.items():
print(k)
print('=================================================')
model_dict.update(pretrained_dict)
model.load_state_dict(model_dict)
return model
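# --- loading note (editor's addition, not part of the original file) ---
# posecnn() loads checkpoints permissively: it keeps only entries whose key
# exists in the freshly built model AND whose tensor size matches, so a
# checkpoint trained with a different number of classes still initializes the
# shared backbone. The filtering idiom in isolation, with toy state dicts:
if __name__ == '__main__':
    model_dict = {'conv.weight': torch.zeros(4, 3), 'fc.weight': torch.zeros(10, 8)}
    data = {'conv.weight': torch.ones(4, 3),   # kept: same key, same size
            'fc.weight': torch.ones(22, 8),    # dropped: size mismatch
            'extra.bias': torch.ones(5)}       # dropped: key not in the model
    pretrained_dict = {k: v for k, v in data.items()
                       if k in model_dict and v.size() == model_dict[k].size()}
    print(sorted(pretrained_dict.keys()))  # ['conv.weight']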
================================================
FILE: lib/networks/__init__.py
================================================
# Copyright (c) 2020 NVIDIA Corporation. All rights reserved.
# This work is licensed under the NVIDIA Source Code License - Non-commercial. Full
# text can be found in LICENSE.md
from .PoseCNN import *
================================================
FILE: lib/sdf/__init__.py
================================================
# Copyright (c) 2020 NVIDIA Corporation. All rights reserved.
# This work is licensed under the NVIDIA Source Code License - Non-commercial. Full
# text can be found in LICENSE.md
================================================
FILE: lib/sdf/_init_paths.py
================================================
# Copyright (c) 2020 NVIDIA Corporation. All rights reserved.
# This work is licensed under the NVIDIA Source Code License - Non-commercial. Full
# text can be found in LICENSE.md
"""Set up paths for Fast R-CNN."""
import os.path as osp
import sys
def add_path(path):
if path not in sys.path:
sys.path.insert(0, path)
this_dir = osp.dirname(__file__)
lib_path = osp.join(this_dir, '..')
add_path(lib_path)
================================================
FILE: lib/sdf/multi_sdf_optimizer.py
================================================
# Copyright (c) 2020 NVIDIA Corporation. All rights reserved.
# This work is licensed under the NVIDIA Source Code License - Non-commercial. Full
# text can be found in LICENSE.md
from sdf_utils import *
class multi_sdf_optimizer():
def __init__(self, sdf_file, lr=0.01, online_calib=True, use_gpu=False):
self.use_gpu = use_gpu
print(' start loading sdf ... ')
sdf_info = read_sdf(sdf_file)
sdf = sdf_info[0]
min_coords = sdf_info[1]
delta = sdf_info[2]
max_coords = min_coords + delta * np.array(sdf.shape)
self.xmin, self.ymin, self.zmin = min_coords
self.xmax, self.ymax, self.zmax = max_coords
self.sdf_torch = torch.from_numpy(sdf).float().permute(1, 0, 2).unsqueeze(0).unsqueeze(1)
if self.use_gpu:
self.sdf_torch = self.sdf_torch.cuda()
print(' sdf size = {}x{}x{}'.format(self.sdf_torch.size(2), self.sdf_torch.size(3), self.sdf_torch.size(4)))
print(' minimal coordinate = ({:.4f}, {:.4f}, {:.4f}) cm'.format(self.xmin * 100, self.ymin * 100, self.zmin * 100))
print(' maximal coordinate = ({:.4f}, {:.4f}, {:.4f}) cm'.format(self.xmax * 100, self.ymax * 100, self.zmax * 100))
print(' finished loading sdf ! ')
if use_gpu:
self.dpose = torch.tensor([0, 0, 0, 1e-12, 1e-12, 1e-12], dtype=torch.float32, requires_grad=True, device=0)
else:
self.dpose = torch.tensor([0, 0, 0, 1e-12, 1e-12, 1e-12], dtype=torch.float32, requires_grad=True)
self.online_calib = online_calib
if online_calib:
if use_gpu:
self.dpose_ext = torch.tensor([0, 0, 0, 1e-12, 1e-12, 1e-12], dtype=torch.float32, requires_grad=True,
device=0)
else:
self.dpose_ext = torch.tensor([0, 0, 0, 1e-12, 1e-12, 1e-12], dtype=torch.float32, requires_grad=True)
else:
self.dpose_ext = torch.tensor([0, 0, 0, 1e-12, 1e-12, 1e-12], dtype=torch.float32, requires_grad=False)
if use_gpu:
self.dpose_ext = self.dpose_ext.cuda()
if online_calib:
self.optimizer = optim.Adam([self.dpose,
self.dpose_ext],
lr=lr)
else:
self.optimizer = optim.Adam([self.dpose],
lr=lr)
self.loss = nn.L1Loss(reduction='sum')
self.loss_extrinsics = nn.L1Loss(reduction='mean')
if use_gpu:
self.loss = self.loss.cuda()
self.loss_extrinsics = self.loss_extrinsics.cuda()
def look_up(self, samples_x, samples_y, samples_z):
samples_x = torch.clamp(samples_x, self.xmin, self.xmax)
samples_y = torch.clamp(samples_y, self.ymin, self.ymax)
samples_z = torch.clamp(samples_z, self.zmin, self.zmax)
samples_x = (samples_x - self.xmin) / (self.xmax - self.xmin)
samples_y = (samples_y - self.ymin) / (self.ymax - self.ymin)
samples_z = (samples_z - self.zmin) / (self.zmax - self.zmin)
samples = torch.cat((samples_z.unsqueeze(0).unsqueeze(2).unsqueeze(3).unsqueeze(4),
samples_x.unsqueeze(0).unsqueeze(2).unsqueeze(3).unsqueeze(4),
samples_y.unsqueeze(0).unsqueeze(2).unsqueeze(3).unsqueeze(4)),
dim=4)
samples = samples * 2 - 1
return F.grid_sample(self.sdf_torch, samples)
def compute_dist(self, d_pose, T_oc_0, ps_c):
ps_o = torch.mm(Oplus(T_oc_0, d_pose, self.use_gpu), ps_c.permute(1, 0)).permute(1, 0)[:, :3]
dist = self.look_up(ps_o[:, 0], ps_o[:, 1], ps_o[:, 2])
return dist
def compute_dist_multiview(self, d_pose, d_pose_ext, T_oc_0, T_rc_0, ps_c, T_r0rv):
# convert points to reference camera frame
T_rc = Oplus(T_rc_0, d_pose_ext, self.use_gpu)
T_cr = torch.inverse(T_rc)
T_c0cv = torch.matmul(T_cr.unsqueeze(0), torch.matmul(T_r0rv, T_rc.unsqueeze(0)))
ps_c0_all = []
for i in range(len(ps_c)):
ps_c0_all.append(torch.mm(T_c0cv[i], ps_c[i].permute(1, 0)).permute(1, 0).view(-1, 4))
ps_c0 = torch.cat(ps_c0_all, dim=0)
dist = self.compute_dist(d_pose, T_oc_0, ps_c0)
return dist
def refine_pose_singleview(self, T_co_0, ps_c, steps=100):
# input T_co_0: 4x4
# ps_c: nx4
if self.use_gpu:
T_oc_0 = torch.from_numpy(np.linalg.inv(T_co_0)).cuda()
else:
T_oc_0 = torch.from_numpy(np.linalg.inv(T_co_0))
self.dpose.data[:3] *= 0
self.dpose.data[3:] = self.dpose.data[3:] * 0 + 1e-12
for i in range(steps):
self.optimizer.zero_grad()
dist = self.compute_dist(self.dpose, T_oc_0, ps_c)
dist_target = torch.zeros_like(dist)
if self.use_gpu:
dist_target = dist_target.cuda()
loss = self.loss(dist, dist_target)
loss.backward()
self.optimizer.step()
# print('step: {}, loss = {}'.format(i + 1, loss.data.cpu().item()))
T_oc_opt = Oplus(T_oc_0, self.dpose, self.use_gpu)
T_co_opt = np.linalg.inv(T_oc_opt.cpu().detach().numpy())
dist = torch.mean(torch.abs(dist)).detach().cpu().numpy()
return T_co_opt, dist
def refine_pose_multiview(self, T_co_0, T_rc_0, ps_c, T_r0rv, steps=100):
# input T_co_0: 4x4 object pose in the camera frame
# T_rc_0: 4x4 extrinsics (camera in the robot frame)
# T_r0rv: vx4x4 relative pose between the robot frame at time 0 and time v
# ps_c: tuple of v point clouds, each n_i x 4, one per camera frame
if self.use_gpu:
T_oc_0 = torch.from_numpy(np.linalg.inv(T_co_0)).cuda()
T_rc_0 = torch.from_numpy(T_rc_0).cuda()
T_r0rv = torch.from_numpy(T_r0rv).cuda()
else:
T_oc_0 = torch.from_numpy(np.linalg.inv(T_co_0))
T_rc_0 = torch.from_numpy(T_rc_0)
T_r0rv = torch.from_numpy(T_r0rv)
self.dpose.data[:3] *= 0
self.dpose.data[3:] = self.dpose.data[3:] * 0 + 1e-12
self.dpose_ext.data[:3] *= 0
self.dpose_ext.data[3:] = self.dpose_ext.data[3:] * 0 + 1e-12
for i in range(steps):
self.optimizer.zero_grad()
dist = self.compute_dist_multiview(self.dpose, self.dpose_ext, T_oc_0, T_rc_0, ps_c, T_r0rv)
dist_target = torch.zeros_like(dist)
if self.use_gpu:
dist_target = dist_target.cuda()
loss = self.loss(dist, dist_target) + \
self.loss_extrinsics(self.dpose_ext[[0, 1, 2, 3, 5]],
torch.zeros_like(self.dpose_ext[[0, 1, 2, 3, 5]])) * 1e8
loss.backward()
self.optimizer.step()
# print('step: {}, loss = {}'.format(i + 1, loss.data.cpu().item()))
T_oc_opt = Oplus(T_oc_0, self.dpose, self.use_gpu)
T_co_opt = np.linalg.inv(T_oc_opt.cpu().detach().numpy())
T_rc_opt = Oplus(T_rc_0, self.dpose_ext, self.use_gpu)
T_rc_opt = T_rc_opt.cpu().detach().numpy()
dist = torch.mean(torch.abs(dist)).detach().cpu().numpy()
return T_co_opt, T_rc_opt, dist
================================================
FILE: lib/sdf/sdf_optimizer.py
================================================
# Copyright (c) 2020 NVIDIA Corporation. All rights reserved.
# This work is licensed under the NVIDIA Source Code License - Non-commercial. Full
# text can be found in LICENSE.md
import sys
import cv2
import time
from .sdf_utils import *
import _init_paths
from fcn.config import cfg
from layers.sdf_matching_loss import SDFLoss
class sdf_optimizer():
def __init__(self, classes, sdf_files, lr=0.01, optimizer='Adam', use_gpu=True):
self.classes = classes
self.sdf_files = sdf_files
self.use_gpu = use_gpu
num = len(sdf_files)
self.xmins = np.zeros((num, ), dtype=np.float32)
self.ymins = np.zeros((num, ), dtype=np.float32)
self.zmins = np.zeros((num, ), dtype=np.float32)
self.xmaxs = np.zeros((num, ), dtype=np.float32)
self.ymaxs = np.zeros((num, ), dtype=np.float32)
self.zmaxs = np.zeros((num, ), dtype=np.float32)
sdf_torch_list = []
for i in range(len(sdf_files)):
sdf_file = sdf_files[i]
print(' start loading sdf from {} ... '.format(sdf_file))
if sdf_file[-3:] == 'sdf':
sdf_info = read_sdf(sdf_file)
sdf = sdf_info[0]
min_coords = sdf_info[1]
delta = sdf_info[2]
max_coords = min_coords + delta * np.array(sdf.shape)
self.xmins[i], self.ymins[i], self.zmins[i] = min_coords
self.xmaxs[i], self.ymaxs[i], self.zmaxs[i] = max_coords
sdf_torch_list.append(torch.from_numpy(sdf).float())
elif sdf_file[-3:] == 'pth':
sdf_info = torch.load(sdf_file)
min_coords = sdf_info['min_coords']
max_coords = sdf_info['max_coords']
self.xmins[i], self.ymins[i], self.zmins[i] = min_coords
self.xmaxs[i], self.ymaxs[i], self.zmaxs[i] = max_coords
sdf_torch_list.append(sdf_info['sdf_torch'][0, 0].permute(1, 0, 2))
print(' minimal coordinate = ({:.4f}, {:.4f}, {:.4f}) cm'.format(self.xmins[i] * 100, self.ymins[i] * 100, self.zmins[i] * 100))
print(' maximal coordinate = ({:.4f}, {:.4f}, {:.4f}) cm'.format(self.xmaxs[i] * 100, self.ymaxs[i] * 100, self.zmaxs[i] * 100))
print(sdf_torch_list[-1].shape)
print(' finished loading sdf ! ')
# combine sdfs
max_shape = np.array([sdf.shape for sdf in sdf_torch_list]).max(axis=0)
self.sdf_torch = torch.ones((num, max_shape[0], max_shape[1], max_shape[2]), dtype=torch.float32)
self.sdf_limits = np.zeros((num, 9), dtype=np.float32)
for i in range(num):
size = sdf_torch_list[i].shape
self.sdf_torch[i, :size[0], :size[1], :size[2]] = sdf_torch_list[i]
self.sdf_limits[i, 0] = self.xmins[i]
self.sdf_limits[i, 1] = self.ymins[i]
self.sdf_limits[i, 2] = self.zmins[i]
self.sdf_limits[i, 3] = self.xmins[i] + (self.xmaxs[i] - self.xmins[i]) * max_shape[0] / size[0]
self.sdf_limits[i, 4] = self.ymins[i] + (self.ymaxs[i] - self.ymins[i]) * max_shape[1] / size[1]
self.sdf_limits[i, 5] = self.zmins[i] + (self.zmaxs[i] - self.zmins[i]) * max_shape[2] / size[2]
self.sdf_limits[i, 6] = max_shape[0]
self.sdf_limits[i, 7] = max_shape[1]
self.sdf_limits[i, 8] = max_shape[2]
self.sdf_limits = torch.from_numpy(self.sdf_limits)
if self.use_gpu:
self.sdf_torch = self.sdf_torch.cuda()
self.sdf_limits = self.sdf_limits.cuda()
self.sdf_loss = SDFLoss()
def look_up(self, samples_x, samples_y, samples_z):
samples_x = torch.clamp(samples_x, self.xmin, self.xmax)
samples_y = torch.clamp(samples_y, self.ymin, self.ymax)
samples_z = torch.clamp(samples_z, self.zmin, self.zmax)
samples_x = (samples_x - self.xmin) / (self.xmax - self.xmin)
samples_y = (samples_y - self.ymin) / (self.ymax - self.ymin)
samples_z = (samples_z - self.zmin) / (self.zmax - self.zmin)
samples = torch.cat((samples_z.unsqueeze(0).unsqueeze(2).unsqueeze(3).unsqueeze(4),
samples_x.unsqueeze(0).unsqueeze(2).unsqueeze(3).unsqueeze(4),
samples_y.unsqueeze(0).unsqueeze(2).unsqueeze(3).unsqueeze(4)),
dim=4)
samples = samples * 2 - 1
return F.grid_sample(self.sdf_torch, samples, padding_mode="border")
def compute_dist(self, d_pose, T_oc_0, ps_c):
ps_o = torch.mm(Oplus(T_oc_0, d_pose, self.use_gpu), ps_c.permute(1, 0)).permute(1, 0)[:, :3]
dist = self.look_up(ps_o[:, 0], ps_o[:, 1], ps_o[:, 2])
return torch.abs(dist)
def refine_pose(self, T_co_0, ps_c, steps=100):
# input T_co_0: 4x4
# ps_c: nx4
if self.use_gpu:
T_oc_0 = torch.from_numpy(np.linalg.inv(T_co_0)).cuda()
else:
T_oc_0 = torch.from_numpy(np.linalg.inv(T_co_0))
self.dpose.data[:3] *= 0
self.dpose.data[3:] = self.dpose.data[3:] * 0 + 1e-12
self.dist = torch.zeros((ps_c.size(0),))
if self.use_gpu:
self.dist = self.dist.cuda()
for i in range(steps):
if self.optimizer_type == 'LBFGS':
def closure():
self.optimizer.zero_grad()
dist = self.compute_dist(self.dpose, T_oc_0, ps_c)
self.dist = dist.detach()
dist_target = torch.zeros_like(dist)
if self.use_gpu:
dist_target = dist_target.cuda()
loss = self.loss(dist, dist_target)
loss.backward()
return loss
self.optimizer.step(closure)
elif self.optimizer_type == 'Adam':
self.optimizer.zero_grad()
dist = self.compute_dist(self.dpose, T_oc_0, ps_c)
self.dist = dist.detach()
dist_target = torch.zeros_like(dist)
if self.use_gpu:
dist_target = dist_target.cuda()
loss = self.loss(dist, dist_target)
loss.backward()
self.optimizer.step()
# print('step: {}, loss = {}'.format(i + 1, loss.data.cpu().item()))
T_oc_opt = Oplus(T_oc_0, self.dpose, self.use_gpu)
T_co_opt = np.linalg.inv(T_oc_opt.cpu().detach().numpy())
dist = torch.mean(torch.abs(self.dist)).detach().cpu().numpy()
return T_co_opt, dist
def refine_pose_layer(self, T_oc_0, points, steps=100):
# input T_co_0: mx4x4, m is the number of objects
# points: nx3 in camera
# construct initial pose
pose_init = torch.from_numpy(T_oc_0).cuda()
m = T_oc_0.shape[0]
dpose = torch.zeros((m, 6), dtype=torch.float32, requires_grad=True, device=0)
dpose.data[:, :3] *= 0
dpose.data[:, 3:] = dpose.data[:, 3:] * 0 + 1e-12
treg = cfg.TEST.SDF_TRANSLATION_REG
rreg = cfg.TEST.SDF_ROTATION_REG
regularization = torch.tensor([treg, treg, treg, rreg, rreg, rreg], dtype=torch.float32, requires_grad=False, device=0)
start = time.time()
for i in range(steps):
# self.optimizer.zero_grad()
loss, sdf_values, T_oc_opt, dalpha, J = self.sdf_loss(dpose, pose_init, self.sdf_torch, self.sdf_limits, points, regularization)
# print(loss)
# loss.backward()
# self.optimizer.step()
# JTJ = JTJ.cpu().detach().numpy() + np.diag([100, 100, 100, 0.001, 0.001, 0.001]).astype(np.float32)
# J = J.cpu().detach().numpy()
# dalpha = torch.from_numpy(np.matmul(np.linalg.inv(JTJ), J)).cuda()
dpose = dpose - dalpha
# self.dpose = self.dpose - 0.001 * J
end = time.time()
print('sdf refinement iterations %d, time %f' % (steps, end - start))
return T_oc_opt.cpu().detach().numpy()
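# --- usage sketch (editor's addition, not part of the original file) ---
# refine_pose_layer drives the SDFLoss extension: given m initial poses
# (m x 4 x 4 numpy, in the convention of the kernel's pose_init) and a GPU
# point tensor, it repeats Gauss-Newton updates dpose = dpose - dalpha and
# returns the refined 4x4 matrices. Note that sdf_matching_loss_kernel.cu
# reads each point as (x, y, z, class_index, object_index), i.e. n x 5, even
# though the comment above says nx3. A hedged call, assuming a built
# posecnn_cuda extension, a loaded cfg, and one object of class 0; the .pth
# path is a placeholder for a real precomputed sdf file:
if __name__ == '__main__':
    optim_sdf = sdf_optimizer(classes=['obj'], sdf_files=['path/to/textured_simple_low_res.pth'])
    T_init = np.eye(4, dtype=np.float32)[None]   # 1 x 4 x 4
    pts = torch.zeros((1000, 5), device='cuda')  # fill columns 0-2 with camera-frame points
    T_opt = optim_sdf.refine_pose_layer(T_init, pts, steps=50)
    print(T_opt[0])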
================================================
FILE: lib/sdf/sdf_utils.py
================================================
# Copyright (c) 2020 NVIDIA Corporation. All rights reserved.
# This work is licensed under the NVIDIA Source Code License - Non-commercial. Full
# text can be found in LICENSE.md
import torch
import matplotlib.pyplot as plt
import numpy as np
import torch.optim as optim
import torch.nn as nn
import torch.nn.init as init
from mpl_toolkits.mplot3d import Axes3D # noqa: F401 unused import
from transforms3d.quaternions import *
from transforms3d.axangles import *
import torch.nn.functional as F
import time
def read_sdf(sdf_file):
with open(sdf_file, "r") as file:
lines = file.readlines()
nx, ny, nz = map(int, lines[0].split(' '))
x0, y0, z0 = map(float, lines[1].split(' '))
delta = float(lines[2].strip())
data = np.zeros([nx, ny, nz])
for i, line in enumerate(lines[3:]):
idx = i % nx
idy = int(i / nx) % ny
idz = int(i / (nx * ny))
val = float(line.strip())
data[idx, idy, idz] = val
return (data, np.array([x0, y0, z0]), delta)
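# --- format sketch (editor's addition, not part of the original file) ---
# read_sdf above implies the .sdf text layout: line 0 holds "nx ny nz", line 1
# the grid origin "x0 y0 z0", line 2 the cell size delta, and the remaining
# nx*ny*nz lines one distance value each, with x varying fastest, then y,
# then z. A tiny synthetic round trip (the /tmp path is illustrative):
def _demo_read_sdf(path='/tmp/_demo.sdf'):
    with open(path, 'w') as f:
        f.write('2 2 2\n')        # grid dimensions nx ny nz
        f.write('0.0 0.0 0.0\n')  # origin x0 y0 z0
        f.write('0.01\n')         # cell size in meters
        for v in range(8):        # 2*2*2 distance values
            f.write('{}\n'.format(float(v)))
    data, origin, delta = read_sdf(path)
    assert data[1, 0, 0] == 1.0   # x is the fastest-varying index
    return data, origin, delta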
def skew(w, gpu=False):
if gpu:
wc = torch.stack((torch.tensor(0, dtype=torch.float32).cuda(), -w[2], w[1],
w[2], torch.tensor(0, dtype=torch.float32).cuda(), -w[0],
-w[1], w[0], torch.tensor(0, dtype=torch.float32).cuda()
)).view(3, 3)
else:
wc = torch.stack((torch.tensor(0, dtype=torch.float32), -w[2], w[1],
w[2], torch.tensor(0, dtype=torch.float32), -w[0],
-w[1], w[0], torch.tensor(0, dtype=torch.float32)
)).view(3, 3)
return wc
def Exp(dq, gpu):
if gpu:
I = torch.eye(3, dtype=torch.float32).cuda()
else:
I = torch.eye(3, dtype=torch.float32)
dphi = torch.norm(dq, p=2, dim=0)
if dphi > 0.05:
u = 1/dphi * dq
ux = skew(u, gpu)
if gpu:
dR = I + torch.sin(dphi) * ux + (torch.tensor(1, dtype=torch.float32).cuda() - torch.cos(dphi)) * torch.mm(ux, ux)
else:
dR = I + torch.sin(dphi) * ux + (torch.tensor(1, dtype=torch.float32) - torch.cos(dphi)) * torch.mm(ux, ux)
else:
dR = I + skew(dq, gpu)
return dR
def Oplus(T, v, gpu=False):
dR = Exp(v[3:], gpu)
dt = v[:3]
if gpu:
dT = torch.stack((dR[0, 0], dR[0, 1], dR[0, 2], dt[0],
dR[1, 0], dR[1, 1], dR[1, 2], dt[1],
dR[2, 0], dR[2, 1], dR[2, 2], dt[2],
torch.tensor(0, dtype=torch.float32).cuda(),
torch.tensor(0, dtype=torch.float32).cuda(),
torch.tensor(0, dtype=torch.float32).cuda(),
torch.tensor(1, dtype=torch.float32).cuda())).view(4, 4)
else:
dT = torch.stack((dR[0, 0], dR[0, 1], dR[0, 2], dt[0],
dR[1, 0], dR[1, 1], dR[1, 2], dt[1],
dR[2, 0], dR[2, 1], dR[2, 2], dt[2],
torch.tensor(0, dtype=torch.float32),
torch.tensor(0, dtype=torch.float32),
torch.tensor(0, dtype=torch.float32),
torch.tensor(1, dtype=torch.float32))).view(4, 4)
return torch.mm(T, dT)
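# --- usage sketch (editor's addition, not part of the original file) ---
# Oplus right-multiplies T by the small rigid motion built from the 6-vector
# v = (tx, ty, tz, wx, wy, wz); Exp maps the rotation part through a
# Rodrigues-style exponential, falling back to I + skew(dq) below 0.05 rad.
# A near-zero twist (the repo's 1e-12 initialization) leaves T unchanged:
if __name__ == '__main__':
    T = torch.eye(4, dtype=torch.float32)
    v0 = torch.tensor([0., 0., 0., 1e-12, 1e-12, 1e-12])
    assert torch.allclose(Oplus(T, v0, gpu=False), T, atol=1e-6)
    # 1 cm translation in x plus a 0.1 rad rotation about z
    v1 = torch.tensor([0.01, 0., 0., 0., 0., 0.1])
    print(Oplus(T, v1, gpu=False))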
================================================
FILE: lib/sdf/test_sdf_optimizer.py
================================================
# Copyright (c) 2020 NVIDIA Corporation. All rights reserved.
# This work is licensed under the NVIDIA Source Code License - Non-commercial. Full
# text can be found in LICENSE.md
from sdf_optimizer import *
from transforms3d.quaternions import mat2quat, quat2mat
import cv2
def Twc_np(pose):
Twc = np.zeros((4, 4), dtype=np.float32)
Twc[:3, :3] = quat2mat(pose[3:])
Twc[:3, 3] = pose[:3]
Twc[3, 3] = 1
return Twc
try:
import cPickle as pickle
except:
import pickle
import numpy as np
try:
from itertools import izip_longest as izip  # Python 2
except ImportError:
from itertools import zip_longest as izip  # Python 3
class SignedDensityField(object):
""" Data is stored in the following way
data[x, y, z]
"""
def __init__(self, data, origin, delta):
self.data = data
self.nx, self.ny, self.nz = data.shape
self.origin = origin
self.delta = delta
self.max_coords = self.origin + delta * np.array(data.shape)
def _rel_pos_to_idxes(self, rel_pos):
i_min = np.array([0, 0, 0], dtype=np.int64)
i_max = np.array([self.nx - 1, self.ny - 1, self.nz - 1], dtype=np.int64)
return np.clip(((rel_pos - self.origin) / self.delta).astype(int), i_min, i_max)
def get_distance(self, rel_pos):
idxes = self._rel_pos_to_idxes(rel_pos)
assert idxes.shape[0] == rel_pos.shape[0]
return self.data[idxes[:, 0], idxes[:, 1], idxes[:, 2]]
def dump(self, pkl_file):
data = {}
data['data'] = self.data
data['origin'] = self.origin
data['delta'] = self.delta
pickle.dump(data, open(pkl_file, "wb"), protocol=2)
def visualize(self, max_dist=0.1):
try:
from mayavi import mlab
except ImportError:
print("mayavi is not installed!")
return
figure = mlab.figure('Signed Density Field')
SCALE = 100 # The dimensions will be expressed in cm for better visualization.
data = np.copy(self.data)
data = np.minimum(max_dist, data)
xmin, ymin, zmin = SCALE * self.origin
xmax, ymax, zmax = SCALE * self.max_coords
delta = SCALE * self.delta
xi, yi, zi = np.mgrid[xmin:xmax:delta, ymin:ymax:delta, zmin:zmax:delta]
data[data <= 0] -= 0.2
data = -data
grid = mlab.pipeline.scalar_field(xi, yi, zi, data)
vmin = np.min(data)
vmax = np.max(data)
mlab.pipeline.volume(grid, vmin=vmin, vmax=(vmax + vmin) / 2)
mlab.axes()
mlab.show()
@classmethod
def from_sdf(cls, sdf_file):
with open(sdf_file, "r") as file:
axis = 2
lines = file.readlines()
nx, ny, nz = map(int, lines[0].split(' '))
print(nx, ny, nz)
x0, y0, z0 = map(float, lines[1].split(' '))
print(x0, y0, z0)
delta = float(lines[2].strip())
print(delta)
data = np.zeros([nx, ny, nz])
for i, line in enumerate(lines[3:]):
idx = i % nx
idy = int(i / nx) % ny
idz = int(i / (nx * ny))
val = float(line.strip())
data[idx, idy, idz] = val
return cls(data, np.array([x0, y0, z0]), delta)
@classmethod
def from_pkl(cls, pkl_file):
        data = pickle.load(open(pkl_file, "rb"))
return cls(data['data'], data['origin'], data['delta'])
if __name__ == '__main__':
# object_name = '002_master_chef_can'
# object_name = '037_scissors'
# object_name = '061_foam_brick'
object_name = '007_tuna_fish_can'
visualize_sdf = False
sdf_file = '../../data/YCB_Object/models/{}/textured_simple_low_res.pth'.format(object_name)
sdf_optim = sdf_optimizer(sdf_file, lr=0.01, use_gpu=True, optimizer='Adam')
print(torch.max(sdf_optim.sdf_torch))
print(torch.min(sdf_optim.sdf_torch))
if visualize_sdf:
sdf_show = SignedDensityField.from_sdf(sdf_file)
sdf_show.visualize()
# load points of the same object
point_file = '../../data/YCB_Object/models/{}/points.xyz'.format(object_name)
points = torch.from_numpy(np.loadtxt(point_file)).float()
points = torch.cat((points, torch.ones((points.size(0), 1), dtype=torch.float32)), dim=1)
points_np = points.numpy()
print(points_np.shape)
# set ground truth pose
pose_gt = np.zeros((7,), dtype=np.float32)
pose_gt[:3] = np.array([1, 1, 1], dtype=np.float32)
R = np.array([[-1, 0, 0],
[0, np.sqrt(0.5), -np.sqrt(0.5)],
[0, -np.sqrt(0.5), -np.sqrt(0.5)]], dtype=np.float32)
pose_gt[3:] = mat2quat(R)
# get measurements
Twc_gt = Twc_np(pose_gt)
points_c = np.matmul(np.linalg.inv(Twc_gt), np.transpose(points_np)).transpose()
points_c = torch.from_numpy(points_c)
index = np.random.permutation(np.arange(points_c.shape[0]))[:2000]
points_c = points_c[index, :]
print(points_c.shape)
T_co_init = np.linalg.inv(Twc_gt)
R_perturb = axangle2mat(np.random.rand(3,), 40 * np.random.rand() / 57.3, is_normalized=False)
T_co_init[:3, :3] = np.matmul(T_co_init[:3, :3], R_perturb)
T_co_init[:3, 3] += 0.01
points_init = np.matmul(np.linalg.inv(T_co_init), points_c.numpy().transpose()).transpose()
# optimization
points_input = points_c[:, :3].clone().cuda()
T_co_opt, sdf_values = sdf_optim.refine_pose_layer(T_co_init, points_input, steps=100)
print(T_co_opt)
print(np.linalg.inv(Twc_gt))
np.set_printoptions(threshold=sys.maxsize)
# print(sdf_values.detach().cpu().numpy())
# visualization for debugging
points_opt = np.matmul(np.linalg.inv(T_co_opt), points_c.numpy().transpose()).transpose()
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(points_np[::5, 0], points_np[::5, 1], points_np[::5, 2], color='green')
ax.scatter(points_init[::5, 0], points_init[::5, 1], points_init[::5, 2], color='red')
ax.scatter(points_opt[::5, 0], points_opt[::5, 1], points_opt[::5, 2], color='blue')
ax.set_xlabel('X Label')
ax.set_ylabel('Y Label')
ax.set_zlabel('Z Label')
min_coor = np.min(np.array([sdf_optim.xmin, sdf_optim.ymin, sdf_optim.zmin]))
max_coor = np.max(np.array([sdf_optim.xmax, sdf_optim.ymax, sdf_optim.zmax]))
ax.set_xlim(min_coor, max_coor)
ax.set_ylim(min_coor, max_coor)
ax.set_zlim(min_coor, max_coor)
plt.show()
================================================
FILE: lib/utils/__init__.py
================================================
# Copyright (c) 2020 NVIDIA Corporation. All rights reserved.
# This work is licensed under the NVIDIA Source Code License - Non-commercial. Full
# text can be found in LICENSE.md
================================================
FILE: lib/utils/bbox.pyx
================================================
# --------------------------------------------------------
# Fast R-CNN
# Copyright (c) 2015 Microsoft
# Licensed under The MIT License [see LICENSE for details]
# Written by Sergey Karayev
# --------------------------------------------------------
cimport cython
import numpy as np
cimport numpy as np
DTYPE = np.float64
ctypedef np.float64_t DTYPE_t
def bbox_overlaps(
np.ndarray[DTYPE_t, ndim=2] boxes,
np.ndarray[DTYPE_t, ndim=2] query_boxes):
"""
Parameters
----------
boxes: (N, 5) ndarray of float (batch_id, x1, y1, x2, y2)
query_boxes: (K, 5) ndarray of float (batch_id, x1, y1, x2, y2)
Returns
-------
overlaps: (N, K) ndarray of overlap between boxes and query_boxes
"""
cdef unsigned int N = boxes.shape[0]
cdef unsigned int K = query_boxes.shape[0]
cdef np.ndarray[DTYPE_t, ndim=2] overlaps = np.zeros((N, K), dtype=DTYPE)
cdef DTYPE_t iw, ih, box_area
cdef DTYPE_t ua
cdef unsigned int k, n
for k in range(K):
box_area = (
(query_boxes[k, 2+1] - query_boxes[k, 0+1] + 1) *
(query_boxes[k, 3+1] - query_boxes[k, 1+1] + 1)
)
for n in range(N):
if query_boxes[k, 0] == boxes[n, 0]:
iw = (
min(boxes[n, 2+1], query_boxes[k, 2+1]) -
max(boxes[n, 0+1], query_boxes[k, 0+1]) + 1
)
if iw > 0:
ih = (
min(boxes[n, 3+1], query_boxes[k, 3+1]) -
max(boxes[n, 1+1], query_boxes[k, 1+1]) + 1
)
if ih > 0:
ua = float(
(boxes[n, 2+1] - boxes[n, 0+1] + 1) *
(boxes[n, 3+1] - boxes[n, 1+1] + 1) +
box_area - iw * ih
)
overlaps[n, k] = iw * ih / ua
return overlaps
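# ----------------------------------------------------------------------
# Editor's note: an illustrative sketch, not part of the original file.
# After compiling with lib/utils/setup.py, bbox_overlaps is imported
# from the cython_bbox module; rows are (batch_id, x1, y1, x2, y2):
#
#   import numpy as np
#   from cython_bbox import bbox_overlaps
#   boxes = np.array([[0, 0, 0, 10, 10]], dtype=np.float64)
#   query = np.array([[0, 5, 5, 15, 15]], dtype=np.float64)
#   iou = bbox_overlaps(boxes, query)  # (1, 1) array, here ~0.175
# ----------------------------------------------------------------------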
================================================
FILE: lib/utils/bbox_transform.py
================================================
# Copyright (c) 2020 NVIDIA Corporation. All rights reserved.
# This work is licensed under the NVIDIA Source Code License - Non-commercial. Full
# text can be found in LICENSE.md
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
def bbox_transform(ex_rois, gt_rois):
ex_widths = ex_rois[:, 2] - ex_rois[:, 0] + 1.0
ex_heights = ex_rois[:, 3] - ex_rois[:, 1] + 1.0
ex_ctr_x = ex_rois[:, 0] + 0.5 * ex_widths
ex_ctr_y = ex_rois[:, 1] + 0.5 * ex_heights
gt_widths = gt_rois[:, 2] - gt_rois[:, 0] + 1.0
gt_heights = gt_rois[:, 3] - gt_rois[:, 1] + 1.0
gt_ctr_x = gt_rois[:, 0] + 0.5 * gt_widths
gt_ctr_y = gt_rois[:, 1] + 0.5 * gt_heights
targets_dx = (gt_ctr_x - ex_ctr_x) / ex_widths
targets_dy = (gt_ctr_y - ex_ctr_y) / ex_heights
targets_dw = np.log(gt_widths / ex_widths)
targets_dh = np.log(gt_heights / ex_heights)
targets = np.vstack(
(targets_dx, targets_dy, targets_dw, targets_dh)).transpose()
return targets
def bbox_transform_inv(boxes, deltas):
if boxes.shape[0] == 0:
return np.zeros((0, deltas.shape[1]), dtype=deltas.dtype)
boxes = boxes.astype(deltas.dtype, copy=False)
widths = boxes[:, 2] - boxes[:, 0] + 1.0
heights = boxes[:, 3] - boxes[:, 1] + 1.0
ctr_x = boxes[:, 0] + 0.5 * widths
ctr_y = boxes[:, 1] + 0.5 * heights
dx = deltas[:, 0::4]
dy = deltas[:, 1::4]
dw = deltas[:, 2::4]
dh = deltas[:, 3::4]
pred_ctr_x = dx * widths[:, np.newaxis] + ctr_x[:, np.newaxis]
pred_ctr_y = dy * heights[:, np.newaxis] + ctr_y[:, np.newaxis]
pred_w = np.exp(dw) * widths[:, np.newaxis]
pred_h = np.exp(dh) * heights[:, np.newaxis]
pred_boxes = np.zeros(deltas.shape, dtype=deltas.dtype)
# x1
pred_boxes[:, 0::4] = pred_ctr_x - 0.5 * pred_w
# y1
pred_boxes[:, 1::4] = pred_ctr_y - 0.5 * pred_h
# x2
pred_boxes[:, 2::4] = pred_ctr_x + 0.5 * pred_w
# y2
pred_boxes[:, 3::4] = pred_ctr_y + 0.5 * pred_h
return pred_boxes
def clip_boxes(boxes, im_shape):
"""
Clip boxes to image boundaries.
"""
# x1 >= 0
boxes[:, 0::4] = np.maximum(np.minimum(boxes[:, 0::4], im_shape[1] - 1), 0)
# y1 >= 0
boxes[:, 1::4] = np.maximum(np.minimum(boxes[:, 1::4], im_shape[0] - 1), 0)
# x2 < im_shape[1]
boxes[:, 2::4] = np.maximum(np.minimum(boxes[:, 2::4], im_shape[1] - 1), 0)
# y2 < im_shape[0]
boxes[:, 3::4] = np.maximum(np.minimum(boxes[:, 3::4], im_shape[0] - 1), 0)
return boxes
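if __name__ == '__main__':
    # Editor's sketch, not part of the original file: encode a ground
    # truth box against an example box and decode it back. The decoded
    # x2/y2 exceed gt by one pixel due to the +1 width convention used
    # in bbox_transform.
    ex = np.array([[10., 10., 50., 60.]])
    gt = np.array([[12., 15., 55., 70.]])
    deltas = bbox_transform(ex, gt)
    print(bbox_transform_inv(ex, deltas))  # ~[[12. 15. 56. 71.]]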
================================================
FILE: lib/utils/blob.py
================================================
# Copyright (c) 2020 NVIDIA Corporation. All rights reserved.
# This work is licensed under the NVIDIA Source Code License - Non-commercial. Full
# text can be found in LICENSE.md
"""Blob helper functions."""
import torch
import torch.nn as nn
import numpy as np
import cv2
import random
def im_list_to_blob(ims, num_channels):
"""Convert a list of images into a network input.
Assumes images are already prepared (means subtracted, BGR order, ...).
"""
max_shape = np.array([im.shape for im in ims]).max(axis=0)
num_images = len(ims)
blob = np.zeros((num_images, max_shape[0], max_shape[1], num_channels),
dtype=np.float32)
    for i in range(num_images):
im = ims[i]
if num_channels == 1:
blob[i, 0:im.shape[0], 0:im.shape[1], :] = im[:,:,np.newaxis]
else:
blob[i, 0:im.shape[0], 0:im.shape[1], :] = im
return blob
def prep_im_for_blob(im, pixel_means, target_size, max_size):
"""Mean subtract and scale an image for use in a blob."""
im = im.astype(np.float32, copy=False)
im -= pixel_means
im_shape = im.shape
im_size_min = np.min(im_shape[0:2])
im_size_max = np.max(im_shape[0:2])
im_scale = float(target_size) / float(im_size_min)
# Prevent the biggest axis from being more than MAX_SIZE
if np.round(im_scale * im_size_max) > max_size:
im_scale = float(max_size) / float(im_size_max)
im = cv2.resize(im, None, None, fx=im_scale, fy=im_scale,
interpolation=cv2.INTER_LINEAR)
return im, im_scale
def pad_im(im, factor, value=0):
height = im.shape[0]
width = im.shape[1]
pad_height = int(np.ceil(height / float(factor)) * factor - height)
pad_width = int(np.ceil(width / float(factor)) * factor - width)
if len(im.shape) == 3:
return np.lib.pad(im, ((0, pad_height), (0, pad_width), (0,0)), 'constant', constant_values=value)
elif len(im.shape) == 2:
return np.lib.pad(im, ((0, pad_height), (0, pad_width)), 'constant', constant_values=value)
def unpad_im(im, factor):
height = im.shape[0]
width = im.shape[1]
pad_height = int(np.ceil(height / float(factor)) * factor - height)
pad_width = int(np.ceil(width / float(factor)) * factor - width)
if len(im.shape) == 3:
return im[0:height-pad_height, 0:width-pad_width, :]
elif len(im.shape) == 2:
return im[0:height-pad_height, 0:width-pad_width]
def chromatic_transform(im, label=None, d_h=None, d_s=None, d_l=None):
    """
    Given a BGR image array, randomly perturb its hue, luminosity and
    saturation. Pixels where label > 0 are left unchanged.
    """
    # Sample random offsets: hue within +/-5% of 180, luminosity and
    # saturation within +/-10% of 256
if d_h is None:
d_h = (np.random.rand(1) - 0.5) * 0.1 * 180
if d_l is None:
d_l = (np.random.rand(1) - 0.5) * 0.2 * 256
if d_s is None:
d_s = (np.random.rand(1) - 0.5) * 0.2 * 256
# Convert the BGR to HLS
hls = cv2.cvtColor(im, cv2.COLOR_BGR2HLS)
h, l, s = cv2.split(hls)
# Add the values to the image H, L, S
new_h = (h + d_h) % 180
new_l = np.clip(l + d_l, 0, 255)
new_s = np.clip(s + d_s, 0, 255)
# Convert the HLS to BGR
new_hls = cv2.merge((new_h, new_l, new_s)).astype('uint8')
new_im = cv2.cvtColor(new_hls, cv2.COLOR_HLS2BGR)
if label is not None:
I = np.where(label > 0)
new_im[I[0], I[1], :] = im[I[0], I[1], :]
return new_im
def add_noise(image, level = 0.1):
# random number
r = np.random.rand(1)
# gaussian noise
if r < 0.9:
row,col,ch= image.shape
mean = 0
noise_level = random.uniform(0, level)
sigma = np.random.rand(1) * noise_level * 256
gauss = sigma * np.random.randn(row,col) + mean
gauss = np.repeat(gauss[:, :, np.newaxis], ch, axis=2)
noisy = image + gauss
noisy = np.clip(noisy, 0, 255)
else:
# motion blur
sizes = [3, 5, 7, 9, 11, 15]
size = sizes[int(np.random.randint(len(sizes), size=1))]
kernel_motion_blur = np.zeros((size, size))
if np.random.rand(1) < 0.5:
kernel_motion_blur[int((size-1)/2), :] = np.ones(size)
else:
kernel_motion_blur[:, int((size-1)/2)] = np.ones(size)
kernel_motion_blur = kernel_motion_blur / size
noisy = cv2.filter2D(image, -1, kernel_motion_blur)
return noisy.astype('uint8')
def add_noise_depth(image, level = 0.1):
row,col,ch= image.shape
noise_level = random.uniform(0, level)
gauss = noise_level * np.random.randn(row,col)
gauss = np.repeat(gauss[:, :, np.newaxis], ch, axis=2)
noisy = image + gauss
return noisy
def add_noise_depth_cuda(image, level = 0.1):
noise_level = random.uniform(0, level)
gauss = torch.randn_like(image) * noise_level
noisy = image + gauss
return noisy
def add_gaussian_noise_cuda(image, level = 0.1):
# gaussian noise
noise_level = random.uniform(0, level)
gauss = torch.randn_like(image) * noise_level
noisy = image + gauss
noisy = torch.clamp(noisy, 0, 1.0)
return noisy
def add_noise_cuda(image, level = 0.1):
# random number
r = np.random.rand(1)
# gaussian noise
if r < 0.8:
noise_level = random.uniform(0, level)
gauss = torch.randn_like(image) * noise_level
noisy = image + gauss
noisy = torch.clamp(noisy, 0, 1.0)
else:
# motion blur
sizes = [3, 5, 7, 9, 11, 15]
size = sizes[int(np.random.randint(len(sizes), size=1))]
kernel_motion_blur = torch.zeros((size, size))
if np.random.rand(1) < 0.5:
kernel_motion_blur[int((size-1)/2), :] = torch.ones(size)
else:
kernel_motion_blur[:, int((size-1)/2)] = torch.ones(size)
kernel_motion_blur = kernel_motion_blur.cuda() / size
kernel_motion_blur = kernel_motion_blur.view(1, 1, size, size)
kernel_motion_blur = kernel_motion_blur.repeat(image.size(2), 1, 1, 1)
motion_blur_filter = nn.Conv2d(in_channels=image.size(2),
out_channels=image.size(2),
kernel_size=size,
groups=image.size(2),
bias=False,
padding=int(size/2))
motion_blur_filter.weight.data = kernel_motion_blur
motion_blur_filter.weight.requires_grad = False
noisy = motion_blur_filter(image.permute(2, 0, 1).unsqueeze(0))
noisy = noisy.squeeze(0).permute(1, 2, 0)
return noisy
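if __name__ == '__main__':
    # Editor's sketch, not part of the original file: apply the CPU
    # chromatic and noise augmentations to a random BGR image.
    im = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)
    im_aug = chromatic_transform(im)
    im_noisy = add_noise(im_aug)
    print(im_noisy.shape, im_noisy.dtype)  # (480, 640, 3) uint8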
================================================
FILE: lib/utils/nms.py
================================================
# Copyright (c) 2020 NVIDIA Corporation. All rights reserved.
# This work is licensed under the NVIDIA Source Code License - Non-commercial. Full
# text can be found in LICENSE.md
import numpy as np
def nms(dets, thresh):
cls = dets[:, 1]
x1 = dets[:, 2]
y1 = dets[:, 3]
x2 = dets[:, 4]
y2 = dets[:, 5]
scores = dets[:, 6]
areas = (x2 - x1 + 1) * (y2 - y1 + 1)
order = scores.argsort()[::-1]
keep = []
while order.size > 0:
i = order[0]
keep.append(i)
xx1 = np.maximum(x1[i], x1[order[1:]])
yy1 = np.maximum(y1[i], y1[order[1:]])
xx2 = np.minimum(x2[i], x2[order[1:]])
yy2 = np.minimum(y2[i], y2[order[1:]])
w = np.maximum(0.0, xx2 - xx1 + 1)
h = np.maximum(0.0, yy2 - yy1 + 1)
inter = w * h
ovr = inter / (areas[i] + areas[order[1:]] - inter)
inds = np.where(~((ovr > thresh) & (cls[order[1:]] == cls[i])))[0]
#inds = np.where(ovr <= thresh)[0]
order = order[inds + 1]
return keep
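if __name__ == '__main__':
    # Editor's sketch, not part of the original file: rows follow the
    # layout used above, (batch_id, cls, x1, y1, x2, y2, score). The
    # lower-scoring same-class duplicate of detection 0 is suppressed,
    # while the class-2 detection at the same location is kept.
    dets = np.array([[0, 1, 10, 10, 50, 50, 0.9],
                     [0, 1, 12, 12, 52, 52, 0.8],
                     [0, 2, 12, 12, 52, 52, 0.7]])
    print(nms(dets, 0.5))  # kept indices: 0 and 2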
================================================
FILE: lib/utils/pose_error.py
================================================
# Author: Tomas Hodan (hodantom@cmp.felk.cvut.cz)
# Center for Machine Perception, Czech Technical University in Prague
# Implementation of the pose error functions described in:
# Hodan et al., "On Evaluation of 6D Object Pose Estimation", ECCVW 2016
import math
import numpy as np
from scipy import spatial
from transforms3d.quaternions import quat2mat, mat2quat
def VOCap(rec, prec):
index = np.where(np.isfinite(rec))[0]
rec = rec[index]
prec = prec[index]
if len(rec) == 0 or len(prec) == 0:
ap = 0
else:
mrec = np.insert(rec, 0, 0)
mrec = np.append(mrec, 0.1)
mpre = np.insert(prec, 0, 0)
mpre = np.append(mpre, prec[-1])
for i in range(1, len(mpre)):
mpre[i] = max(mpre[i], mpre[i-1])
i = np.where(mrec[1:] != mrec[:-1])[0] + 1
ap = np.sum(np.multiply(mrec[i] - mrec[i-1], mpre[i])) * 10
return ap
def transform_pts_Rt(pts, R, t):
"""
Applies a rigid transformation to 3D points.
:param pts: nx3 ndarray with 3D points.
:param R: 3x3 rotation matrix.
:param t: 3x1 translation vector.
:return: nx3 ndarray with transformed 3D points.
"""
assert(pts.shape[1] == 3)
pts_t = R.dot(pts.T) + t.reshape((3, 1))
return pts_t.T
def reproj(K, R_est, t_est, R_gt, t_gt, pts):
    """
    Reprojection error.
    :param K: 3x3 camera intrinsic matrix.
    :param R_est, t_est: Estimated pose (3x3 rot. matrix and 3x1 trans. vector).
    :param R_gt, t_gt: GT pose (3x3 rot. matrix and 3x1 trans. vector).
    :param pts: nx3 ndarray with 3D model points.
    :return: Mean reprojection error of pose_est w.r.t. pose_gt in pixels.
    """
pts_est = transform_pts_Rt(pts, R_est, t_est)
pts_gt = transform_pts_Rt(pts, R_gt, t_gt)
pixels_est = K.dot(pts_est.T)
pixels_est = pixels_est.T
pixels_gt = K.dot(pts_gt.T)
pixels_gt = pixels_gt.T
n = pts.shape[0]
    est = np.zeros((n, 2), dtype=np.float32)
est[:, 0] = np.divide(pixels_est[:, 0], pixels_est[:, 2])
est[:, 1] = np.divide(pixels_est[:, 1], pixels_est[:, 2])
    gt = np.zeros((n, 2), dtype=np.float32)
gt[:, 0] = np.divide(pixels_gt[:, 0], pixels_gt[:, 2])
gt[:, 1] = np.divide(pixels_gt[:, 1], pixels_gt[:, 2])
e = np.linalg.norm(est - gt, axis=1).mean()
return e
def add(R_est, t_est, R_gt, t_gt, pts):
"""
Average Distance of Model Points for objects with no indistinguishable views
- by Hinterstoisser et al. (ACCV 2012).
:param R_est, t_est: Estimated pose (3x3 rot. matrix and 3x1 trans. vector).
:param R_gt, t_gt: GT pose (3x3 rot. matrix and 3x1 trans. vector).
    :param pts: nx3 ndarray with 3D model points.
:return: Error of pose_est w.r.t. pose_gt.
"""
pts_est = transform_pts_Rt(pts, R_est, t_est)
pts_gt = transform_pts_Rt(pts, R_gt, t_gt)
e = np.linalg.norm(pts_est - pts_gt, axis=1).mean()
return e
def adi(R_est, t_est, R_gt, t_gt, pts):
"""
Average Distance of Model Points for objects with indistinguishable views
- by Hinterstoisser et al. (ACCV 2012).
:param R_est, t_est: Estimated pose (3x3 rot. matrix and 3x1 trans. vector).
:param R_gt, t_gt: GT pose (3x3 rot. matrix and 3x1 trans. vector).
    :param pts: nx3 ndarray with 3D model points.
:return: Error of pose_est w.r.t. pose_gt.
"""
pts_est = transform_pts_Rt(pts, R_est, t_est)
pts_gt = transform_pts_Rt(pts, R_gt, t_gt)
# Calculate distances to the nearest neighbors from pts_gt to pts_est
nn_index = spatial.cKDTree(pts_est)
nn_dists, _ = nn_index.query(pts_gt, k=1)
e = nn_dists.mean()
return e
def re(R_est, R_gt):
"""
Rotational Error.
    :param R_est: Rotational element of the estimated pose (3x3 matrix).
    :param R_gt: Rotational element of the ground truth pose (3x3 matrix).
    :return: Error of R_est w.r.t. R_gt in degrees.
"""
assert(R_est.shape == R_gt.shape == (3, 3))
error_cos = 0.5 * (np.trace(R_est.dot(np.linalg.inv(R_gt))) - 1.0)
error_cos = min(1.0, max(-1.0, error_cos)) # Avoid invalid values due to numerical errors
error = math.acos(error_cos)
error = 180.0 * error / np.pi # [rad] -> [deg]
return error
def te(t_est, t_gt):
"""
Translational Error.
:param t_est: Translation element of the estimated pose (3x1 vector).
:param t_gt: Translation element of the ground truth pose (3x1 vector).
:return: Error of t_est w.r.t. t_gt.
"""
assert(t_est.size == t_gt.size == 3)
error = np.linalg.norm(t_gt - t_est)
return error
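if __name__ == '__main__':
    # Editor's sketch, not part of the original file: evaluate the
    # metrics for a pose perturbed by ~5.7 degrees about z and 1 cm
    # along x relative to an identity ground truth pose.
    pts = np.random.rand(100, 3)
    R_gt = quat2mat([1, 0, 0, 0])
    R_est = quat2mat([np.cos(0.05), 0, 0, np.sin(0.05)])
    t_gt = np.zeros((3,))
    t_est = np.array([0.01, 0.0, 0.0])
    print('ADD %f, ADI %f' % (add(R_est, t_est, R_gt, t_gt, pts),
                              adi(R_est, t_est, R_gt, t_gt, pts)))
    print('RE %f deg, TE %f m' % (re(R_est, R_gt), te(t_est, t_gt)))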
================================================
FILE: lib/utils/se3.py
================================================
# Copyright (c) 2020 NVIDIA Corporation. All rights reserved.
# This work is licensed under the NVIDIA Source Code License - Non-commercial. Full
# text can be found in LICENSE.md
import numpy as np
from transforms3d.quaternions import mat2quat, quat2mat, qmult, qinverse
from transforms3d.euler import quat2euler, mat2euler, euler2quat
# RT is a 3x4 matrix
def se3_inverse(RT):
R = RT[0:3, 0:3]
T = RT[0:3, 3].reshape((3,1))
RT_new = np.zeros((3, 4), dtype=np.float32)
RT_new[0:3, 0:3] = R.transpose()
RT_new[0:3, 3] = -1 * np.dot(R.transpose(), T).reshape((3))
return RT_new
def se3_mul(RT1, RT2):
R1 = RT1[0:3, 0:3]
T1 = RT1[0:3, 3].reshape((3,1))
R2 = RT2[0:3, 0:3]
T2 = RT2[0:3, 3].reshape((3,1))
RT_new = np.zeros((3, 4), dtype=np.float32)
RT_new[0:3, 0:3] = np.dot(R1, R2)
T_new = np.dot(R1, T2) + T1
RT_new[0:3, 3] = T_new.reshape((3))
return RT_new
def egocentric2allocentric(qt, T):
dx = np.arctan2(T[0], -T[2])
dy = np.arctan2(T[1], -T[2])
quat = euler2quat(-dy, -dx, 0, axes='sxyz')
quat = qmult(qinverse(quat), qt)
return quat
def allocentric2egocentric(qt, T):
dx = np.arctan2(T[0], -T[2])
dy = np.arctan2(T[1], -T[2])
quat = euler2quat(-dy, -dx, 0, axes='sxyz')
quat = qmult(quat, qt)
return quat
def T_inv_transform(T_src, T_tgt):
    '''
    :param T_src: source translation (3-vector)
    :param T_tgt: target translation (3-vector)
    :return: T_delta: offsets of the projected centers (x/z, y/z) and
             the log ratio of depths log(z_src / z_tgt)
    '''
T_delta = np.zeros((3, ), dtype=np.float32)
T_delta[0] = T_tgt[0] / T_tgt[2] - T_src[0] / T_src[2]
T_delta[1] = T_tgt[1] / T_tgt[2] - T_src[1] / T_src[2]
T_delta[2] = np.log(T_src[2] / T_tgt[2])
return T_delta
def rotation_x(theta):
t = theta * np.pi / 180.0
R = np.zeros((3, 3), dtype=np.float32)
R[0, 0] = 1
R[1, 1] = np.cos(t)
R[1, 2] = -np.sin(t)
R[2, 1] = np.sin(t)
R[2, 2] = np.cos(t)
return R
def rotation_y(theta):
t = theta * np.pi / 180.0
R = np.zeros((3, 3), dtype=np.float32)
R[0, 0] = np.cos(t)
R[0, 2] = np.sin(t)
R[1, 1] = 1
R[2, 0] = -np.sin(t)
R[2, 2] = np.cos(t)
return R
def rotation_z(theta):
t = theta * np.pi / 180.0
R = np.zeros((3, 3), dtype=np.float32)
R[0, 0] = np.cos(t)
R[0, 1] = -np.sin(t)
R[1, 0] = np.sin(t)
R[1, 1] = np.cos(t)
R[2, 2] = 1
return R
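if __name__ == '__main__':
    # Editor's sketch, not part of the original file: composing a 3x4
    # pose with its inverse recovers the identity transform [I | 0].
    RT = np.zeros((3, 4), dtype=np.float32)
    RT[:3, :3] = rotation_z(30)
    RT[:3, 3] = [0.1, 0.2, 0.3]
    print(se3_mul(RT, se3_inverse(RT)))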
================================================
FILE: lib/utils/segmentation_evaluation.py
================================================
# Copyright (c) 2020 NVIDIA Corporation. All rights reserved.
# This work is licensed under the NVIDIA Source Code License - Non-commercial. Full
# text can be found in LICENSE.md
import sys, os
import numpy as np
import cv2
# This function is modeled off of P/R/F measure as described by Dave et al. (arXiv19)
def multilabel_metrics(prediction, gt, num_classes):
    """ Computes per-class Precision, Recall and F-measure between a
    predicted label image and the ground truth label image.
    @param prediction: a [H x W] numpy.ndarray with predicted labels
    @param gt: a [H x W] numpy.ndarray with ground truth labels
    @param num_classes: number of classes to evaluate
    @return: a dictionary with per-class 'Precision', 'Recall',
             'F-measure' and 'Count' (1 for classes present in gt)
    """
precisions = np.zeros((num_classes, ), dtype=np.float32)
recalls = np.zeros((num_classes, ), dtype=np.float32)
f1s = np.zeros((num_classes, ), dtype=np.float32)
count = np.zeros((num_classes, ), dtype=np.float32)
# for each class
for cls in range(num_classes):
gt_mask = (gt == cls)
pred_mask = (prediction == cls)
A = np.logical_and(pred_mask, gt_mask)
count_true = np.count_nonzero(A)
count_pred = np.count_nonzero(pred_mask)
count_gt = np.count_nonzero(gt_mask)
# precision
if count_pred > 0:
precisions[cls] = float(count_true) / float(count_pred)
# recall
if count_gt > 0:
recalls[cls] = float(count_true) / float(count_gt)
count[cls] = 1
# F-measure
if precisions[cls] + recalls[cls] != 0:
f1s[cls] = (2 * precisions[cls] * recalls[cls]) / (precisions[cls] + recalls[cls])
return {'F-measure' : f1s,
'Precision' : precisions,
'Recall' : recalls,
'Count': count}
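if __name__ == '__main__':
    # Editor's sketch, not part of the original file: tiny two-class
    # example where one class-1 pixel is mislabeled as class 0.
    gt = np.array([[0, 1], [1, 1]])
    prediction = np.array([[0, 1], [0, 1]])
    metrics = multilabel_metrics(prediction, gt, num_classes=2)
    print(metrics['Precision'], metrics['Recall'], metrics['F-measure'])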
================================================
FILE: lib/utils/setup.py
================================================
# Copyright (c) 2020 NVIDIA Corporation. All rights reserved.
# This work is licensed under the NVIDIA Source Code License - Non-commercial. Full
# text can be found in LICENSE.md
import os
from os.path import join as pjoin
import numpy as np
from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext
def find_in_path(name, path):
"Find a file in a search path"
    # adapted from http://code.activestate.com/recipes/52224-find-a-file-given-a-search-path/
for dir in path.split(os.pathsep):
binpath = pjoin(dir, name)
if os.path.exists(binpath):
return os.path.abspath(binpath)
return None
def locate_cuda():
"""Locate the CUDA environment on the system
Returns a dict with keys 'home', 'nvcc', 'include', and 'lib64'
and values giving the absolute path to each directory.
Starts by looking for the CUDAHOME env variable. If not found, everything
is based on finding 'nvcc' in the PATH.
"""
# first check if the CUDAHOME env variable is in use
if 'CUDAHOME' in os.environ:
home = os.environ['CUDAHOME']
nvcc = pjoin(home, 'bin', 'nvcc')
else:
# otherwise, search the PATH for NVCC
default_path = pjoin(os.sep, 'usr', 'local', 'cuda', 'bin')
nvcc = find_in_path('nvcc', os.environ['PATH'] + os.pathsep + default_path)
if nvcc is None:
raise EnvironmentError('The nvcc binary could not be '
'located in your $PATH. Either add it to your path, or set $CUDAHOME')
home = os.path.dirname(os.path.dirname(nvcc))
cudaconfig = {'home':home, 'nvcc':nvcc,
'include': pjoin(home, 'include'),
'lib64': pjoin(home, 'lib64')}
for k, v in cudaconfig.items():
if not os.path.exists(v):
raise EnvironmentError('The CUDA %s path could not be located in %s' % (k, v))
return cudaconfig
CUDA = locate_cuda()
# Obtain the numpy include directory. This logic works across numpy versions.
try:
numpy_include = np.get_include()
except AttributeError:
numpy_include = np.get_numpy_include()
def customize_compiler_for_nvcc(self):
"""inject deep into distutils to customize how the dispatch
to gcc/nvcc works.
If you subclass UnixCCompiler, it's not trivial to get your subclass
injected in, and still have the right customizations (i.e.
distutils.sysconfig.customize_compiler) run on it. So instead of going
    the OO route, I have this. Note, it's kind of like a weird functional
    subclassing going on."""
    # tell the compiler it can process .cu files
    self.src_extensions.append('.cu')
    # save references to the default compiler_so and _compile methods
default_compiler_so = self.compiler_so
super = self._compile
# now redefine the _compile method. This gets executed for each
# object but distutils doesn't have the ability to change compilers
# based on source extension: we add it.
def _compile(obj, src, ext, cc_args, extra_postargs, pp_opts):
if os.path.splitext(src)[1] == '.cu':
# use the cuda for .cu files
self.set_executable('compiler_so', CUDA['nvcc'])
# use only a subset of the extra_postargs, which are 1-1 translated
# from the extra_compile_args in the Extension class
postargs = extra_postargs['nvcc']
else:
postargs = extra_postargs['gcc']
super(obj, src, ext, cc_args, postargs, pp_opts)
# reset the default compiler_so, which we might have changed for cuda
self.compiler_so = default_compiler_so
# inject our redefined _compile method into the class
self._compile = _compile
# run the customize_compiler
class custom_build_ext(build_ext):
def build_extensions(self):
customize_compiler_for_nvcc(self.compiler)
build_ext.build_extensions(self)
ext_modules = [
Extension(
"cython_bbox",
["bbox.pyx"],
extra_compile_args={'gcc': ["-Wno-cpp", "-Wno-unused-function"]},
include_dirs = [numpy_include]
)
]
setup(
name='fcn',
ext_modules=ext_modules,
# inject our custom trigger
cmdclass={'build_ext': custom_build_ext},
)
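# ----------------------------------------------------------------------
# Editor's note: illustrative usage, not part of the original file.
# The extension is built in place from lib/utils with
#
#   python setup.py build_ext --inplace
#
# which produces the cython_bbox module that bbox.pyx compiles into.
# ----------------------------------------------------------------------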
================================================
FILE: requirement.txt
================================================
pyassimp == 4.1.3
progressbar2
pyopengl >= 3.1.0
opencv-python == 4.2.0.34
transforms3d
pillow
IPython
matplotlib
easydict
pyyaml
future
scipy
Cython
================================================
FILE: ros/_init_paths.py
================================================
# Copyright (c) 2020 NVIDIA Corporation. All rights reserved.
# This work is licensed under the NVIDIA Source Code License - Non-commercial. Full
# text can be found in LICENSE.md
"""Set up paths for Fast R-CNN."""
import os.path as osp
import sys
def add_path(path):
if path not in sys.path:
sys.path.insert(0, path)
this_dir = osp.dirname(__file__)
# Add lib to PYTHONPATH
lib_path = osp.join(this_dir, '..', 'lib')
add_path(lib_path)
lib_path = osp.join(this_dir, '..', 'ycb_render')
add_path(lib_path)
================================================
FILE: ros/collect_images_realsense.py
================================================
#!/usr/bin/env python
# Copyright (c) 2020 NVIDIA Corporation. All rights reserved.
# This work is licensed under the NVIDIA Source Code License - Non-commercial. Full
# text can be found in LICENSE.md
"""collect images from Intel RealSense D435"""
import rospy
import message_filters
import cv2
import argparse
import pprint
import time, os, sys
import os.path as osp
import numpy as np
import yaml
from cv_bridge import CvBridge, CvBridgeError
from sensor_msgs.msg import Image, CameraInfo
class ImageListener:
def __init__(self):
self.cv_bridge = CvBridge()
self.count = 0
# output dir
this_dir = osp.dirname(__file__)
self.outdir = osp.join(this_dir, '..', 'data', 'images')
if not os.path.exists(self.outdir):
os.mkdir(self.outdir)
# initialize a node
rospy.init_node("image_listener")
rgb_sub = message_filters.Subscriber('/camera/color/image_raw', Image, queue_size=2)
depth_sub = message_filters.Subscriber('/camera/aligned_depth_to_color/image_raw', Image, queue_size=2)
msg = rospy.wait_for_message('/camera/color/camera_info', CameraInfo)
# save camera intrinsics
intrinsic_matrix = np.array(msg.K).reshape(3, 3)
print(intrinsic_matrix)
dict_file = {'INTRINSICS' : intrinsic_matrix.flatten().tolist()}
filename = os.path.join(self.outdir, 'meta.yml')
with open(filename, 'w') as file:
yaml.dump(dict_file, file)
queue_size = 1
slop_seconds = 0.025
ts = message_filters.ApproximateTimeSynchronizer([rgb_sub, depth_sub], queue_size, slop_seconds)
ts.registerCallback(self.callback)
def callback(self, rgb, depth):
if depth.encoding == '32FC1':
depth_32 = self.cv_bridge.imgmsg_to_cv2(depth) * 1000
depth_cv = np.array(depth_32, dtype=np.uint16)
elif depth.encoding == '16UC1':
depth_cv = self.cv_bridge.imgmsg_to_cv2(depth)
else:
rospy.logerr_throttle(
1, 'Unsupported depth type. Expected 16UC1 or 32FC1, got {}'.format(
depth.encoding))
return
# write images
im = self.cv_bridge.imgmsg_to_cv2(rgb, 'bgr8')
filename = self.outdir + '/%06d-color.png' % self.count
cv2.imwrite(filename, im)
        print(filename)
filename = self.outdir + '/%06d-depth.png' % self.count
cv2.imwrite(filename, depth_cv)
print(filename)
self.count += 1
if __name__ == '__main__':
# image listener
listener = ImageListener()
try:
rospy.spin()
except KeyboardInterrupt:
print "Shutting down"
================================================
FILE: ros/posecnn.rviz
================================================
Panels:
- Class: rviz/Displays
Help Height: 0
Name: Displays
Property Tree Widget:
Expanded:
- /Global Options1
- /Status1
- /DepthCloud1
- /DepthCloud1/Auto Size1
- /rgb_input1
- /depth_input1
- /posecnn detection1
- /posecnn1
- /TF1
- /TF1/Frames1
- /DepthCloud2
- /DepthCloud2/Auto Size1
- /posecnn+sdf1
- /deepim1
Splitter Ratio: 0.829787254
Tree Height: 540
- Class: rviz/Selection
Name: Selection
- Class: rviz/Tool Properties
Expanded:
- /2D Pose Estimate1
- /2D Nav Goal1
- /Publish Point1
Name: Tool Properties
Splitter Ratio: 0.588679016
- Class: rviz/Views
Expanded:
- /Current View1
Name: Views
Splitter Ratio: 0.5
- Class: rviz/Time
Experimental: false
Name: Time
SyncMode: 0
SyncSource: rgb_input
Toolbars:
toolButtonStyle: 2
Visualization Manager:
Class: ""
Displays:
- Alpha: 0.5
Cell Size: 1
Class: rviz/Grid
Color: 160; 160; 164
Enabled: true
Line Style:
Line Width: 0.0299999993
Value: Lines
Name: Grid
Normal Cell Count: 0
Offset:
X: 0
Y: 0
Z: 0
Plane: XY
Plane Cell Count: 10
Reference Frame: