Repository: B1ueber2y/TrianFlow
Branch: master
Commit: af3f2bfc745e
Files: 54
Total size: 655.9 KB
Directory structure:
gitextract_5bwpkhvi/
├── LICENSE
├── README.md
├── config/
│ ├── kitti.yaml
│ ├── kitti_3stage.yaml
│ ├── nyu.yaml
│ ├── nyu_192.yaml
│ ├── nyu_192_3stage.yaml
│ ├── nyu_3stage.yaml
│ ├── nyu_posenet_192.yaml
│ ├── odo.yaml
│ ├── odo_3stage.yaml
│ └── odo_posenet.yaml
├── core/
│ ├── config/
│ │ ├── __init__.py
│ │ └── config_utils.py
│ ├── dataset/
│ │ ├── __init__.py
│ │ ├── kitti_2012.py
│ │ ├── kitti_2015.py
│ │ ├── kitti_odo.py
│ │ ├── kitti_prepared.py
│ │ ├── kitti_raw.py
│ │ └── nyu_v2.py
│ ├── evaluation/
│ │ ├── __init__.py
│ │ ├── eval_odom.py
│ │ ├── evaluate_depth.py
│ │ ├── evaluate_flow.py
│ │ ├── evaluate_mask.py
│ │ ├── evaluation_utils.py
│ │ └── flowlib.py
│ ├── networks/
│ │ ├── __init__.py
│ │ ├── model_depth_pose.py
│ │ ├── model_flow.py
│ │ ├── model_flowposenet.py
│ │ ├── model_triangulate_pose.py
│ │ ├── pytorch_ssim/
│ │ │ ├── __init__.py
│ │ │ └── ssim.py
│ │ └── structures/
│ │ ├── __init__.py
│ │ ├── depth_model.py
│ │ ├── feature_pyramid.py
│ │ ├── flowposenet.py
│ │ ├── inverse_warp.py
│ │ ├── net_utils.py
│ │ ├── pwc_tf.py
│ │ └── ransac.py
│ └── visualize/
│ ├── __init__.py
│ ├── profiler.py
│ └── visualizer.py
├── data/
│ └── eigen/
│ ├── export_gt_depth.py
│ ├── static_frames.txt
│ ├── test_files.txt
│ └── test_scenes.txt
├── infer_vo.py
├── requirements.txt
├── test.py
└── train.py
================================================
FILE CONTENTS
================================================
================================================
FILE: LICENSE
================================================
MIT License
Copyright (c) 2021 Shaohui Liu
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
================================================
FILE: README.md
================================================
## Towards Better Generalization: Joint Depth-Pose Learning without PoseNet
Created by <a href="https://github.com/thuzhaowang" target="_blank">Wang Zhao</a>, <a href="http://b1ueber2y.me" target="_blank">Shaohui Liu</a>, Yezhi Shu, <a href="https://cg.cs.tsinghua.edu.cn/people/~Yongjin/Yongjin.htm" target="_blank">Yong-Jin Liu</a>.
## Introduction
This implementation is based on our CVPR'2020 paper "Towards Better Generalization: Joint Depth-Pose Learning without PoseNet". You can find the arXiv version of the paper <a href="https://arxiv.org/abs/2004.01314">here</a>. In this repository we release code and pre-trained models for *TrianFlow* (our method) and a strong baseline *PoseNet-Flow* method.

## Installation
The code is based on Python 3.6. You can use either virtualenv or conda to set up the environment, and then run:
```bash
pip install -r requirements.txt
```
## Run a demo
To run a depth prediction demo, first download the pretrained model from <a href="https://drive.google.com/drive/folders/1rPXlK9bJpjU0OQH5leDCvyb0FcL5jlUk?usp=sharing">here</a>.
```bash
python test.py --config_file ./config/default_1scale.yaml --gpu 0 --mode depth --task demo --image_path ./data/demo/kitti.png --pretrained_model ./models/pretrained/depth_pretrained.pth --result_dir ./data/demo
```
This will give you a predicted depth map for the demo image.


## Run experiments
### Prepare training data:
1. For the KITTI depth and flow tasks, download the KITTI raw dataset using the <a href="http://www.cvlibs.net/download.php?file=raw_data_downloader.zip">script</a> provided on the official website. You also need to download the <a href="http://www.cvlibs.net/datasets/kitti/eval_scene_flow.php?benchmark=flow">KITTI 2015 dataset</a> to evaluate the predicted optical flow. Run the following commands to generate ground-truth depth files for the Eigen test images.
```
cd ./data/eigen
python export_gt_depth.py --data_path /path/to/your/kitti/root
```
2. For KITTI Odometry task, download <a href="http://www.cvlibs.net/datasets/kitti/eval_odometry.php">KITTI Odometry dataset</a>.
3. For the NYUv2 experiments, download the <a href="https://cs.nyu.edu/~silberman/datasets/nyu_depth_v2.html">NYUv2</a> raw sequences and labeled data mat, as well as the train/test split mat from <a href="https://github.com/ankurhanda/nyuv2-meta-data">here</a>. Put the labeled data and the splits file under the same directory. The data structure should be:
```
nyuv2
| basements
| cafe
| ...
nyuv2_test
| nyu_depth_v2_labeled.mat
| splits.mat
```
### Training:
1. Modify the configuration file in the ./config directory to set up your path. The config file contains the important paths and default hyper-parameters used in the training process.
2. For KITTI depth, we use a three-stage training schedule.
```bash
1. python train.py --config_file ./config/kitti.yaml --gpu [gpu_id] --mode flow --prepared_save_dir [name_of_your_prepared_dataset] --model_dir [your/directory/to/save/training/models]
2. python train.py --config_file ./config/kitti.yaml --gpu [gpu_id] --mode depth --prepared_save_dir [name_of_your_prepared_dataset] --model_dir [your/directory/to/save/training/models] --flow_pretrained_model [path/to/your/stage1_flow_model]
3. python train.py --config_file ./config/kitti_3stage.yaml --gpu [gpu_id] --mode depth_pose --prepared_save_dir [name_of_your_prepared_dataset] --model_dir [your/directory/to/save/training/models] --depth_pretrained_model [path/to/your/stage2_depth_model]
```
If you are running experiments on a dataset for the first time, the script will first preprocess the data and save it to the [prepared_base_dir] path defined in your config file.
For other datasets such as KITTI Odometry and NYUv2, you can run the same commands with the appropriate config file.
We also implement and release code for the strong baseline *PoseNet-Flow* method, which you can run with two-stage training:
```bash
1. python train.py --config_file [path/to/your/config/file] --gpu [gpu_id] --mode flow --prepared_save_dir [name_of_your_prepared_dataset] --model_dir [your/directory/to/save/training/models]
2. python train.py --config_file [path/to/your/config/file] --gpu [gpu_id] --mode flowposenet --prepared_save_dir [name_of_your_prepared_dataset] --model_dir [your/directory/to/save/training/models] --flow_pretrained_model [path/to/your/stage1_flow_model]
```
### Evaluation:
We provide pretrained models <a href="https://drive.google.com/drive/folders/1rPXlK9bJpjU0OQH5leDCvyb0FcL5jlUk?usp=sharing">here</a> for the different tasks. The performance may differ slightly from the paper due to randomness.
1. To evaluate monocular depth estimation on the KITTI Eigen test split, run:
```bash
python test.py --config_file ./config/kitti.yaml --gpu [gpu_id] --mode depth --task kitti_depth --pretrained_model [path/to/your/model] --result_dir [path/to/save/results]
```
2. To evaluate monocular depth estimation on the NYUv2 test split, run:
```bash
python test.py --config_file ./config/nyu.yaml --gpu [gpu_id] --mode depth --task nyuv2 --pretrained_model [path/to/your/model] --result_dir [path/to/save/results]
```
3. To evaluate the optical flow estimation on KITTI 2015, run:
```bash
python test.py --config_file ./config/kitti.yaml --gpu [gpu_id] --mode flow_3stage --task kitti_flow --pretrained_model [path/to/your/model] --result_dir [path/to/save/results]
```
4. To evaluate visual odometry on the KITTI Odometry dataset, first get predictions on a single sequence and then evaluate:
```bash
python infer_vo.py --config_file ./config/odo.yaml --gpu [gpu_id] --traj_save_dir_txt [where/to/save/the/prediction/file] --sequences_root_dir [the/root/dir/of/your/image/sequences] --sequence [the sequence id] --pretrained_model [path/to/your/model]
python ./core/evaluation/eval_odom.py --gt_txt [path/to/your/groundtruth/poses/txt] --result_txt [path/to/your/prediction/txt] --seq [sequence id to evaluate]
```
You can also evaluate on a temporally sub-sampled KITTI Odometry dataset: sample the raw image sequences and the ground-truth pose txt at the same stride, then run *infer_vo.py* on the sampled image sequence and *eval_odom.py* with the predicted txt and the sampled ground-truth txt, as sketched below.
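The following Python sketch illustrates one way to do the sampling. The stride, sequence id, and all paths are placeholders, and the frame re-numbering is an assumption so that the sampled sequence stays contiguous for *infer_vo.py*:
```python
import os, shutil

# Placeholders: adjust the stride, sequence id, and paths to your setup.
stride = 3
src_dir = '/path/to/kitti_odometry/sequences/09/image_2'
dst_dir = '/path/to/kitti_odometry_sampled/sequences/09/image_2'
os.makedirs(dst_dir, exist_ok=True)

# Keep every stride-th frame, re-numbered so the sampled sequence stays contiguous.
for new_idx, name in enumerate(sorted(os.listdir(src_dir))[::stride]):
    shutil.copy(os.path.join(src_dir, name),
                os.path.join(dst_dir, '%.6d.png' % new_idx))

# The ground-truth txt stores one 3x4 pose (12 numbers) per frame;
# sample it with the same stride so frames and poses stay aligned.
with open('/path/to/kitti_odometry/poses/09.txt', 'r') as f:
    sampled_poses = f.readlines()[::stride]
with open('/path/to/kitti_odometry_sampled/09_gt.txt', 'w') as f:
    f.writelines(sampled_poses)
```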
### Citation
If you find our work useful in your research, please consider citing:
```
@inproceedings{zhao2020towards,
title={Towards Better Generalization: Joint Depth-Pose Learning without PoseNet},
author={Zhao, Wang and Liu, Shaohui and Shu, Yezhi and Liu, Yong-Jin},
booktitle={Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year={2020}
}
```
### Related Projects
<a href="https://github.com/nianticlabs/monodepth2">Digging into Self-Supervised Monocular Depth Prediction.</a>
<a href="https://github.com/anuragranj/cc">Competitive Collaboration: Joint Unsupervised Learning of Depth, Camera Motion, Optical Flow and Motion Segmentation.</a>
<a href="https://github.com/Huangying-Zhan/DF-VO">Visual Odometry Revisited: What Should Be Learnt?</a>
================================================
FILE: config/kitti.yaml
================================================
cfg_name: 'default'
# dataset
raw_base_dir: '/home4/zhaow/data/kitti'
prepared_base_dir: '/home5/zhaow/data/kitti_release'
gt_2012_dir: '/home4/zhaow/data/kitti_stereo/kitti_2012/training'
gt_2015_dir: '/home4/zhaow/data/kitti_stereo/kitti_2015/training'
static_frames_txt: '/home5/zhaow/release/data/eigen/static_frames.txt'
test_scenes_txt: '/home5/zhaow/release/data/eigen/test_scenes.txt'
dataset: 'kitti_depth'
num_scales: 3
# training
num_iterations: 200000
# loss hyperparameters
w_ssim: 0.85 # w_pixel = 1 - w_ssim
w_flow_smooth: 10.0
w_flow_consis: 0.01
w_geo: 1.0
w_pt_depth: 1.0
w_pj_depth: 0.1
w_flow_error: 0.0
w_depth_smooth: 0.001
h_flow_consist_alpha: 3.0
h_flow_consist_beta: 0.05
ransac_iters: 100
ransac_points: 6000
# Depth Setting
depth_match_num: 6000
depth_sample_ratio: 0.20
depth_scale: 1
# basic info
img_hw: [256, 832]
use_svd_gpu: False
================================================
FILE: config/kitti_3stage.yaml
================================================
cfg_name: 'default'
# dataset
raw_base_dir: '/home/zhaow/data/kitti'
prepared_base_dir: '/home/zhaow/data/kitti_seq'
gt_2012_dir: '/home/zhaow/data/kitti_stereo/kitti_2012/training'
gt_2015_dir: '/home/zhaow/data/kitti_stereo/kitti_2015/training'
static_frames_txt: '/home/zhaow/data/kitti_seq/static_frames.txt'
test_scenes_txt: '/home5/zhaow/release/data/eigen/test_scenes.txt'
dataset: 'kitti_depth'
num_scales: 3
# training
num_iterations: 200000 # set -1 to use num_epochs
num_epochs: 0
# loss hyperparameters
w_ssim: 0.85 # w_pixel = 1 - w_ssim
w_flow_smooth: 10.0
w_flow_consis: 0.01
w_geo: 0.002 #0.002
w_pt_depth: 0.0
w_pj_depth: 0.000
w_flow_error: 0.01
w_depth_smooth: 0.000
h_flow_consist_alpha: 3.0
h_flow_consist_beta: 0.05
# dfe setting
dfe_depth: 3
dfe_depth_scales: 4
dfe_points: 500
ransac_iters: 100
ransac_points: 6000
# Depth Setting
depth_match_num: 6000
depth_sample_ratio: 0.20
depth_scale: 1
# basic info
img_hw: [256, 832]
use_svd_gpu: False
================================================
FILE: config/nyu.yaml
================================================
cfg_name: 'default'
# dataset
raw_base_dir: '/home5/zhaow/data/nyuv2'
prepared_base_dir: '/home5/zhaow/data/nyu_seq_release'
nyu_test_dir: '/home5/zhaow/data/nyuv2_test'
dataset: 'nyuv2'
num_scales: 3
# training
num_iterations: 400000
# loss hyperparameters
w_ssim: 0.85 # w_pixel = 1 - w_ssim
w_flow_smooth: 10.0
w_flow_consis: 0.01
w_geo: 0.0
w_pt_depth: 1.0
w_pj_depth: 0.1
w_flow_error: 0.00
w_depth_smooth: 0.0001
h_flow_consist_alpha: 3.0
h_flow_consist_beta: 0.05
ransac_iters: 100
ransac_points: 6000
# Depth Setting
depth_match_num: 6000
depth_sample_ratio: 0.20
depth_scale: 1
# basic info
img_hw: [448, 576]
#img_hw: [192, 256]
block_tri_grad: False
================================================
FILE: config/nyu_192.yaml
================================================
cfg_name: 'default'
# dataset
raw_base_dir: '/home5/zhaow/data/nyuv2'
prepared_base_dir: '/home5/zhaow/data/nyu_seq_release'
nyu_test_dir: '/home5/zhaow/data/nyuv2_test'
dataset: 'nyuv2'
num_scales: 3
# training
num_iterations: 400000
# loss hyperparameters
w_ssim: 0.85 # w_pixel = 1 - w_ssim
w_flow_smooth: 10.0
w_flow_consis: 0.01
w_geo: 0.0
w_pt_depth: 1.0
w_pj_depth: 0.1
w_flow_error: 0.00
w_depth_smooth: 0.0001
h_flow_consist_alpha: 3.0
h_flow_consist_beta: 0.05
ransac_iters: 100
ransac_points: 6000
# Depth Setting
depth_match_num: 6000
depth_sample_ratio: 0.20
depth_scale: 1
# basic info
#img_hw: [448, 576]
img_hw: [192, 256]
block_tri_grad: False
================================================
FILE: config/nyu_192_3stage.yaml
================================================
cfg_name: 'default'
# dataset
raw_base_dir: '/home5/zhaow/data/nyuv2'
prepared_base_dir: '/home5/zhaow/data/nyu_seq_release'
nyu_test_dir: '/home5/zhaow/data/nyuv2_test'
dataset: 'nyuv2'
num_scales: 3
# training
num_iterations: 400000 # set -1 to use num_epochs
# loss hyperparameters
w_ssim: 0.85 # w_pixel = 1 - w_ssim
w_flow_smooth: 10.0
w_flow_consis: 0.01
w_geo: 0.1
w_pt_depth: 0.1
w_pj_depth: 0.01
w_flow_error: 0.00
w_depth_smooth: 0.0001
h_flow_consist_alpha: 3.0
h_flow_consist_beta: 0.05
ransac_iters: 100
ransac_points: 6000
# Depth Setting
depth_match_num: 6000
depth_sample_ratio: 0.20
depth_scale: 1
# basic info
#img_hw: [448, 576]
img_hw: [192, 256]
block_tri_grad: False
================================================
FILE: config/nyu_3stage.yaml
================================================
cfg_name: 'default'
# dataset
raw_base_dir: '/home5/zhaow/data/nyuv2'
prepared_base_dir: '/home5/zhaow/data/nyu_seq_release'
nyu_test_dir: '/home5/zhaow/data/nyuv2_test'
dataset: 'nyuv2'
num_scales: 3
# training
num_iterations: 400000 # set -1 to use num_epochs
# loss hyperparameters
w_ssim: 0.85 # w_pixel = 1 - w_ssim
w_flow_smooth: 10.0
w_flow_consis: 0.01
w_geo: 0.1
w_pt_depth: 0.1
w_pj_depth: 0.01
w_flow_error: 0.00
w_depth_smooth: 0.0001
h_flow_consist_alpha: 3.0
h_flow_consist_beta: 0.05
ransac_iters: 100
ransac_points: 6000
# Depth Setting
depth_match_num: 6000
depth_sample_ratio: 0.20
depth_scale: 1
# basic info
img_hw: [448, 576]
#img_hw: [192, 256]
block_tri_grad: False
================================================
FILE: config/nyu_posenet_192.yaml
================================================
cfg_name: 'default'
# dataset
raw_base_dir: '/home5/zhaow/data/nyuv2_sub2'
prepared_base_dir: '/home5/zhaow/data/nyu_seq_release'
nyu_test_dir: '/home5/zhaow/data/nyuv2_test'
dataset: 'nyuv2'
num_scales: 3
# training
num_iterations: 500000 # set -1 to use num_epochs
num_epochs: 0
# loss hyperparameters
w_ssim: 0.85 # w_pixel = 1 - w_ssim
w_flow_smooth: 10.0
w_flow_consis: 0.01
w_geo: 0.0
w_pt_depth: 0.0
w_pj_depth: 0.5
w_flow_error: 10.0
w_depth_smooth: 0.00001
h_flow_consist_alpha: 3.0
h_flow_consist_beta: 0.05
ransac_iters: 100
ransac_points: 6000
# Depth Setting
depth_match_num: 6000
depth_scale: 1
# basic info
img_hw: [192,256]
block_tri_grad: False
================================================
FILE: config/odo.yaml
================================================
cfg_name: 'default'
# dataset
raw_base_dir: '/home4/zhaow/data/kitti_odometry/sequences'
prepared_base_dir: '/home5/zhaow/data/kitti_odo_release/'
gt_2012_dir: '/home4/zhaow/data/kitti_stereo/kitti_2012/training'
gt_2015_dir: '/home4/zhaow/data/kitti_stereo/kitti_2015/training'
dataset: 'kitti_odo'
num_scales: 3
# training
num_iterations: 200000
w_ssim: 0.85 # w_pixel = 1 - w_ssim
w_flow_smooth: 10.0
w_flow_consis: 0.01
w_geo: 0.1
w_pt_depth: 1.0
w_pj_depth: 0.1
w_flow_error: 0.0
w_depth_smooth: 0.0001
h_flow_consist_alpha: 3.0
h_flow_consist_beta: 0.05
ransac_iters: 100
ransac_points: 6000
# Depth Setting
depth_match_num: 6000
depth_sample_ratio: 0.20
depth_scale: 1
# basic info
img_hw: [256, 832]
block_tri_grad: False
================================================
FILE: config/odo_3stage.yaml
================================================
cfg_name: 'default'
# dataset
raw_base_dir: '/home4/zhaow/data/kitti_odometry/sequences'
prepared_base_dir: '/home5/zhaow/data/kitti_odo_release/'
gt_2012_dir: '/home4/zhaow/data/kitti_stereo/kitti_2012/training'
gt_2015_dir: '/home4/zhaow/data/kitti_stereo/kitti_2015/training'
dataset: 'kitti_odo'
num_scales: 3
# training
num_iterations: 200000
w_ssim: 0.85 # w_pixel = 1 - w_ssim
w_flow_smooth: 10.0
w_flow_consis: 0.01
w_geo: 0.002 #0.002
w_pt_depth: 0.0
w_pj_depth: 0.000
w_flow_error: 0.01
w_depth_smooth: 0.000
h_flow_consist_alpha: 3.0
h_flow_consist_beta: 0.05
ransac_iters: 100
ransac_points: 6000
# Depth Setting
depth_match_num: 6000
depth_sample_ratio: 0.20
depth_scale: 1
# basic info
img_hw: [256, 832]
block_tri_grad: False
================================================
FILE: config/odo_posenet.yaml
================================================
cfg_name: 'default'
# dataset
raw_base_dir: '/home4/zhaow/data/kitti_odometry/sequences'
prepared_base_dir: '/home5/zhaow/data/kitti_odo/'
gt_2012_dir: '/home4/zhaow/data/kitti_stereo/kitti_2012/training'
gt_2015_dir: '/home4/zhaow/data/kitti_stereo/kitti_2015/training'
dataset: 'kitti_odo'
num_scales: 3
# training
num_iterations: 200000 # set -1 to use num_epochs
num_epochs: 0
w_ssim: 0.85 # w_pixel = 1 - w_ssim
w_flow_smooth: 10.0
w_flow_consis: 0.01
w_geo: 0.0
w_pt_depth: 0.0
w_pj_depth: 0.1
w_flow_error: 0.5
w_depth_smooth: 0.0001
h_flow_consist_alpha: 3.0
h_flow_consist_beta: 0.05
ransac_iters: 100
ransac_points: 6000
# Depth Setting
depth_match_num: 6000
depth_scale: 1
# basic info
img_hw: [256, 832]
block_tri_grad: False
================================================
FILE: core/config/__init__.py
================================================
import os, sys
sys.path.append(os.path.dirname(os.path.abspath(__file__)))
from config_utils import generate_loss_weights_dict
================================================
FILE: core/config/config_utils.py
================================================
import os, sys
def generate_loss_weights_dict(cfg):
weight_dict = {}
weight_dict['loss_pixel'] = 1 - cfg.w_ssim
weight_dict['loss_ssim'] = cfg.w_ssim
weight_dict['loss_flow_smooth'] = cfg.w_flow_smooth
weight_dict['loss_flow_consis'] = cfg.w_flow_consis
weight_dict['geo_loss'] = cfg.w_geo
weight_dict['pt_depth_loss'] = cfg.w_pt_depth
weight_dict['pj_depth_loss'] = cfg.w_pj_depth
weight_dict['depth_smooth_loss'] = cfg.w_depth_smooth
weight_dict['flow_error'] = cfg.w_flow_error
return weight_dict
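# Usage sketch (an illustration only, not part of the training code): the
# cfg.w_* accesses above imply the YAML fields are exposed as attributes on
# a cfg object; argparse.Namespace is one simple way to get such an object.
if __name__ == '__main__':
    import yaml
    from argparse import Namespace
    with open('./config/kitti.yaml', 'r') as f:
        cfg = Namespace(**yaml.safe_load(f))
    # e.g. {'loss_pixel': 0.15, 'loss_ssim': 0.85, 'loss_flow_smooth': 10.0, ...}
    print(generate_loss_weights_dict(cfg))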
================================================
FILE: core/dataset/__init__.py
================================================
import os, sys
sys.path.append(os.path.dirname(os.path.abspath(__file__)))
from kitti_raw import KITTI_RAW
from kitti_prepared import KITTI_Prepared
from kitti_2012 import KITTI_2012
from kitti_2015 import KITTI_2015
from nyu_v2 import NYU_Prepare, NYU_v2
from kitti_odo import KITTI_Odo
================================================
FILE: core/dataset/kitti_2012.py
================================================
import os, sys
sys.path.append(os.path.dirname(os.path.abspath(__file__)))
from kitti_prepared import KITTI_Prepared
sys.path.append(os.path.join(os.path.dirname(os.path.abspath(__file__)), '..', 'evaluation'))
from evaluate_flow import get_scaled_intrinsic_matrix, eval_flow_avg
import numpy as np
import cv2
import copy
import torch
import pdb
class KITTI_2012(KITTI_Prepared):
def __init__(self, data_dir, img_hw=(256, 832), init=True):
self.data_dir = data_dir
self.img_hw = img_hw
self.num_total = 194
if init:
self.data_list = self.get_data_list()
def get_data_list(self):
data_list = []
for i in range(self.num_total):
data = {}
data['img1_dir'] = os.path.join(self.data_dir, 'image_2', str(i).zfill(6) + '_10.png')
data['img2_dir'] = os.path.join(self.data_dir, 'image_2', str(i).zfill(6) + '_11.png')
data['calib_file_dir'] = os.path.join(self.data_dir, 'calib_cam_to_cam', str(i).zfill(6) + '.txt')
data_list.append(data)
return data_list
def __len__(self):
return len(self.data_list)
def read_cam_intrinsic(self, calib_file):
input_intrinsic = get_scaled_intrinsic_matrix(calib_file, zoom_x=1.0, zoom_y=1.0)
return input_intrinsic
def __getitem__(self, idx):
'''
Returns:
- img torch.Tensor (N * H, W, 3)
- K torch.Tensor (num_scales, 3, 3)
- K_inv torch.Tensor (num_scales, 3, 3)
'''
data = self.data_list[idx]
# load img
img1 = cv2.imread(data['img1_dir'])
img2 = cv2.imread(data['img2_dir'])
img_hw_orig = (img1.shape[0], img1.shape[1])
img = np.concatenate([img1, img2], 0)
img = self.preprocess_img(img, self.img_hw, is_test=True)
img = img.transpose(2,0,1)
# load intrinsic
cam_intrinsic = self.read_cam_intrinsic(data['calib_file_dir'])
cam_intrinsic = self.rescale_intrinsics(cam_intrinsic, img_hw_orig, self.img_hw)
K, K_inv = self.get_intrinsics_per_scale(cam_intrinsic, scale=0) # (3, 3), (3, 3)
return torch.from_numpy(img).float(), torch.from_numpy(K).float(), torch.from_numpy(K_inv).float()
if __name__ == '__main__':
pass
================================================
FILE: core/dataset/kitti_2015.py
================================================
import os, sys
sys.path.append(os.path.dirname(os.path.abspath(__file__)))
from kitti_2012 import KITTI_2012
class KITTI_2015(KITTI_2012):
def __init__(self, data_dir, img_hw=(256, 832)):
super(KITTI_2015, self).__init__(data_dir, img_hw, init=False)
self.num_total = 200
self.data_list = self.get_data_list()
if __name__ == '__main__':
pass
================================================
FILE: core/dataset/kitti_odo.py
================================================
import os, sys
import numpy as np
import imageio
from tqdm import tqdm
import torch.multiprocessing as mp
def process_folder(q, data_dir, output_dir, stride=1):
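    # Each training sample stacks two frames (stride apart) vertically into a
    # single png; the dataset loader later splits it back into the image pair.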
while True:
if q.empty():
break
folder = q.get()
image_path = os.path.join(data_dir, folder, 'image_2/')
dump_image_path = os.path.join(output_dir, folder)
if not os.path.isdir(dump_image_path):
os.makedirs(dump_image_path)
f = open(os.path.join(dump_image_path, 'train.txt'), 'w')
        # Note: os.listdir returns entries in arbitrary order; frames are addressed by zero-padded index instead.
numbers = len(os.listdir(image_path))
for n in range(numbers - stride):
s_idx = n
e_idx = s_idx + stride
curr_image = imageio.imread(os.path.join(image_path, '%.6d'%s_idx)+'.png')
next_image = imageio.imread(os.path.join(image_path, '%.6d'%e_idx)+'.png')
seq_images = np.concatenate([curr_image, next_image], axis=0)
imageio.imsave(os.path.join(dump_image_path, '%.6d'%s_idx)+'.png', seq_images.astype('uint8'))
# Write training files
f.write('%s %s\n' % (os.path.join(folder, '%.6d'%s_idx)+'.png', os.path.join(folder, 'calib.txt')))
print(folder)
class KITTI_Odo(object):
def __init__(self, data_dir):
self.data_dir = data_dir
self.train_seqs = ['00','01','02','03','04','05','06','07','08']
def __len__(self):
raise NotImplementedError
def prepare_data_mp(self, output_dir, stride=1):
num_processes = 16
processes = []
q = mp.Queue()
        if not os.path.isfile(os.path.join(output_dir, 'train.txt')):
            os.makedirs(output_dir, exist_ok=True)
            #f = open(os.path.join(output_dir, 'train.txt'), 'w')
        print('Preparing sequence data....')
        if not os.path.isdir(self.data_dir):
            raise FileNotFoundError(self.data_dir)
dirlist = os.listdir(self.data_dir)
total_dirlist = []
# Get the different folders of images
for d in dirlist:
if d in self.train_seqs:
q.put(d)
# Process every folder
for rank in range(num_processes):
p = mp.Process(target=process_folder, args=(q, self.data_dir, output_dir, stride))
p.start()
processes.append(p)
for p in processes:
p.join()
f = open(os.path.join(output_dir, 'train.txt'), 'w')
for d in self.train_seqs:
train_file = open(os.path.join(output_dir, d, 'train.txt'), 'r')
for l in train_file.readlines():
f.write(l)
command = 'cp ' + os.path.join(self.data_dir, d, 'calib.txt') + ' ' + os.path.join(output_dir, d, 'calib.txt')
os.system(command)
print('Data Preparation Finished.')
def __getitem__(self, idx):
raise NotImplementedError
if __name__ == '__main__':
data_dir = '/home4/zhaow/data/kitti'
dirlist = os.listdir('/home4/zhaow/data/kitti')
output_dir = '/home4/zhaow/data/kitti_seq/data_generated_s2'
total_dirlist = []
# Get the different folders of images
for d in dirlist:
seclist = os.listdir(os.path.join(data_dir, d))
for s in seclist:
if os.path.isdir(os.path.join(data_dir, d, s)):
total_dirlist.append(os.path.join(d, s))
F = open(os.path.join(output_dir, 'train.txt'), 'w')
for p in total_dirlist:
traintxt = os.path.join(os.path.join(output_dir, p), 'train.txt')
f = open(traintxt, 'r')
for line in f.readlines():
F.write(line)
print(traintxt)
================================================
FILE: core/dataset/kitti_prepared.py
================================================
import os, sys
import numpy as np
import cv2
import copy
import torch
import torch.utils.data
import pdb
class KITTI_Prepared(torch.utils.data.Dataset):
def __init__(self, data_dir, num_scales=3, img_hw=(256, 832), num_iterations=None):
super(KITTI_Prepared, self).__init__()
self.data_dir = data_dir
self.num_scales = num_scales
self.img_hw = img_hw
self.num_iterations = num_iterations
info_file = os.path.join(self.data_dir, 'train.txt')
#info_file = os.path.join(self.data_dir, 'train_flow.txt')
self.data_list = self.get_data_list(info_file)
def get_data_list(self, info_file):
with open(info_file, 'r') as f:
lines = f.readlines()
data_list = []
for line in lines:
k = line.strip('\n').split()
data = {}
data['image_file'] = os.path.join(self.data_dir, k[0])
data['cam_intrinsic_file'] = os.path.join(self.data_dir, k[1])
data_list.append(data)
print('A total of {} image pairs found'.format(len(data_list)))
return data_list
def count(self):
return len(self.data_list)
def rand_num(self, idx):
num_total = self.count()
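        # Seeding with idx makes the drawn sample a deterministic function of idx.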
np.random.seed(idx)
num = np.random.randint(num_total)
return num
def __len__(self):
if self.num_iterations is None:
return self.count()
else:
return self.num_iterations
def resize_img(self, img, img_hw):
'''
Input size (N*H, W, 3)
Output size (N*H', W', 3), where (H', W') == self.img_hw
'''
img_h, img_w = img.shape[0], img.shape[1]
img_hw_orig = (int(img_h / 2), img_w)
img1, img2 = img[:img_hw_orig[0], :, :], img[img_hw_orig[0]:, :, :]
img1_new = cv2.resize(img1, (img_hw[1], img_hw[0]))
img2_new = cv2.resize(img2, (img_hw[1], img_hw[0]))
img_new = np.concatenate([img1_new, img2_new], 0)
return img_new
def random_flip_img(self, img):
is_flip = (np.random.rand() > 0.5)
if is_flip:
img = cv2.flip(img, 1)
return img
def preprocess_img(self, img, img_hw=None, is_test=False):
if img_hw is None:
img_hw = self.img_hw
img = self.resize_img(img, img_hw)
if not is_test:
img = self.random_flip_img(img)
img = img / 255.0
return img
def read_cam_intrinsic(self, fname):
with open(fname, 'r') as f:
lines = f.readlines()
data = lines[-1].strip('\n').split(' ')[1:]
data = [float(k) for k in data]
data = np.array(data).reshape(3,4)
cam_intrinsics = data[:3,:3]
return cam_intrinsics
def rescale_intrinsics(self, K, img_hw_orig, img_hw_new):
K[0,:] = K[0,:] * img_hw_new[1] / img_hw_orig[1]
K[1,:] = K[1,:] * img_hw_new[0] / img_hw_orig[0]
return K
def get_intrinsics_per_scale(self, K, scale):
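        # At pyramid level `scale` the image is downscaled by 2**scale, so fx, fy, cx, cy are divided accordingly.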
K_new = copy.deepcopy(K)
K_new[0,:] = K_new[0,:] / (2**scale)
K_new[1,:] = K_new[1,:] / (2**scale)
K_new_inv = np.linalg.inv(K_new)
return K_new, K_new_inv
def get_multiscale_intrinsics(self, K, num_scales):
K_ms, K_inv_ms = [], []
for s in range(num_scales):
K_new, K_new_inv = self.get_intrinsics_per_scale(K, s)
K_ms.append(K_new[None,:,:])
K_inv_ms.append(K_new_inv[None,:,:])
K_ms = np.concatenate(K_ms, 0)
K_inv_ms = np.concatenate(K_inv_ms, 0)
return K_ms, K_inv_ms
def __getitem__(self, idx):
'''
Returns:
- img torch.Tensor (N * H, W, 3)
- K torch.Tensor (num_scales, 3, 3)
- K_inv torch.Tensor (num_scales, 3, 3)
'''
if self.num_iterations is not None:
idx = self.rand_num(idx)
data = self.data_list[idx]
# load img
img = cv2.imread(data['image_file'])
img_hw_orig = (int(img.shape[0] / 2), img.shape[1])
img = self.preprocess_img(img, self.img_hw) # (img_h * 2, img_w, 3)
img = img.transpose(2,0,1)
# load intrinsic
cam_intrinsic = self.read_cam_intrinsic(data['cam_intrinsic_file'])
cam_intrinsic = self.rescale_intrinsics(cam_intrinsic, img_hw_orig, self.img_hw)
K_ms, K_inv_ms = self.get_multiscale_intrinsics(cam_intrinsic, self.num_scales) # (num_scales, 3, 3), (num_scales, 3, 3)
return torch.from_numpy(img).float(), torch.from_numpy(K_ms).float(), torch.from_numpy(K_inv_ms).float()
if __name__ == '__main__':
pass
================================================
FILE: core/dataset/kitti_raw.py
================================================
import os, sys
import numpy as np
import imageio
from tqdm import tqdm
import torch.multiprocessing as mp
import pdb
def process_folder(q, static_frames, test_scenes, data_dir, output_dir, stride=1):
while True:
if q.empty():
break
folder = q.get()
if folder in static_frames.keys():
static_ids = static_frames[folder]
else:
static_ids = []
scene = folder.split('/')[1]
if scene[:-5] in test_scenes:
continue
image_path = os.path.join(data_dir, folder, 'image_02/data')
dump_image_path = os.path.join(output_dir, folder)
if not os.path.isdir(dump_image_path):
os.makedirs(dump_image_path)
f = open(os.path.join(dump_image_path, 'train.txt'), 'w')
        # Note: os.listdir returns entries in arbitrary order; frames are addressed by zero-padded index instead.
numbers = len(os.listdir(image_path))
for n in range(numbers - stride):
s_idx = n
e_idx = s_idx + stride
if '%.10d'%s_idx in static_ids or '%.10d'%e_idx in static_ids:
#print('%.10d'%s_idx)
continue
curr_image = imageio.imread(os.path.join(image_path, '%.10d'%s_idx)+'.png')
next_image = imageio.imread(os.path.join(image_path, '%.10d'%e_idx)+'.png')
seq_images = np.concatenate([curr_image, next_image], axis=0)
imageio.imsave(os.path.join(dump_image_path, '%.10d'%s_idx)+'.png', seq_images.astype('uint8'))
# Write training files
date = folder.split('/')[0]
f.write('%s %s\n' % (os.path.join(folder, '%.10d'%s_idx)+'.png', os.path.join(date, 'calib_cam_to_cam.txt')))
print(folder)
class KITTI_RAW(object):
def __init__(self, data_dir, static_frames_txt, test_scenes_txt):
self.data_dir = data_dir
self.static_frames_txt = static_frames_txt
self.test_scenes_txt = test_scenes_txt
def __len__(self):
raise NotImplementedError
def collect_static_frame(self):
f = open(self.static_frames_txt)
static_frames = {}
for line in f.readlines():
line = line.strip()
date, drive, frame_id = line.split(' ')
            curr_fid = '%.10d' % int(frame_id)
if os.path.join(date, drive) not in static_frames.keys():
static_frames[os.path.join(date, drive)] = []
static_frames[os.path.join(date, drive)].append(curr_fid)
return static_frames
def collect_test_scenes(self):
f = open(self.test_scenes_txt)
test_scenes = []
for line in f.readlines():
line = line.strip()
test_scenes.append(line)
return test_scenes
def prepare_data_mp(self, output_dir, stride=1):
num_processes = 16
processes = []
q = mp.Queue()
static_frames = self.collect_static_frame()
test_scenes = self.collect_test_scenes()
        if not os.path.isfile(os.path.join(output_dir, 'train.txt')):
            os.makedirs(output_dir, exist_ok=True)
            #f = open(os.path.join(output_dir, 'train.txt'), 'w')
        print('Preparing sequence data....')
        if not os.path.isdir(self.data_dir):
            raise FileNotFoundError(self.data_dir)
dirlist = os.listdir(self.data_dir)
total_dirlist = []
# Get the different folders of images
for d in dirlist:
seclist = os.listdir(os.path.join(self.data_dir, d))
for s in seclist:
if os.path.isdir(os.path.join(self.data_dir, d, s)):
total_dirlist.append(os.path.join(d, s))
q.put(os.path.join(d, s))
# Process every folder
for rank in range(num_processes):
p = mp.Process(target=process_folder, args=(q, static_frames, test_scenes, self.data_dir, output_dir, stride))
p.start()
processes.append(p)
for p in processes:
p.join()
# Collect the training frames.
f = open(os.path.join(output_dir, 'train.txt'), 'w')
for date in os.listdir(output_dir):
if os.path.isdir(os.path.join(output_dir, date)):
drives = os.listdir(os.path.join(output_dir, date))
for d in drives:
train_file = open(os.path.join(output_dir, date, d, 'train.txt'), 'r')
for l in train_file.readlines():
f.write(l)
# Get calib files
for date in os.listdir(self.data_dir):
command = 'cp ' + os.path.join(self.data_dir, date, 'calib_cam_to_cam.txt') + ' ' + os.path.join(output_dir, date, 'calib_cam_to_cam.txt')
os.system(command)
print('Data Preparation Finished.')
def prepare_data(self, output_dir):
static_frames = self.collect_static_frame()
test_scenes = self.collect_test_scenes()
        if not os.path.isfile(os.path.join(output_dir, 'train.txt')):
            os.makedirs(output_dir, exist_ok=True)
        f = open(os.path.join(output_dir, 'train.txt'), 'w')
        print('Preparing sequence data....')
        if not os.path.isdir(self.data_dir):
            raise FileNotFoundError(self.data_dir)
dirlist = os.listdir(self.data_dir)
total_dirlist = []
# Get the different folders of images
for d in dirlist:
seclist = os.listdir(os.path.join(self.data_dir, d))
for s in seclist:
if os.path.isdir(os.path.join(self.data_dir, d, s)):
total_dirlist.append(os.path.join(d, s))
# Process every folder
for folder in tqdm(total_dirlist):
if folder in static_frames.keys():
static_ids = static_frames[folder]
else:
static_ids = []
scene = folder.split('/')[1]
if scene in test_scenes:
continue
image_path = os.path.join(self.data_dir, folder, 'image_02/data')
dump_image_path = os.path.join(output_dir, folder)
if not os.path.isdir(dump_image_path):
os.makedirs(dump_image_path)
            # Note: os.listdir returns entries in arbitrary order; frames are addressed by zero-padded index instead.
numbers = len(os.listdir(image_path))
for n in range(numbers - 1):
s_idx = n
e_idx = s_idx + 1
if '%.10d'%s_idx in static_ids or '%.10d'%e_idx in static_ids:
print('%.10d'%s_idx)
continue
curr_image = imageio.imread(os.path.join(image_path, '%.10d'%s_idx)+'.png')
next_image = imageio.imread(os.path.join(image_path, '%.10d'%e_idx)+'.png')
seq_images = np.concatenate([curr_image, next_image], axis=0)
imageio.imsave(os.path.join(dump_image_path, '%.10d'%s_idx)+'.png', seq_images.astype('uint8'))
# Write training files
date = folder.split('/')[0]
f.write('%s %s\n' % (os.path.join(folder, '%.10d'%s_idx)+'.png', os.path.join(date, 'calib_cam_to_cam.txt')))
print(folder)
# Get calib files
for date in os.listdir(self.data_dir):
command = 'cp ' + os.path.join(self.data_dir, date, 'calib_cam_to_cam.txt') + ' ' + os.path.join(output_dir, date, 'calib_cam_to_cam.txt')
os.system(command)
return os.path.join(output_dir, 'train.txt')
def __getitem__(self, idx):
raise NotImplementedError
if __name__ == '__main__':
data_dir = '/home4/zhaow/data/kitti'
dirlist = os.listdir('/home4/zhaow/data/kitti')
output_dir = '/home4/zhaow/data/kitti_seq/data_generated_s2'
total_dirlist = []
# Get the different folders of images
for d in dirlist:
seclist = os.listdir(os.path.join(data_dir, d))
for s in seclist:
if os.path.isdir(os.path.join(data_dir, d, s)):
total_dirlist.append(os.path.join(d, s))
F = open(os.path.join(output_dir, 'train.txt'), 'w')
for p in total_dirlist:
traintxt = os.path.join(os.path.join(output_dir, p), 'train.txt')
f = open(traintxt, 'r')
for line in f.readlines():
F.write(line)
print(traintxt)
================================================
FILE: core/dataset/nyu_v2.py
================================================
import os, sys
import numpy as np
import imageio
import cv2
import copy
import h5py
import scipy.io as sio
import torch
import torch.utils.data
import pdb
from tqdm import tqdm
import torch.multiprocessing as mp
def collect_image_list(path):
    # Collect the sorted list of ppm images in a folder.
files = os.listdir(path)
sorted_file = sorted([f for f in files])
image_list = []
for l in sorted_file:
if l.split('.')[-1] == 'ppm':
image_list.append(l)
return image_list
def process_folder(q, data_dir, output_dir, stride, train_scenes):
# Directly process the original nyu v2 depth dataset.
while True:
if q.empty():
break
folder = q.get()
scene_name = folder.split('/')[-1]
s1,s2 = scene_name.split('_')[:-1], scene_name.split('_')[-1]
scene_name_full = ''
for j in s1:
scene_name_full = scene_name_full + j + '_'
scene_name_full = scene_name_full + s2[:4]
if scene_name_full not in train_scenes:
continue
image_path = os.path.join(data_dir, folder)
dump_image_path = os.path.join(output_dir, folder)
if not os.path.isdir(dump_image_path):
os.makedirs(dump_image_path)
f = open(os.path.join(dump_image_path, 'train.txt'), 'w')
        # Note: os.listdir returns entries in arbitrary order; collect_image_list sorts them to keep the correct order.
image_list = collect_image_list(image_path)
#image_list = open(os.path.join(image_path, 'index.txt')).readlines()
numbers = len(image_list) - 1 # The last ppm file seems truncated.
for n in range(numbers - stride):
s_idx = n
e_idx = s_idx + stride
s_name = image_list[s_idx].strip()
e_name = image_list[e_idx].strip()
curr_image = imageio.imread(os.path.join(image_path, s_name))
next_image = imageio.imread(os.path.join(image_path, e_name))
#curr_image = cv2.imread(os.path.join(image_path, s_name))
#next_image = cv2.imread(os.path.join(image_path, e_name))
seq_images = np.concatenate([curr_image, next_image], axis=0)
imageio.imsave(os.path.join(dump_image_path, os.path.splitext(s_name)[0]+'.png'), seq_images.astype('uint8'))
#cv2.imwrite(os.path.join(dump_image_path, os.path.splitext(s_name)[0]+'.png'), seq_images.astype('uint8'))
# Write training files
#date = folder.split('_')[2]
f.write('%s %s\n' % (os.path.join(folder, os.path.splitext(s_name)[0]+'.png'), 'calib_cam_to_cam.txt'))
print(folder)
class NYU_Prepare(object):
def __init__(self, data_dir, test_dir):
self.data_dir = data_dir
self.test_data = os.path.join(test_dir, 'nyu_depth_v2_labeled.mat')
self.splits = os.path.join(test_dir, 'splits.mat')
self.get_all_scenes()
self.get_test_scenes()
self.get_train_scenes()
def __len__(self):
raise NotImplementedError
def get_all_scenes(self):
self.all_scenes = []
paths = os.listdir(self.data_dir)
for p in paths:
if os.path.isdir(os.path.join(self.data_dir, p)):
pp = os.listdir(os.path.join(self.data_dir, p))
for path in pp:
self.all_scenes.append(path)
def get_test_scenes(self):
self.test_scenes = []
test_data = h5py.File(self.test_data, 'r')
test_split = sio.loadmat(self.splits)['testNdxs']
test_split = np.array(test_split).squeeze(1)
test_scenes = test_data['scenes'][0][test_split-1]
for i in range(len(test_scenes)):
obj = test_data[test_scenes[i]]
name = "".join(chr(j) for j in obj[:])
if name not in self.test_scenes:
self.test_scenes.append(name)
#pdb.set_trace()
def get_train_scenes(self):
self.train_scenes = []
train_data = h5py.File(self.test_data, 'r')
train_split = sio.loadmat(self.splits)['trainNdxs']
train_split = np.array(train_split).squeeze(1)
train_scenes = train_data['scenes'][0][train_split-1]
for i in range(len(train_scenes)):
obj = train_data[train_scenes[i]]
name = "".join(chr(j) for j in obj[:])
if name not in self.train_scenes:
self.train_scenes.append(name)
def prepare_data_mp(self, output_dir, stride=1):
num_processes = 32
processes = []
q = mp.Queue()
        if not os.path.isfile(os.path.join(output_dir, 'train.txt')):
            os.makedirs(output_dir, exist_ok=True)
            #f = open(os.path.join(output_dir, 'train.txt'), 'w')
        print('Preparing sequence data....')
        if not os.path.isdir(self.data_dir):
            raise FileNotFoundError(self.data_dir)
dirlist = os.listdir(self.data_dir)
total_dirlist = []
# Get the different folders of images
for d in dirlist:
if not os.path.isdir(os.path.join(self.data_dir, d)):
continue
seclist = os.listdir(os.path.join(self.data_dir, d))
for s in seclist:
if os.path.isdir(os.path.join(self.data_dir, d, s)):
total_dirlist.append(os.path.join(d, s))
q.put(os.path.join(d, s))
# Process every folder
for rank in range(num_processes):
p = mp.Process(target=process_folder, args=(q, self.data_dir, output_dir, stride, self.train_scenes))
p.start()
processes.append(p)
for p in processes:
p.join()
# Collect the training frames.
f = open(os.path.join(output_dir, 'train.txt'), 'w')
for dirlist in os.listdir(output_dir):
if os.path.isdir(os.path.join(output_dir, dirlist)):
seclists = os.listdir(os.path.join(output_dir, dirlist))
for s in seclists:
train_file = open(os.path.join(output_dir, dirlist, s, 'train.txt'), 'r')
for l in train_file.readlines():
f.write(l)
f.close()
f = open(os.path.join(output_dir, 'calib_cam_to_cam.txt'), 'w')
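        # Single shared intrinsics for all NYUv2 scenes, written in KITTI's P_rect format
        # (the values match the default RGB intrinsics of the NYU Depth v2 toolbox).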
f.write('P_rect: 5.1885790117450188e+02 0.0 3.2558244941119034e+02 0.0 0.0 5.1946961112127485e+02 2.5373616633400465e+02 0.0 0.0 0.0 1.0 0.0')
f.close()
print('Data Preparation Finished.')
def __getitem__(self, idx):
raise NotImplementedError
class NYU_v2(torch.utils.data.Dataset):
def __init__(self, data_dir, num_scales=3, img_hw=(448, 576), num_iterations=None):
super(NYU_v2, self).__init__()
self.data_dir = data_dir
self.num_scales = num_scales
self.img_hw = img_hw
self.num_iterations = num_iterations
self.undist_coeff = np.array([2.07966153e-01, -5.8613825e-01, 7.223136313e-04, 1.047962719e-03, 4.98569866e-01])
self.mapx, self.mapy = None, None
self.roi = None
info_file = os.path.join(self.data_dir, 'train.txt')
self.data_list = self.get_data_list(info_file)
def get_data_list(self, info_file):
with open(info_file, 'r') as f:
lines = f.readlines()
data_list = []
for line in lines:
k = line.strip('\n').split()
data = {}
data['image_file'] = os.path.join(self.data_dir, k[0])
data['cam_intrinsic_file'] = os.path.join(self.data_dir, k[1])
data_list.append(data)
print('A total of {} image pairs found'.format(len(data_list)))
return data_list
def count(self):
return len(self.data_list)
def rand_num(self, idx):
num_total = self.count()
np.random.seed(idx)
num = np.random.randint(num_total)
return num
def __len__(self):
if self.num_iterations is None:
return self.count()
else:
return self.num_iterations
def resize_img(self, img, img_hw):
'''
Input size (N*H, W, 3)
Output size (N*H', W', 3), where (H', W') == self.img_hw
'''
img_h, img_w = img.shape[0], img.shape[1]
img_hw_orig = (int(img_h / 2), img_w)
img1, img2 = img[:img_hw_orig[0], :, :], img[img_hw_orig[0]:, :, :]
img1_new = cv2.resize(img1, (img_hw[1], img_hw[0]))
img2_new = cv2.resize(img2, (img_hw[1], img_hw[0]))
img_new = np.concatenate([img1_new, img2_new], 0)
return img_new
def random_flip_img(self, img):
is_flip = (np.random.rand() > 0.5)
if is_flip:
img = cv2.flip(img, 1)
return img
def undistort_img(self, img, K):
img_h, img_w = img.shape[0], img.shape[1]
img_hw_orig = (int(img_h / 2), img_w)
img1, img2 = img[:img_hw_orig[0], :, :], img[img_hw_orig[0]:, :, :]
h, w = img_hw_orig
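        # The undistortion maps depend only on the fixed intrinsics, so build them once and cache.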
if self.mapx is None:
newcameramtx, self.roi = cv2.getOptimalNewCameraMatrix(K, self.undist_coeff, (w,h), 1, (w,h))
self.mapx, self.mapy = cv2.initUndistortRectifyMap(K, self.undist_coeff, None, newcameramtx, (w,h), 5)
img1_undist = cv2.remap(img1, self.mapx, self.mapy, cv2.INTER_LINEAR)
img2_undist = cv2.remap(img2, self.mapx, self.mapy, cv2.INTER_LINEAR)
x,y,w,h = self.roi
img1_undist = img1_undist[y:y+h, x:x+w]
img2_undist = img2_undist[y:y+h, x:x+w]
img_undist = np.concatenate([img1_undist, img2_undist], 0)
#cv2.imwrite('./test.png', img)
#cv2.imwrite('./test_undist.png', img_undist)
#pdb.set_trace()
return img_undist
def preprocess_img(self, img, K, img_hw=None, is_test=False):
if img_hw is None:
img_hw = self.img_hw
if not is_test:
#img = img
img = self.undistort_img(img, K)
#img = self.random_flip_img(img)
img = self.resize_img(img, img_hw)
img = img / 255.0
return img
def read_cam_intrinsic(self, fname):
with open(fname, 'r') as f:
lines = f.readlines()
data = lines[-1].strip('\n').split(' ')[1:]
data = [float(k) for k in data]
data = np.array(data).reshape(3,4)
cam_intrinsics = data[:3,:3]
return cam_intrinsics
def rescale_intrinsics(self, K, img_hw_orig, img_hw_new):
K_new = copy.deepcopy(K)
K_new[0,:] = K_new[0,:] * img_hw_new[0] / img_hw_orig[0]
K_new[1,:] = K_new[1,:] * img_hw_new[1] / img_hw_orig[1]
return K_new
def get_intrinsics_per_scale(self, K, scale):
K_new = copy.deepcopy(K)
K_new[0,:] = K_new[0,:] / (2**scale)
K_new[1,:] = K_new[1,:] / (2**scale)
K_new_inv = np.linalg.inv(K_new)
return K_new, K_new_inv
def get_multiscale_intrinsics(self, K, num_scales):
K_ms, K_inv_ms = [], []
for s in range(num_scales):
K_new, K_new_inv = self.get_intrinsics_per_scale(K, s)
K_ms.append(K_new[None,:,:])
K_inv_ms.append(K_new_inv[None,:,:])
K_ms = np.concatenate(K_ms, 0)
K_inv_ms = np.concatenate(K_inv_ms, 0)
return K_ms, K_inv_ms
def __getitem__(self, idx):
'''
Returns:
- img torch.Tensor (N * H, W, 3)
- K torch.Tensor (num_scales, 3, 3)
- K_inv torch.Tensor (num_scales, 3, 3)
'''
        # num_iterations may be None, in which case idx indexes data_list directly.
        if self.num_iterations is not None:
            if idx >= self.num_iterations:
                raise IndexError
            idx = self.rand_num(idx)
data = self.data_list[idx]
# load img
img = cv2.imread(data['image_file'])
img_hw_orig = (int(img.shape[0] / 2), img.shape[1])
# load intrinsic
cam_intrinsic_orig = self.read_cam_intrinsic(data['cam_intrinsic_file'])
cam_intrinsic = self.rescale_intrinsics(cam_intrinsic_orig, img_hw_orig, self.img_hw)
K_ms, K_inv_ms = self.get_multiscale_intrinsics(cam_intrinsic, self.num_scales) # (num_scales, 3, 3), (num_scales, 3, 3)
# image preprocessing
img = self.preprocess_img(img, cam_intrinsic_orig, self.img_hw) # (img_h * 2, img_w, 3)
img = img.transpose(2,0,1)
return torch.from_numpy(img).float(), torch.from_numpy(K_ms).float(), torch.from_numpy(K_inv_ms).float()
if __name__ == '__main__':
data_dir = '/home4/zhaow/data/kitti'
dirlist = os.listdir('/home4/zhaow/data/kitti')
output_dir = '/home4/zhaow/data/kitti_seq/data_generated_s2'
total_dirlist = []
# Get the different folders of images
for d in dirlist:
seclist = os.listdir(os.path.join(data_dir, d))
for s in seclist:
if os.path.isdir(os.path.join(data_dir, d, s)):
total_dirlist.append(os.path.join(d, s))
F = open(os.path.join(output_dir, 'train.txt'), 'w')
for p in total_dirlist:
traintxt = os.path.join(os.path.join(output_dir, p), 'train.txt')
f = open(traintxt, 'r')
for line in f.readlines():
F.write(line)
print(traintxt)
================================================
FILE: core/evaluation/__init__.py
================================================
import os, sys
sys.path.append(os.path.dirname(os.path.abspath(__file__)))
from evaluate_flow import eval_flow_avg, load_gt_flow_kitti
from evaluate_mask import load_gt_mask
from evaluate_depth import eval_depth
================================================
FILE: core/evaluation/eval_odom.py
================================================
import copy
from matplotlib import pyplot as plt
import numpy as np
import os
from glob import glob
import pdb
def scale_lse_solver(X, Y):
"""Least-sqaure-error solver
Compute optimal scaling factor so that s(X)-Y is minimum
Args:
X (KxN array): current data
Y (KxN array): reference data
Returns:
scale (float): scaling factor
"""
scale = np.sum(X * Y)/np.sum(X ** 2)
return scale
def umeyama_alignment(x, y, with_scale=False):
"""
Computes the least squares solution parameters of an Sim(m) matrix
that minimizes the distance between a set of registered points.
Umeyama, Shinji: Least-squares estimation of transformation parameters
between two point patterns. IEEE PAMI, 1991
:param x: mxn matrix of points, m = dimension, n = nr. of data points
:param y: mxn matrix of points, m = dimension, n = nr. of data points
:param with_scale: set to True to align also the scale (default: 1.0 scale)
:return: r, t, c - rotation matrix, translation vector and scale factor
"""
if x.shape != y.shape:
assert False, "x.shape not equal to y.shape"
# m = dimension, n = nr. of data points
m, n = x.shape
# means, eq. 34 and 35
mean_x = x.mean(axis=1)
mean_y = y.mean(axis=1)
# variance, eq. 36
# "transpose" for column subtraction
sigma_x = 1.0 / n * (np.linalg.norm(x - mean_x[:, np.newaxis])**2)
# covariance matrix, eq. 38
outer_sum = np.zeros((m, m))
for i in range(n):
outer_sum += np.outer((y[:, i] - mean_y), (x[:, i] - mean_x))
cov_xy = np.multiply(1.0 / n, outer_sum)
# SVD (text betw. eq. 38 and 39)
u, d, v = np.linalg.svd(cov_xy)
# S matrix, eq. 43
s = np.eye(m)
if np.linalg.det(u) * np.linalg.det(v) < 0.0:
# Ensure a RHS coordinate system (Kabsch algorithm).
s[m - 1, m - 1] = -1
# rotation, eq. 40
r = u.dot(s).dot(v)
# scale & translation, eq. 42 and 41
c = 1 / sigma_x * np.trace(np.diag(d).dot(s)) if with_scale else 1.0
t = mean_y - np.multiply(c, r.dot(mean_x))
return r, t, c
class KittiEvalOdom():
# ----------------------------------------------------------------------
# poses: N,4,4
# pose: 4,4
# ----------------------------------------------------------------------
def __init__(self):
self.lengths = [100, 200, 300, 400, 500, 600, 700, 800]
self.num_lengths = len(self.lengths)
def loadPoses(self, file_name):
# ----------------------------------------------------------------------
# Each line in the file should follow one of the following structures
# (1) idx pose(3x4 matrix in terms of 12 numbers)
# (2) pose(3x4 matrix in terms of 12 numbers)
# ----------------------------------------------------------------------
f = open(file_name, 'r')
s = f.readlines()
f.close()
file_len = len(s)
poses = {}
for cnt, line in enumerate(s):
P = np.eye(4)
line_split = [float(i) for i in line.split(" ")]
withIdx = int(len(line_split) == 13)
for row in range(3):
for col in range(4):
P[row, col] = line_split[row*4 + col + withIdx]
if withIdx:
frame_idx = line_split[0]
else:
frame_idx = cnt
poses[frame_idx] = P
return poses
def trajectory_distances(self, poses):
# ----------------------------------------------------------------------
# poses: dictionary: [frame_idx: pose]
# ----------------------------------------------------------------------
dist = [0]
sort_frame_idx = sorted(poses.keys())
for i in range(len(sort_frame_idx)-1):
cur_frame_idx = sort_frame_idx[i]
next_frame_idx = sort_frame_idx[i+1]
P1 = poses[cur_frame_idx]
P2 = poses[next_frame_idx]
dx = P1[0, 3] - P2[0, 3]
dy = P1[1, 3] - P2[1, 3]
dz = P1[2, 3] - P2[2, 3]
dist.append(dist[i]+np.sqrt(dx**2+dy**2+dz**2))
return dist
def rotation_error(self, pose_error):
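        # Rotation angle of the relative pose: cos(theta) = (trace(R) - 1) / 2.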
a = pose_error[0, 0]
b = pose_error[1, 1]
c = pose_error[2, 2]
d = 0.5*(a+b+c-1.0)
rot_error = np.arccos(max(min(d, 1.0), -1.0))
return rot_error
def translation_error(self, pose_error):
dx = pose_error[0, 3]
dy = pose_error[1, 3]
dz = pose_error[2, 3]
return np.sqrt(dx**2+dy**2+dz**2)
def last_frame_from_segment_length(self, dist, first_frame, len_):
for i in range(first_frame, len(dist), 1):
if dist[i] > (dist[first_frame] + len_):
return i
return -1
def calc_sequence_errors(self, poses_gt, poses_result):
err = []
dist = self.trajectory_distances(poses_gt)
self.step_size = 10
for first_frame in range(0, len(poses_gt), self.step_size):
for i in range(self.num_lengths):
len_ = self.lengths[i]
last_frame = self.last_frame_from_segment_length(dist, first_frame, len_)
# ----------------------------------------------------------------------
# Continue if sequence not long enough
# ----------------------------------------------------------------------
if last_frame == -1 or not(last_frame in poses_result.keys()) or not(first_frame in poses_result.keys()):
continue
# ----------------------------------------------------------------------
# compute rotational and translational errors
# ----------------------------------------------------------------------
pose_delta_gt = np.dot(np.linalg.inv(poses_gt[first_frame]), poses_gt[last_frame])
pose_delta_result = np.dot(np.linalg.inv(poses_result[first_frame]), poses_result[last_frame])
pose_error = np.dot(np.linalg.inv(pose_delta_result), pose_delta_gt)
r_err = self.rotation_error(pose_error)
t_err = self.translation_error(pose_error)
# ----------------------------------------------------------------------
# compute speed
# ----------------------------------------------------------------------
num_frames = last_frame - first_frame + 1.0
speed = len_/(0.1*num_frames)
err.append([first_frame, r_err/len_, t_err/len_, len_, speed])
return err
def save_sequence_errors(self, err, file_name):
fp = open(file_name, 'w')
for i in err:
line_to_write = " ".join([str(j) for j in i])
fp.writelines(line_to_write+"\n")
fp.close()
def compute_overall_err(self, seq_err):
t_err = 0
r_err = 0
seq_len = len(seq_err)
for item in seq_err:
r_err += item[1]
t_err += item[2]
ave_t_err = t_err / seq_len
ave_r_err = r_err / seq_len
return ave_t_err, ave_r_err
def plotPath(self, seq, poses_gt, poses_result):
plot_keys = ["Ground Truth", "Ours"]
fontsize_ = 20
        plot_num = -1
poses_dict = {}
poses_dict["Ground Truth"] = poses_gt
poses_dict["Ours"] = poses_result
fig = plt.figure()
ax = plt.gca()
ax.set_aspect('equal')
for key in plot_keys:
pos_xz = []
# for pose in poses_dict[key]:
for frame_idx in sorted(poses_dict[key].keys()):
pose = poses_dict[key][frame_idx]
pos_xz.append([pose[0,3], pose[2,3]])
pos_xz = np.asarray(pos_xz)
plt.plot(pos_xz[:,0], pos_xz[:,1], label = key)
plt.legend(loc="upper right", prop={'size': fontsize_})
plt.xticks(fontsize=fontsize_)
plt.yticks(fontsize=fontsize_)
plt.xlabel('x (m)', fontsize=fontsize_)
plt.ylabel('z (m)', fontsize=fontsize_)
fig.set_size_inches(10, 10)
png_title = "sequence_"+(seq)
plt.savefig(self.plot_path_dir + "/" + png_title + ".pdf", bbox_inches='tight', pad_inches=0)
# plt.show()
def compute_segment_error(self, seq_errs):
# ----------------------------------------------------------------------
# This function calculates average errors for different segment.
# ----------------------------------------------------------------------
segment_errs = {}
avg_segment_errs = {}
for len_ in self.lengths:
segment_errs[len_] = []
# ----------------------------------------------------------------------
# Get errors
# ----------------------------------------------------------------------
for err in seq_errs:
len_ = err[3]
t_err = err[2]
r_err = err[1]
segment_errs[len_].append([t_err, r_err])
# ----------------------------------------------------------------------
# Compute average
# ----------------------------------------------------------------------
for len_ in self.lengths:
if segment_errs[len_] != []:
avg_t_err = np.mean(np.asarray(segment_errs[len_])[:, 0])
avg_r_err = np.mean(np.asarray(segment_errs[len_])[:, 1])
avg_segment_errs[len_] = [avg_t_err, avg_r_err]
else:
avg_segment_errs[len_] = []
return avg_segment_errs
def scale_optimization(self, gt, pred):
""" Optimize scaling factor
Args:
gt (4x4 array dict): ground-truth poses
pred (4x4 array dict): predicted poses
Returns:
new_pred (4x4 array dict): predicted poses after optimization
"""
pred_updated = copy.deepcopy(pred)
xyz_pred = []
xyz_ref = []
for i in pred:
pose_pred = pred[i]
pose_ref = gt[i]
xyz_pred.append(pose_pred[:3, 3])
xyz_ref.append(pose_ref[:3, 3])
xyz_pred = np.asarray(xyz_pred)
xyz_ref = np.asarray(xyz_ref)
scale = scale_lse_solver(xyz_pred, xyz_ref)
for i in pred_updated:
pred_updated[i][:3, 3] *= scale
return pred_updated
def eval(self, gt_txt, result_txt, seq=None):
        # gt_txt: path to the ground-truth poses txt
        # result_txt: path to the predicted poses txt
self.plot_path_dir = os.path.dirname(result_txt) + "/plot_path"
if not os.path.exists(self.plot_path_dir):
os.makedirs(self.plot_path_dir)
self.gt_txt = gt_txt
ave_t_errs = []
ave_r_errs = []
poses_result = self.loadPoses(result_txt)
poses_gt = self.loadPoses(self.gt_txt)
# Pose alignment to first frame
idx_0 = sorted(list(poses_result.keys()))[0]
pred_0 = poses_result[idx_0]
gt_0 = poses_gt[idx_0]
for cnt in poses_result:
poses_result[cnt] = np.linalg.inv(pred_0) @ poses_result[cnt]
poses_gt[cnt] = np.linalg.inv(gt_0) @ poses_gt[cnt]
# get XYZ
xyz_gt = []
xyz_result = []
for cnt in poses_result:
xyz_gt.append([poses_gt[cnt][0, 3], poses_gt[cnt][1, 3], poses_gt[cnt][2, 3]])
xyz_result.append([poses_result[cnt][0, 3], poses_result[cnt][1, 3], poses_result[cnt][2, 3]])
xyz_gt = np.asarray(xyz_gt).transpose(1, 0)
xyz_result = np.asarray(xyz_result).transpose(1, 0)
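        # 7-DoF alignment (rotation, translation, scale) of the whole predicted trajectory.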
r, t, scale = umeyama_alignment(xyz_result, xyz_gt, True)
align_transformation = np.eye(4)
        align_transformation[:3, :3] = r
align_transformation[:3, 3] = t
for cnt in poses_result:
poses_result[cnt][:3, 3] *= scale
poses_result[cnt] = align_transformation @ poses_result[cnt]
# ----------------------------------------------------------------------
# compute sequence errors
# ----------------------------------------------------------------------
seq_err = self.calc_sequence_errors(poses_gt, poses_result)
# ----------------------------------------------------------------------
# Compute segment errors
# ----------------------------------------------------------------------
avg_segment_errs = self.compute_segment_error(seq_err)
# ----------------------------------------------------------------------
# compute overall error
# ----------------------------------------------------------------------
ave_t_err, ave_r_err = self.compute_overall_err(seq_err)
print("Sequence: " + seq)
print("Translational error (%): ", ave_t_err*100)
print("Rotational error (deg/100m): ", ave_r_err/np.pi*180*100)
ave_t_errs.append(ave_t_err)
ave_r_errs.append(ave_r_err)
# Plotting
self.plotPath(seq, poses_gt, poses_result)
print("-------------------- For Copying ------------------------------")
for i in range(len(ave_t_errs)):
print("{0:.2f}".format(ave_t_errs[i]*100))
print("{0:.2f}".format(ave_r_errs[i]/np.pi*180*100))
if __name__ == '__main__':
import argparse
parser = argparse.ArgumentParser(description='KITTI evaluation')
    parser.add_argument('--gt_txt', type=str, required=True, help="Path to the ground-truth poses txt")
    parser.add_argument('--result_txt', type=str, required=True, help="Path to the predicted poses txt")
parser.add_argument('--seq', type=str, help="sequences to be evaluated", default='09')
args = parser.parse_args()
eval_tool = KittiEvalOdom()
eval_tool.eval(args.gt_txt, args.result_txt, seq=args.seq)
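    # Example invocation (paths are placeholders):
    #   python core/evaluation/eval_odom.py --gt_txt ./09_gt.txt --result_txt ./09_pred.txt --seq 09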
================================================
FILE: core/evaluation/evaluate_depth.py
================================================
import os, sys
sys.path.append(os.path.dirname(os.path.abspath(__file__)))
import numpy as np
from evaluation_utils import *
def process_depth(gt_depth, pred_depth, min_depth, max_depth):
mask = gt_depth > 0
pred_depth[pred_depth < min_depth] = min_depth
pred_depth[pred_depth > max_depth] = max_depth
gt_depth[gt_depth < min_depth] = min_depth
gt_depth[gt_depth > max_depth] = max_depth
return gt_depth, pred_depth, mask
def eval_depth(gt_depths,
pred_depths,
min_depth=1e-3,
max_depth=80, nyu=False):
num_samples = len(pred_depths)
rms = np.zeros(num_samples, np.float32)
log_rms = np.zeros(num_samples, np.float32)
abs_rel = np.zeros(num_samples, np.float32)
sq_rel = np.zeros(num_samples, np.float32)
d1_all = np.zeros(num_samples, np.float32)
a1 = np.zeros(num_samples, np.float32)
a2 = np.zeros(num_samples, np.float32)
a3 = np.zeros(num_samples, np.float32)
for i in range(num_samples):
gt_depth = gt_depths[i]
pred_depth = pred_depths[i]
mask = np.logical_and(gt_depth > min_depth, gt_depth < max_depth)
if not nyu:
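            # Apply the standard Eigen/Garg evaluation crop used on KITTI.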
gt_height, gt_width = gt_depth.shape
crop = np.array([0.40810811 * gt_height, 0.99189189 * gt_height,
0.03594771 * gt_width, 0.96405229 * gt_width]).astype(np.int32)
crop_mask = np.zeros(mask.shape)
crop_mask[crop[0]:crop[1], crop[2]:crop[3]] = 1
mask = np.logical_and(mask, crop_mask)
gt_depth = gt_depth[mask]
pred_depth = pred_depth[mask]
scale = np.median(gt_depth) / np.median(pred_depth)
pred_depth *= scale
gt_depth, pred_depth, mask = process_depth(
gt_depth, pred_depth, min_depth, max_depth)
        abs_rel[i], sq_rel[i], rms[i], log_rms[i], a1[i], a2[i], a3[i] = \
            compute_errors(gt_depth, pred_depth, nyu=nyu)
return [abs_rel.mean(), sq_rel.mean(), rms.mean(), log_rms.mean(), a1.mean(), a2.mean(), a3.mean()]
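# Minimal usage sketch (illustrative only): both arguments are lists of HxW
# numpy depth maps in meters; eval_depth median-scales each prediction to the
# ground truth before computing the standard error metrics.
if __name__ == '__main__':
    gt = [np.random.uniform(1.0, 80.0, (375, 1242)).astype(np.float32)]
    pred = [np.random.uniform(0.1, 1.0, (375, 1242)).astype(np.float32)]
    print(eval_depth(gt, pred, min_depth=1e-3, max_depth=80))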
================================================
FILE: core/evaluation/evaluate_flow.py
================================================
import os, sys
sys.path.append(os.path.dirname(os.path.abspath(__file__)))
import numpy as np
from flowlib import read_flow_png, flow_to_image
import cv2
import multiprocessing
import functools
def get_scaled_intrinsic_matrix(calib_file, zoom_x, zoom_y):
intrinsics = load_intrinsics_raw(calib_file)
intrinsics = scale_intrinsics(intrinsics, zoom_x, zoom_y)
intrinsics[0, 1] = 0.0
intrinsics[1, 0] = 0.0
intrinsics[2, 0] = 0.0
intrinsics[2, 1] = 0.0
return intrinsics
def load_intrinsics_raw(calib_file):
filedata = read_raw_calib_file(calib_file)
if "P_rect_02" in filedata:
P_rect = filedata['P_rect_02']
else:
P_rect = filedata['P2']
P_rect = np.reshape(P_rect, (3, 4))
intrinsics = P_rect[:3, :3]
return intrinsics
def read_raw_calib_file(filepath):
# From https://github.com/utiasSTARS/pykitti/blob/master/pykitti/utils.py
"""Read in a calibration file and parse into a dictionary."""
data = {}
with open(filepath, 'r') as f:
for line in f.readlines():
key, value = line.split(':', 1)
# The only non-float values in these files are dates, which
# we don't care about anyway
try:
data[key] = np.array([float(x) for x in value.split()])
except ValueError:
pass
return data
def scale_intrinsics(mat, sx, sy):
out = np.copy(mat)
out[0, 0] *= sx
out[0, 2] *= sx
out[1, 1] *= sy
out[1, 2] *= sy
return out
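# Example: resizing a 1242x375 KITTI image to 832x256 gives zoom factors
# sx = 832 / 1242.0 and sy = 256 / 375.0; fx and cx scale by sx while fy and
# cy scale by sy, e.g. K_scaled = scale_intrinsics(K, 832 / 1242.0, 256 / 375.0).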
def read_flow_gt_worker(dir_gt, i):
flow_true = read_flow_png(
os.path.join(dir_gt, "flow_occ", str(i).zfill(6) + "_10.png"))
flow_noc_true = read_flow_png(
os.path.join(dir_gt, "flow_noc", str(i).zfill(6) + "_10.png"))
return flow_true, flow_noc_true[:, :, 2]
def load_gt_flow_kitti(gt_dataset_dir, mode):
gt_flows = []
noc_masks = []
if mode == "kitti_2012":
num_gt = 194
dir_gt = gt_dataset_dir
elif mode == "kitti_2015":
num_gt = 200
dir_gt = gt_dataset_dir
    else:
        raise ValueError('Mode {} not found.'.format(mode))
fun = functools.partial(read_flow_gt_worker, dir_gt)
pool = multiprocessing.Pool(5)
results = pool.imap(fun, range(num_gt), chunksize=10)
pool.close()
pool.join()
for result in results:
gt_flows.append(result[0])
noc_masks.append(result[1])
return gt_flows, noc_masks
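# The error rate below follows the KITTI Fl convention: a pixel is counted as
# bad when its end-point error exceeds 3 px and also exceeds 5% of the
# ground-truth flow magnitude.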
def calculate_error_rate(epe_map, gt_flow, mask):
bad_pixels = np.logical_and(
epe_map * mask > 3,
epe_map * mask / np.maximum(
np.sqrt(np.sum(np.square(gt_flow), axis=2)), 1e-10) > 0.05)
return bad_pixels.sum() / mask.sum()
def eval_flow_avg(gt_flows,
noc_masks,
pred_flows,
cfg,
moving_masks=None,
write_img=False):
error, error_noc, error_occ, error_move, error_static, error_rate = 0.0, 0.0, 0.0, 0.0, 0.0, 0.0
error_move_rate, error_static_rate = 0.0, 0.0
num = len(gt_flows)
for gt_flow, noc_mask, pred_flow, i in zip(gt_flows, noc_masks, pred_flows,
range(len(gt_flows))):
H, W = gt_flow.shape[0:2]
pred_flow = np.copy(pred_flow)
pred_flow[:, :, 0] = pred_flow[:, :, 0] / cfg.img_hw[1] * W
pred_flow[:, :, 1] = pred_flow[:, :, 1] / cfg.img_hw[0] * H
flo_pred = cv2.resize(
pred_flow, (W, H), interpolation=cv2.INTER_LINEAR)
if write_img:
if not os.path.exists(os.path.join(cfg.model_dir, "pred_flow")):
os.mkdir(os.path.join(cfg.model_dir, "pred_flow"))
cv2.imwrite(
os.path.join(cfg.model_dir, "pred_flow",
str(i).zfill(6) + "_10.png"),
flow_to_image(flo_pred))
cv2.imwrite(
os.path.join(cfg.model_dir, "pred_flow",
str(i).zfill(6) + "_10_gt.png"),
flow_to_image(gt_flow[:, :, 0:2]))
cv2.imwrite(
os.path.join(cfg.model_dir, "pred_flow",
str(i).zfill(6) + "_10_err.png"),
flow_to_image(
(flo_pred - gt_flow[:, :, 0:2]) * gt_flow[:, :, 2:3]))
epe_map = np.sqrt(
np.sum(np.square(flo_pred[:, :, 0:2] - gt_flow[:, :, 0:2]),
axis=2))
error += np.sum(epe_map * gt_flow[:, :, 2]) / np.sum(gt_flow[:, :, 2])
error_noc += np.sum(epe_map * noc_mask) / np.sum(noc_mask)
error_occ += np.sum(epe_map * (gt_flow[:, :, 2] - noc_mask)) / max(
np.sum(gt_flow[:, :, 2] - noc_mask), 1.0)
error_rate += calculate_error_rate(epe_map, gt_flow[:, :, 0:2],
gt_flow[:, :, 2])
if moving_masks:
move_mask = moving_masks[i]
error_move_rate += calculate_error_rate(
epe_map, gt_flow[:, :, 0:2], gt_flow[:, :, 2] * move_mask)
error_static_rate += calculate_error_rate(
epe_map, gt_flow[:, :, 0:2],
gt_flow[:, :, 2] * (1.0 - move_mask))
error_move += np.sum(epe_map * gt_flow[:, :, 2] *
move_mask) / np.sum(gt_flow[:, :, 2] *
move_mask)
error_static += np.sum(epe_map * gt_flow[:, :, 2] * (
1.0 - move_mask)) / np.sum(gt_flow[:, :, 2] *
(1.0 - move_mask))
if moving_masks:
result = "{:>10}, {:>10}, {:>10}, {:>10}, {:>10}, {:>10}, {:>10}, {:>10} \n".format(
'epe', 'epe_noc', 'epe_occ', 'epe_move', 'epe_static',
'move_err_rate', 'static_err_rate', 'err_rate')
result += "{:10.4f}, {:10.4f}, {:10.4f}, {:10.4f}, {:10.4f}, {:10.4f}, {:10.4f}, {:10.4f} \n".format(
error / num, error_noc / num, error_occ / num, error_move / num,
error_static / num, error_move_rate / num, error_static_rate / num,
error_rate / num)
return result
else:
result = "{:>10}, {:>10}, {:>10}, {:>10} \n".format(
'epe', 'epe_noc', 'epe_occ', 'err_rate')
result += "{:10.4f}, {:10.4f}, {:10.4f}, {:10.4f} \n".format(
error / num, error_noc / num, error_occ / num, error_rate / num)
return result
================================================
FILE: core/evaluation/evaluate_mask.py
================================================
import os
import numpy as np
import cv2
import functools
import matplotlib.pyplot as plt
import multiprocessing
"""
Adapted from https://github.com/martinkersner/py_img_seg_eval
"""
class EvalSegErr(Exception):
def __init__(self, value):
self.value = value
def __str__(self):
return repr(self.value)
def pixel_accuracy(eval_segm, gt_segm):
'''
sum_i(n_ii) / sum_i(t_i)
'''
check_size(eval_segm, gt_segm)
cl, n_cl = extract_classes(gt_segm)
eval_mask, gt_mask = extract_both_masks(eval_segm, gt_segm, cl, n_cl)
sum_n_ii = 0
sum_t_i = 0
for i, c in enumerate(cl):
curr_eval_mask = eval_mask[i, :, :]
curr_gt_mask = gt_mask[i, :, :]
sum_n_ii += np.sum(np.logical_and(curr_eval_mask, curr_gt_mask))
sum_t_i += np.sum(curr_gt_mask)
if (sum_t_i == 0):
pixel_accuracy_ = 0
else:
pixel_accuracy_ = sum_n_ii / sum_t_i
return pixel_accuracy_
def mean_accuracy(eval_segm, gt_segm):
'''
(1/n_cl) sum_i(n_ii/t_i)
'''
check_size(eval_segm, gt_segm)
cl, n_cl = extract_classes(gt_segm)
eval_mask, gt_mask = extract_both_masks(eval_segm, gt_segm, cl, n_cl)
    accuracy = [0] * n_cl
for i, c in enumerate(cl):
curr_eval_mask = eval_mask[i, :, :]
curr_gt_mask = gt_mask[i, :, :]
n_ii = np.sum(np.logical_and(curr_eval_mask, curr_gt_mask))
t_i = np.sum(curr_gt_mask)
if (t_i != 0):
accuracy[i] = n_ii / t_i
mean_accuracy_ = np.mean(accuracy)
return mean_accuracy_
def mean_IU(eval_segm, gt_segm):
'''
(1/n_cl) * sum_i(n_ii / (t_i + sum_j(n_ji) - n_ii))
'''
check_size(eval_segm, gt_segm)
cl, n_cl = union_classes(eval_segm, gt_segm)
_, n_cl_gt = extract_classes(gt_segm)
eval_mask, gt_mask = extract_both_masks(eval_segm, gt_segm, cl, n_cl)
    IU = [0] * n_cl
for i, c in enumerate(cl):
curr_eval_mask = eval_mask[i, :, :]
curr_gt_mask = gt_mask[i, :, :]
if (np.sum(curr_eval_mask) == 0) or (np.sum(curr_gt_mask) == 0):
continue
n_ii = np.sum(np.logical_and(curr_eval_mask, curr_gt_mask))
t_i = np.sum(curr_gt_mask)
n_ij = np.sum(curr_eval_mask)
IU[i] = n_ii / (t_i + n_ij - n_ii)
mean_IU_ = np.sum(IU) / n_cl_gt
return mean_IU_, np.array(IU)
def frequency_weighted_IU(eval_segm, gt_segm):
'''
sum_k(t_k)^(-1) * sum_i((t_i*n_ii)/(t_i + sum_j(n_ji) - n_ii))
'''
check_size(eval_segm, gt_segm)
cl, n_cl = union_classes(eval_segm, gt_segm)
eval_mask, gt_mask = extract_both_masks(eval_segm, gt_segm, cl, n_cl)
    frequency_weighted_IU_ = [0] * n_cl
for i, c in enumerate(cl):
curr_eval_mask = eval_mask[i, :, :]
curr_gt_mask = gt_mask[i, :, :]
if (np.sum(curr_eval_mask) == 0) or (np.sum(curr_gt_mask) == 0):
continue
n_ii = np.sum(np.logical_and(curr_eval_mask, curr_gt_mask))
t_i = np.sum(curr_gt_mask)
n_ij = np.sum(curr_eval_mask)
frequency_weighted_IU_[i] = (t_i * n_ii) / (t_i + n_ij - n_ii)
sum_k_t_k = get_pixel_area(eval_segm)
frequency_weighted_IU_ = np.sum(frequency_weighted_IU_) / sum_k_t_k
return frequency_weighted_IU_
'''
Auxiliary functions used during evaluation.
'''
def get_pixel_area(segm):
return segm.shape[0] * segm.shape[1]
def extract_both_masks(eval_segm, gt_segm, cl, n_cl):
eval_mask = extract_masks(eval_segm, cl, n_cl)
gt_mask = extract_masks(gt_segm, cl, n_cl)
return eval_mask, gt_mask
def extract_classes(segm):
cl = np.unique(segm)
n_cl = len(cl)
return cl, n_cl
def union_classes(eval_segm, gt_segm):
eval_cl, _ = extract_classes(eval_segm)
gt_cl, _ = extract_classes(gt_segm)
cl = np.union1d(eval_cl, gt_cl)
n_cl = len(cl)
return cl, n_cl
def extract_masks(segm, cl, n_cl):
h, w = segm_size(segm)
masks = np.zeros((n_cl, h, w))
for i, c in enumerate(cl):
masks[i, :, :] = segm == c
return masks
def segm_size(segm):
try:
height = segm.shape[0]
width = segm.shape[1]
except IndexError:
raise
return height, width
def check_size(eval_segm, gt_segm):
h_e, w_e = segm_size(eval_segm)
h_g, w_g = segm_size(gt_segm)
if (h_e != h_g) or (w_e != w_g):
raise EvalSegErr("DiffDim: Different dimensions of matrices!")
def read_mask_gt_worker(gt_dataset_dir, idx):
return cv2.imread(
gt_dataset_dir + "/obj_map/" + str(idx).zfill(6) + "_10.png", -1)
def load_gt_mask(gt_dataset_dir):
num_gt = 200
# the dataset dir should be the directory of kitti-2015.
fun = functools.partial(read_mask_gt_worker, gt_dataset_dir)
pool = multiprocessing.Pool(5)
results = pool.imap(fun, range(num_gt), chunksize=10)
pool.close()
pool.join()
gt_masks = []
for m in results:
m[m > 0.0] = 1.0
gt_masks.append(m)
return gt_masks
def eval_mask(pred_masks, gt_masks, opt):
grey_cmap = plt.get_cmap("Greys")
if not os.path.exists(os.path.join(opt.trace, "pred_mask")):
os.mkdir(os.path.join(opt.trace, "pred_mask"))
pa_res, ma_res, mIU_res, fwIU_res = 0.0, 0.0, 0.0, 0.0
IU_res = np.array([0.0, 0.0])
num_total = len(gt_masks)
for i in range(num_total):
gt_mask = gt_masks[i]
H, W = gt_mask.shape[0:2]
pred_mask = cv2.resize(
pred_masks[i], (W, H), interpolation=cv2.INTER_LINEAR)
pred_mask[pred_mask >= 0.5] = 1.0
pred_mask[pred_mask < 0.5] = 0.0
cv2.imwrite(
os.path.join(opt.trace, "pred_mask",
str(i).zfill(6) + "_10_plot.png"),
grey_cmap(pred_mask))
cv2.imwrite(
os.path.join(opt.trace, "pred_mask", str(i).zfill(6) + "_10.png"),
pred_mask)
pa_res += pixel_accuracy(pred_mask, gt_mask)
ma_res += mean_accuracy(pred_mask, gt_mask)
mIU, IU = mean_IU(pred_mask, gt_mask)
mIU_res += mIU
IU_res += IU
fwIU_res += frequency_weighted_IU(pred_mask, gt_mask)
    # Average over the number of evaluated frames (200 for KITTI 2015).
    return pa_res / num_total, ma_res / num_total, mIU_res / num_total, fwIU_res / num_total, IU_res / num_total
================================================
FILE: core/evaluation/evaluation_utils.py
================================================
import numpy as np
import os, sys
sys.path.append(os.path.dirname(os.path.abspath(__file__)))
import cv2, skimage
import skimage.io
#import scipy.misc as sm
import imageio as sm
# Adapted from https://github.com/mrharicot/monodepth
def compute_errors(gt, pred, nyu=False):
thresh = np.maximum((gt / pred), (pred / gt))
a1 = (thresh < 1.25).mean()
a2 = (thresh < 1.25**2).mean()
a3 = (thresh < 1.25**3).mean()
rmse = (gt - pred)**2
rmse = np.sqrt(rmse.mean())
rmse_log = (np.log(gt) - np.log(pred))**2
rmse_log = np.sqrt(rmse_log.mean())
log10 = np.mean(np.abs((np.log10(gt) - np.log10(pred))))
abs_rel = np.mean(np.abs(gt - pred) / (gt))
sq_rel = np.mean(((gt - pred)**2) / (gt))
if nyu:
return abs_rel, sq_rel, rmse, log10, a1, a2, a3
else:
return abs_rel, sq_rel, rmse, rmse_log, a1, a2, a3
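# Minimal sanity check (illustrative only): identical depth maps should give
# zero error terms and threshold accuracies of 1.0.
if __name__ == '__main__':
    d = np.random.uniform(1.0, 10.0, size=1000).astype(np.float32)
    print(compute_errors(d, d.copy()))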
================================================
FILE: core/evaluation/flowlib.py
================================================
#!/usr/bin/python
"""
Adapted from https://github.com/liruoteng/OpticalFlowToolkit
# ==============================
# flowlib.py
# library for optical flow processing
# Author: Ruoteng Li
# Date: 6th Aug 2016
# ==============================
"""
import png
import scipy
import scipy.interpolate
import numpy as np
import matplotlib.colors as cl
import matplotlib.pyplot as plt
from PIL import Image
#import cv2
UNKNOWN_FLOW_THRESH = 1e7
SMALLFLOW = 0.0
LARGEFLOW = 1e8
"""
=============
Flow Section
=============
"""
def show_flow(filename):
"""
visualize optical flow map using matplotlib
:param filename: optical flow file
:return: None
"""
flow = read_flow(filename)
img = flow_to_image(flow)
plt.imshow(img)
plt.show()
def visualize_flow(flow, mode='Y'):
"""
    this function visualizes the input flow
    :param flow: input flow in array
    :param mode: color mode used to visualize the flow (Y: YCbCr, RGB: RGB color)
:return: None
"""
if mode == 'Y':
        # YCbCr color wheel
img = flow_to_image(flow)
plt.imshow(img)
plt.show()
elif mode == 'RGB':
(h, w) = flow.shape[0:2]
du = flow[:, :, 0]
dv = flow[:, :, 1]
valid = flow[:, :, 2]
max_flow = max(np.max(du), np.max(dv))
img = np.zeros((h, w, 3), dtype=np.float64)
# angle layer
img[:, :, 0] = np.arctan2(dv, du) / (2 * np.pi)
# magnitude layer, normalized to 1
img[:, :, 1] = np.sqrt(du * du + dv * dv) * 8 / max_flow
# phase layer
img[:, :, 2] = 8 - img[:, :, 1]
# clip to [0,1]
small_idx = img[:, :, 0:3] < 0
large_idx = img[:, :, 0:3] > 1
img[small_idx] = 0
img[large_idx] = 1
# convert to rgb
img = cl.hsv_to_rgb(img)
# remove invalid point
img[:, :, 0] = img[:, :, 0] * valid
img[:, :, 1] = img[:, :, 1] * valid
img[:, :, 2] = img[:, :, 2] * valid
# show
plt.imshow(img)
plt.show()
return None
def read_flow(filename):
"""
read optical flow from Middlebury .flo file
:param filename: name of the flow file
:return: optical flow data in matrix
"""
f = open(filename, 'rb')
magic = np.fromfile(f, np.float32, count=1)
data2d = None
if 202021.25 != magic:
print('Magic number incorrect. Invalid .flo file')
else:
w = np.fromfile(f, np.int32, count=1)[0]
h = np.fromfile(f, np.int32, count=1)[0]
print("Reading %d x %d flo file" % (h, w))
data2d = np.fromfile(f, np.float32, count=2 * w * h)
# reshape data into 3D array (columns, rows, channels)
data2d = np.resize(data2d, (h, w, 2))
f.close()
return data2d
def read_flow_png(flow_file):
"""
Read optical flow from KITTI .png file
:param flow_file: name of the flow file
:return: optical flow data in matrix
"""
flow_object = png.Reader(filename=flow_file)
flow_direct = flow_object.asDirect()
flow_data = list(flow_direct[2])
(w, h) = flow_direct[3]['size']
flow = np.zeros((h, w, 3), dtype=np.float64)
for i in range(len(flow_data)):
flow[i, :, 0] = flow_data[i][0::3]
flow[i, :, 1] = flow_data[i][1::3]
flow[i, :, 2] = flow_data[i][2::3]
invalid_idx = (flow[:, :, 2] == 0)
flow[:, :, 0:2] = (flow[:, :, 0:2] - 2**15) / 64.0
flow[invalid_idx, 0] = 0
flow[invalid_idx, 1] = 0
return flow
def write_flow_png(flo, flow_file):
h, w, _ = flo.shape
out_flo = np.ones((h, w, 3), dtype=np.float32)
out_flo[:, :, 0] = np.maximum(
np.minimum(flo[:, :, 0] * 64.0 + 2**15, 2**16 - 1), 0)
out_flo[:, :, 1] = np.maximum(
np.minimum(flo[:, :, 1] * 64.0 + 2**15, 2**16 - 1), 0)
out_flo = out_flo.astype(np.uint16)
with open(flow_file, 'wb') as f:
writer = png.Writer(width=w, height=h, bitdepth=16)
# Convert z to the Python list of lists expected by
# the png writer.
z2list = out_flo.reshape(-1, w * 3).tolist()
writer.write(f, z2list)
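# KITTI PNG convention: flow is stored as uint16 with value = flow * 64 + 2**15
# (third channel = validity), so read_flow_png above inverts this mapping; a
# write/read round trip therefore quantizes flow to 1/64 px.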
def write_flow(flow, filename):
"""
write optical flow in Middlebury .flo format
:param flow: optical flow map
:param filename: optical flow file path to be saved
:return: None
"""
f = open(filename, 'wb')
magic = np.array([202021.25], dtype=np.float32)
(height, width) = flow.shape[0:2]
w = np.array([width], dtype=np.int32)
h = np.array([height], dtype=np.int32)
magic.tofile(f)
w.tofile(f)
h.tofile(f)
flow.tofile(f)
f.close()
def segment_flow(flow):
h = flow.shape[0]
w = flow.shape[1]
u = flow[:, :, 0]
v = flow[:, :, 1]
idx = ((abs(u) > LARGEFLOW) | (abs(v) > LARGEFLOW))
idx2 = (abs(u) == SMALLFLOW)
class0 = (v == 0) & (u == 0)
u[idx2] = 0.00001
tan_value = v / u
class1 = (tan_value < 1) & (tan_value >= 0) & (u > 0) & (v >= 0)
class2 = (tan_value >= 1) & (u >= 0) & (v >= 0)
class3 = (tan_value < -1) & (u <= 0) & (v >= 0)
class4 = (tan_value < 0) & (tan_value >= -1) & (u < 0) & (v >= 0)
class8 = (tan_value >= -1) & (tan_value < 0) & (u > 0) & (v <= 0)
class7 = (tan_value < -1) & (u >= 0) & (v <= 0)
class6 = (tan_value >= 1) & (u <= 0) & (v <= 0)
class5 = (tan_value >= 0) & (tan_value < 1) & (u < 0) & (v <= 0)
seg = np.zeros((h, w))
seg[class1] = 1
seg[class2] = 2
seg[class3] = 3
seg[class4] = 4
seg[class5] = 5
seg[class6] = 6
seg[class7] = 7
seg[class8] = 8
seg[class0] = 0
seg[idx] = 0
return seg
def flow_error(tu, tv, u, v):
"""
Calculate average end point error
:param tu: ground-truth horizontal flow map
:param tv: ground-truth vertical flow map
:param u: estimated horizontal flow map
:param v: estimated vertical flow map
:return: End point error of the estimated flow
"""
smallflow = 0.0
'''
stu = tu[bord+1:end-bord,bord+1:end-bord]
stv = tv[bord+1:end-bord,bord+1:end-bord]
su = u[bord+1:end-bord,bord+1:end-bord]
sv = v[bord+1:end-bord,bord+1:end-bord]
'''
stu = tu[:]
stv = tv[:]
su = u[:]
sv = v[:]
idxUnknow = (abs(stu) > UNKNOWN_FLOW_THRESH) | (
abs(stv) > UNKNOWN_FLOW_THRESH)
stu[idxUnknow] = 0
stv[idxUnknow] = 0
su[idxUnknow] = 0
sv[idxUnknow] = 0
    ind2 = (np.absolute(stu) > smallflow) | (np.absolute(stv) > smallflow)
index_su = su[ind2]
index_sv = sv[ind2]
an = 1.0 / np.sqrt(index_su**2 + index_sv**2 + 1)
un = index_su * an
vn = index_sv * an
index_stu = stu[ind2]
index_stv = stv[ind2]
tn = 1.0 / np.sqrt(index_stu**2 + index_stv**2 + 1)
tun = index_stu * tn
tvn = index_stv * tn
'''
angle = un * tun + vn * tvn + (an * tn)
index = [angle == 1.0]
angle[index] = 0.999
ang = np.arccos(angle)
mang = np.mean(ang)
mang = mang * 180 / np.pi
'''
epe = np.sqrt((stu - su)**2 + (stv - sv)**2)
epe = epe[ind2]
mepe = np.mean(epe)
return mepe
def flow_to_image(flow):
"""
Convert flow into middlebury color code image
:param flow: optical flow map
:return: optical flow image in middlebury color
"""
u = flow[:, :, 0]
v = flow[:, :, 1]
maxu = -999.
maxv = -999.
minu = 999.
minv = 999.
idxUnknow = (abs(u) > UNKNOWN_FLOW_THRESH) | (abs(v) > UNKNOWN_FLOW_THRESH)
u[idxUnknow] = 0
v[idxUnknow] = 0
maxu = max(maxu, np.max(u))
minu = min(minu, np.min(u))
maxv = max(maxv, np.max(v))
minv = min(minv, np.min(v))
rad = np.sqrt(u**2 + v**2)
maxrad = max(-1, np.max(rad))
print("max flow: %.4f\nflow range:\nu = %.3f .. %.3f\nv = %.3f .. %.3f" % (
maxrad, minu, maxu, minv, maxv))
u = u / (maxrad + np.finfo(float).eps)
v = v / (maxrad + np.finfo(float).eps)
img = compute_color(u, v)
idx = np.repeat(idxUnknow[:, :, np.newaxis], 3, axis=2)
img[idx] = 0
return np.uint8(img)
def evaluate_flow_file(gt, pred):
"""
evaluate the estimated optical flow end point error according to ground truth provided
:param gt: ground truth file path
:param pred: estimated optical flow file path
:return: end point error, float32
"""
# Read flow files and calculate the errors
gt_flow = read_flow(gt) # ground truth flow
eva_flow = read_flow(pred) # predicted flow
# Calculate errors
average_pe = flow_error(gt_flow[:, :, 0], gt_flow[:, :, 1],
eva_flow[:, :, 0], eva_flow[:, :, 1])
return average_pe
def evaluate_flow(gt_flow, pred_flow):
"""
gt: ground-truth flow
pred: estimated flow
"""
average_pe = flow_error(gt_flow[:, :, 0], gt_flow[:, :, 1],
pred_flow[:, :, 0], pred_flow[:, :, 1])
return average_pe
"""
==============
Disparity Section
==============
"""
def read_disp_png(file_name):
"""
Read optical flow from KITTI .png file
:param file_name: name of the flow file
:return: optical flow data in matrix
"""
image_object = png.Reader(filename=file_name)
image_direct = image_object.asDirect()
image_data = list(image_direct[2])
(w, h) = image_direct[3]['size']
    channel = len(image_data[0]) // w
flow = np.zeros((h, w, channel), dtype=np.uint16)
for i in range(len(image_data)):
for j in range(channel):
flow[i, :, j] = image_data[i][j::channel]
return flow[:, :, 0] / 256
def disp_to_flowfile(disp, filename):
"""
Read KITTI disparity file in png format
:param disp: disparity matrix
:param filename: the flow file name to save
:return: None
"""
f = open(filename, 'wb')
magic = np.array([202021.25], dtype=np.float32)
(height, width) = disp.shape[0:2]
w = np.array([width], dtype=np.int32)
h = np.array([height], dtype=np.int32)
empty_map = np.zeros((height, width), dtype=np.float32)
data = np.dstack((disp, empty_map))
magic.tofile(f)
w.tofile(f)
h.tofile(f)
data.tofile(f)
f.close()
"""
==============
Image Section
==============
"""
def read_image(filename):
"""
Read normal image of any format
:param filename: name of the image file
:return: image data in matrix uint8 type
"""
img = Image.open(filename)
im = np.array(img)
return im
def warp_image(im, flow):
"""
Use optical flow to warp image to the next
:param im: image to warp
:param flow: optical flow
:return: warped image
"""
image_height = im.shape[0]
image_width = im.shape[1]
flow_height = flow.shape[0]
flow_width = flow.shape[1]
n = image_height * image_width
(iy, ix) = np.mgrid[0:image_height, 0:image_width]
(fy, fx) = np.mgrid[0:flow_height, 0:flow_width]
    fx = fx.astype(np.float64) + flow[:, :, 0]
    fy = fy.astype(np.float64) + flow[:, :, 1]
    # Mark samples that fall outside the flow field, then clamp coordinates.
    mask = (fx < 0) | (fx > flow_width) | (fy < 0) | (fy > flow_height)
    fx = np.minimum(np.maximum(fx, 0), flow_width)
    fy = np.minimum(np.maximum(fy, 0), flow_height)
points = np.concatenate((ix.reshape(n, 1), iy.reshape(n, 1)), axis=1)
xi = np.concatenate((fx.reshape(n, 1), fy.reshape(n, 1)), axis=1)
warp = np.zeros((image_height, image_width, im.shape[2]))
for i in range(im.shape[2]):
channel = im[:, :, i]
values = channel.reshape(n, 1)
        new_channel = scipy.interpolate.griddata(
            points, values, xi, method='cubic')
        # Restore the 2D layout before masking out-of-range samples.
        new_channel = new_channel.reshape(image_height, image_width)
        new_channel[mask] = 1
        warp[:, :, i] = new_channel
return warp
"""
==============
Others
==============
"""
def scale_image(image, new_range):
"""
Linearly scale the image into desired range
:param image: input image
:param new_range: the new range to be aligned
:return: image normalized in new range
"""
min_val = np.min(image).astype(np.float32)
max_val = np.max(image).astype(np.float32)
min_val_new = np.array(min(new_range), dtype=np.float32)
max_val_new = np.array(max(new_range), dtype=np.float32)
scaled_image = (image - min_val) / (max_val - min_val) * (
max_val_new - min_val_new) + min_val_new
return scaled_image.astype(np.uint8)
def compute_color(u, v):
"""
compute optical flow color map
:param u: optical flow horizontal map
:param v: optical flow vertical map
:return: optical flow in color code
"""
[h, w] = u.shape
img = np.zeros([h, w, 3])
nanIdx = np.isnan(u) | np.isnan(v)
u[nanIdx] = 0
v[nanIdx] = 0
colorwheel = make_color_wheel()
ncols = np.size(colorwheel, 0)
rad = np.sqrt(u**2 + v**2)
a = np.arctan2(-v, -u) / np.pi
fk = (a + 1) / 2 * (ncols - 1) + 1
k0 = np.floor(fk).astype(int)
k1 = k0 + 1
k1[k1 == ncols + 1] = 1
f = fk - k0
for i in range(0, np.size(colorwheel, 1)):
tmp = colorwheel[:, i]
col0 = tmp[k0 - 1] / 255
col1 = tmp[k1 - 1] / 255
col = (1 - f) * col0 + f * col1
idx = rad <= 1
col[idx] = 1 - rad[idx] * (1 - col[idx])
notidx = np.logical_not(idx)
col[notidx] *= 0.75
img[:, :, i] = np.uint8(np.floor(255 * col * (1 - nanIdx)))
return img
def make_color_wheel():
"""
Generate color wheel according Middlebury color code
:return: Color wheel
"""
RY = 15
YG = 6
GC = 4
CB = 11
BM = 13
MR = 6
ncols = RY + YG + GC + CB + BM + MR
colorwheel = np.zeros([ncols, 3])
col = 0
# RY
colorwheel[0:RY, 0] = 255
colorwheel[0:RY, 1] = np.transpose(np.floor(255 * np.arange(0, RY) / RY))
col += RY
# YG
colorwheel[col:col + YG, 0] = 255 - np.transpose(
np.floor(255 * np.arange(0, YG) / YG))
colorwheel[col:col + YG, 1] = 255
col += YG
# GC
colorwheel[col:col + GC, 1] = 255
colorwheel[col:col + GC, 2] = np.transpose(
np.floor(255 * np.arange(0, GC) / GC))
col += GC
# CB
colorwheel[col:col + CB, 1] = 255 - np.transpose(
np.floor(255 * np.arange(0, CB) / CB))
colorwheel[col:col + CB, 2] = 255
col += CB
# BM
colorwheel[col:col + BM, 2] = 255
colorwheel[col:col + BM, 0] = np.transpose(
np.floor(255 * np.arange(0, BM) / BM))
    col += BM
# MR
colorwheel[col:col + MR, 2] = 255 - np.transpose(
np.floor(255 * np.arange(0, MR) / MR))
colorwheel[col:col + MR, 0] = 255
return colorwheel
================================================
FILE: core/networks/__init__.py
================================================
import os, sys
sys.path.append(os.path.dirname(os.path.abspath(__file__)))
from model_flow import Model_flow
from model_triangulate_pose import Model_triangulate_pose
from model_depth_pose import Model_depth_pose
from model_flowposenet import Model_flowposenet
def get_model(mode):
if mode == 'flow':
return Model_flow
elif mode == 'pose' or mode == 'pose_flow':
return Model_triangulate_pose
elif mode == 'depth' or mode == 'depth_pose':
return Model_depth_pose
elif mode == 'flowposenet':
return Model_flowposenet
else:
raise ValueError('Mode {} not found.'.format(mode))
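# Typical usage (cfg is the parsed configuration object used across this repo):
#   model_cls = get_model(cfg.mode)
#   model = model_cls(cfg)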
================================================
FILE: core/networks/model_depth_pose.py
================================================
import os, sys
sys.path.append(os.path.dirname(os.path.abspath(__file__)))
from structures import *
from model_triangulate_pose import Model_triangulate_pose
from pytorch_ssim import SSIM
sys.path.append(os.path.join(os.path.dirname(os.path.abspath(__file__)), '..', 'visualize'))
from visualizer import *
import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np
import cv2
import pdb
class Model_depth_pose(nn.Module):
def __init__(self, cfg):
super(Model_depth_pose, self).__init__()
self.depth_match_num = cfg.depth_match_num
self.depth_sample_ratio = cfg.depth_sample_ratio
self.depth_scale = cfg.depth_scale
self.w_flow_error = cfg.w_flow_error
self.dataset = cfg.dataset
self.depth_net = Depth_Model(cfg.depth_scale)
self.model_pose = Model_triangulate_pose(cfg)
def meshgrid(self, h, w):
xx, yy = np.meshgrid(np.arange(0,w), np.arange(0,h))
meshgrid = np.transpose(np.stack([xx,yy], axis=-1), [2,0,1]) # [2,h,w]
meshgrid = torch.from_numpy(meshgrid)
return meshgrid
def robust_rand_sample(self, match, mask, num):
# match: [b, 4, -1] mask: [b, 1, -1]
b, n = match.shape[0], match.shape[2]
nonzeros_num = torch.min(torch.sum(mask > 0, dim=-1)) # []
if nonzeros_num.detach().cpu().numpy() == n:
rand_int = torch.randint(0, n, [num])
select_match = match[:,:,rand_int]
else:
            # If some matches have a zero score, sample only from the non-zero ones.
num = np.minimum(nonzeros_num.detach().cpu().numpy(), num)
select_idxs = []
for i in range(b):
nonzero_idx = torch.nonzero(mask[i,0,:]) # [nonzero_num,1]
rand_int = torch.randint(0, nonzero_idx.shape[0], [int(num)])
select_idx = nonzero_idx[rand_int, :] # [num, 1]
select_idxs.append(select_idx)
select_idxs = torch.stack(select_idxs, 0) # [b,num,1]
select_match = torch.gather(match.transpose(1,2), index=select_idxs.repeat(1,1,4), dim=1).transpose(1,2) # [b, 4, num]
return select_match, num
def top_ratio_sample(self, match, mask, ratio):
# match: [b, 4, -1] mask: [b, 1, -1]
b, total_num = match.shape[0], match.shape[-1]
scores, indices = torch.topk(mask, int(ratio*total_num), dim=-1) # [B, 1, ratio*tnum]
select_match = torch.gather(match.transpose(1,2), index=indices.squeeze(1).unsqueeze(-1).repeat(1,1,4), dim=1).transpose(1,2) # [b, 4, ratio*tnum]
return select_match, scores
def rand_sample(self, match, num):
b, c, n = match.shape[0], match.shape[1], match.shape[2]
rand_int = torch.randint(0, match.shape[-1], size=[num])
select_pts = match[:,:,rand_int]
return select_pts
def filt_negative_depth(self, point2d_1_depth, point2d_2_depth, point2d_1_coord, point2d_2_coord):
# Filter out the negative projection depth.
# point2d_1_depth: [b, n, 1]
b, n = point2d_1_depth.shape[0], point2d_1_depth.shape[1]
mask = (point2d_1_depth > 0.01).float() * (point2d_2_depth > 0.01).float()
select_idxs = []
flag = 0
for i in range(b):
if torch.sum(mask[i,:,0]) == n:
idx = torch.arange(n).to(mask.get_device())
else:
nonzero_idx = torch.nonzero(mask[i,:,0]).squeeze(1) # [k]
if nonzero_idx.shape[0] < 0.1*n:
idx = torch.arange(n).to(mask.get_device())
flag = 1
else:
res = torch.randint(0, nonzero_idx.shape[0], size=[n-nonzero_idx.shape[0]]).to(mask.get_device()) # [n-nz]
idx = torch.cat([nonzero_idx, nonzero_idx[res]], 0)
select_idxs.append(idx)
select_idxs = torch.stack(select_idxs, dim=0) # [b,n]
point2d_1_depth = torch.gather(point2d_1_depth, index=select_idxs.unsqueeze(-1), dim=1) # [b,n,1]
point2d_2_depth = torch.gather(point2d_2_depth, index=select_idxs.unsqueeze(-1), dim=1) # [b,n,1]
point2d_1_coord = torch.gather(point2d_1_coord, index=select_idxs.unsqueeze(-1).repeat(1,1,2), dim=1) # [b,n,2]
point2d_2_coord = torch.gather(point2d_2_coord, index=select_idxs.unsqueeze(-1).repeat(1,1,2), dim=1) # [b,n,2]
return point2d_1_depth, point2d_2_depth, point2d_1_coord, point2d_2_coord, flag
def filt_invalid_coord(self, point2d_1_depth, point2d_2_depth, point2d_1_coord, point2d_2_coord, max_h, max_w):
        # Filter out projection coordinates that fall outside the image bounds.
# point2d_1_depth: [b, n, 1]
b, n = point2d_1_coord.shape[0], point2d_1_coord.shape[1]
max_coord = torch.Tensor([max_w, max_h]).to(point2d_1_coord.get_device())
mask = (point2d_1_coord > 0).all(dim=-1, keepdim=True).float() * (point2d_2_coord > 0).all(dim=-1, keepdim=True).float() * \
(point2d_1_coord < max_coord).all(dim=-1, keepdim=True).float() * (point2d_2_coord < max_coord).all(dim=-1, keepdim=True).float()
flag = 0
if torch.sum(1.0-mask) == 0:
return point2d_1_depth, point2d_2_depth, point2d_1_coord, point2d_2_coord, flag
select_idxs = []
for i in range(b):
if torch.sum(mask[i,:,0]) == n:
idx = torch.arange(n).to(mask.get_device())
else:
nonzero_idx = torch.nonzero(mask[i,:,0]).squeeze(1) # [k]
if nonzero_idx.shape[0] < 0.1*n:
idx = torch.arange(n).to(mask.get_device())
flag = 1
else:
res = torch.randint(0, nonzero_idx.shape[0], size=[n-nonzero_idx.shape[0]]).to(mask.get_device())
idx = torch.cat([nonzero_idx, nonzero_idx[res]], 0)
select_idxs.append(idx)
select_idxs = torch.stack(select_idxs, dim=0) # [b,n]
point2d_1_depth = torch.gather(point2d_1_depth, index=select_idxs.unsqueeze(-1), dim=1) # [b,n,1]
point2d_2_depth = torch.gather(point2d_2_depth, index=select_idxs.unsqueeze(-1), dim=1) # [b,n,1]
point2d_1_coord = torch.gather(point2d_1_coord, index=select_idxs.unsqueeze(-1).repeat(1,1,2), dim=1) # [b,n,2]
point2d_2_coord = torch.gather(point2d_2_coord, index=select_idxs.unsqueeze(-1).repeat(1,1,2), dim=1) # [b,n,2]
return point2d_1_depth, point2d_2_depth, point2d_1_coord, point2d_2_coord, flag
def ray_angle_filter(self, match, P1, P2, return_angle=False):
# match: [b, 4, n] P: [B, 3, 4]
b, n = match.shape[0], match.shape[2]
K = P1[:,:,:3] # P1 with identity rotation and zero translation
K_inv = torch.inverse(K)
RT1 = K_inv.bmm(P1) # [b, 3, 4]
RT2 = K_inv.bmm(P2)
ones = torch.ones([b,1,n]).to(match.get_device())
pts1 = torch.cat([match[:,:2,:], ones], 1)
pts2 = torch.cat([match[:,2:,:], ones], 1)
ray1_dir = (RT1[:,:,:3].transpose(1,2)).bmm(K_inv).bmm(pts1)# [b,3,n]
ray1_dir = ray1_dir / (torch.norm(ray1_dir, dim=1, keepdim=True, p=2) + 1e-12)
ray1_origin = (-1) * RT1[:,:,:3].transpose(1,2).bmm(RT1[:,:,3].unsqueeze(-1)) # [b, 3, 1]
ray2_dir = (RT2[:,:,:3].transpose(1,2)).bmm(K_inv).bmm(pts2) # [b,3,n]
ray2_dir = ray2_dir / (torch.norm(ray2_dir, dim=1, keepdim=True, p=2) + 1e-12)
ray2_origin = (-1) * RT2[:,:,:3].transpose(1,2).bmm(RT2[:,:,3].unsqueeze(-1)) # [b, 3, 1]
        # We compute the angle between ray1 and the perpendicular from ray1's origin to ray2.
p1p2 = (ray1_origin - ray2_origin).repeat(1,1,n)
verline = ray2_origin.repeat(1,1,n) + torch.sum(p1p2 * ray2_dir, dim=1, keepdim=True) * ray2_dir - ray1_origin.repeat(1,1,n) # [b,3,n]
cosvalue = torch.sum(ray1_dir * verline, dim=1, keepdim=True) / \
((torch.norm(ray1_dir, dim=1, keepdim=True, p=2) + 1e-12) * (torch.norm(verline, dim=1, keepdim=True, p=2) + 1e-12))# [b,1,n]
mask = (cosvalue > 0.001).float() # we drop out angles less than 1' [b,1,n]
flag = 0
num = torch.min(torch.sum(mask, -1)).int()
if num.cpu().detach().numpy() == 0:
flag = 1
filt_match = match[:,:,:100]
if return_angle:
return filt_match, flag, torch.zeros_like(mask).to(filt_match.get_device())
else:
return filt_match, flag
nonzero_idx = []
for i in range(b):
idx = torch.nonzero(mask[i,0,:])[:num] # [num,1]
nonzero_idx.append(idx)
nonzero_idx = torch.stack(nonzero_idx, 0) # [b,num,1]
filt_match = torch.gather(match.transpose(1,2), index=nonzero_idx.repeat(1,1,4), dim=1).transpose(1,2) # [b,4,num]
if return_angle:
return filt_match, flag, mask
else:
return filt_match, flag
def midpoint_triangulate(self, match, K_inv, P1, P2):
# match: [b, 4, num] P1: [b, 3, 4]
# Match is in the image coordinates. P1, P2 is camera parameters. [B, 3, 4] match: [B, M, 4]
b, n = match.shape[0], match.shape[2]
RT1 = K_inv.bmm(P1) # [b, 3, 4]
RT2 = K_inv.bmm(P2)
ones = torch.ones([b,1,n]).to(match.get_device())
pts1 = torch.cat([match[:,:2,:], ones], 1)
pts2 = torch.cat([match[:,2:,:], ones], 1)
ray1_dir = (RT1[:,:,:3].transpose(1,2)).bmm(K_inv).bmm(pts1)# [b,3,n]
ray1_dir = ray1_dir / (torch.norm(ray1_dir, dim=1, keepdim=True, p=2) + 1e-12)
ray1_origin = (-1) * RT1[:,:,:3].transpose(1,2).bmm(RT1[:,:,3].unsqueeze(-1)) # [b, 3, 1]
ray2_dir = (RT2[:,:,:3].transpose(1,2)).bmm(K_inv).bmm(pts2) # [b,3,n]
ray2_dir = ray2_dir / (torch.norm(ray2_dir, dim=1, keepdim=True, p=2) + 1e-12)
ray2_origin = (-1) * RT2[:,:,:3].transpose(1,2).bmm(RT2[:,:,3].unsqueeze(-1)) # [b, 3, 1]
dir_cross = torch.cross(ray1_dir, ray2_dir, dim=1) # [b,3,n]
denom = 1.0 / (torch.sum(dir_cross * dir_cross, dim=1, keepdim=True)+1e-12) # [b,1,n]
origin_vec = (ray2_origin - ray1_origin).repeat(1,1,n) # [b,3,n]
a1 = origin_vec.cross(ray2_dir, dim=1) # [b,3,n]
a1 = torch.sum(a1 * dir_cross, dim=1, keepdim=True) * denom # [b,1,n]
a2 = origin_vec.cross(ray1_dir, dim=1) # [b,3,n]
a2 = torch.sum(a2 * dir_cross, dim=1, keepdim=True) * denom # [b,1,n]
p1 = ray1_origin + a1 * ray1_dir
p2 = ray2_origin + a2 * ray2_dir
point = (p1 + p2) / 2.0 # [b,3,n]
        # Convert to homogeneous coordinates to stay consistent with the other functions.
point_homo = torch.cat([point, ones], dim=1).transpose(1,2) # [b,n,4]
return point_homo
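    # NOTE (illustrative): a1 and a2 above are the closed-form ray parameters
    # of the points on ray1 and ray2 that realize the shortest segment between
    # the two rays; the returned 3D point is the midpoint of that segment,
    # hence "midpoint triangulation".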
def rt_from_fundamental_mat_nyu(self, fmat, K, depth_match):
# F: [b, 3, 3] K: [b, 3, 3] depth_match: [b ,4, n]
#verify_match = self.rand_sample(depth_match, 5000) # [b,4,100]
verify_match = depth_match.transpose(1,2).cpu().detach().numpy()
K_inv = torch.inverse(K)
b = fmat.shape[0]
fmat_ = K.transpose(1,2).bmm(fmat)
essential_mat = fmat_.bmm(K)
iden = torch.cat([torch.eye(3), torch.zeros([3,1])], -1).unsqueeze(0).repeat(b,1,1).to(K.get_device()) # [b,3,4]
P1 = K.bmm(iden)
flags = []
number_inliers = []
P2 = []
for i in range(b):
cnum, R, t, _ = cv2.recoverPose(essential_mat[i].cpu().detach().numpy().astype('float64'), verify_match[i,:,:2].astype('float64'), \
verify_match[i,:,2:].astype('float64'), cameraMatrix=K[i,:,:].cpu().detach().numpy().astype('float64'))
p2 = torch.from_numpy(np.concatenate([R, t], axis=-1)).float().to(K.get_device())
P2.append(p2)
if cnum > depth_match.shape[-1] / 7.0:
flags.append(1)
else:
flags.append(0)
number_inliers.append(cnum)
P2 = K.bmm(torch.stack(P2, axis=0))
#pdb.set_trace()
return P1, P2, flags
def verifyRT(self, match, K_inv, P1, P2):
# match: [b, 4, n] P1: [b,3,4] P2: [b,3,4]
b, n = match.shape[0], match.shape[2]
point3d = self.midpoint_triangulate(match, K_inv, P1, P2).reshape([-1,4]).unsqueeze(-1) # [b*n, 4, 1]
P1_ = P1.repeat(n,1,1)
P2_ = P2.repeat(n,1,1)
depth1 = P1_.bmm(point3d)[:,-1,:] / point3d[:,-1,:] # [b*n, 1]
depth2 = P2_.bmm(point3d)[:,-1,:] / point3d[:,-1,:]
inlier_num = torch.sum((depth1.view([b,n]) > 0).float() * (depth2.view([b,n]) > 0).float(), 1) # [b]
return inlier_num
def rt_from_fundamental_mat(self, fmat, K, depth_match):
# F: [b, 3, 3] K: [b, 3, 3] depth_match: [b ,4, n]
        verify_match = self.rand_sample(depth_match, 200) # [b,4,200]
K_inv = torch.inverse(K)
b = fmat.shape[0]
fmat_ = K.transpose(1,2).bmm(fmat)
essential_mat = fmat_.bmm(K)
essential_mat_cpu = essential_mat.cpu()
U, S, V = torch.svd(essential_mat_cpu)
U, S, V = U.to(K.get_device()), S.to(K.get_device()), V.to(K.get_device())
W = torch.from_numpy(np.array([[[0., -1., 0.],[1., 0., 0.],[0., 0., 1.]]])).float().repeat(b,1,1).to(K.get_device())
# R = UWV^t or UW^tV^t t = U[:,2] the third column of U
        R1 = U.bmm(W).bmm(V.transpose(1,2)) # The sign is fixed below so that det(R) = +1.
R1 = torch.sign(torch.det(R1)).unsqueeze(-1).unsqueeze(-1) * R1
R2 = U.bmm(W.transpose(1,2)).bmm(V.transpose(1,2))
R2 = torch.sign(torch.det(R2)).unsqueeze(-1).unsqueeze(-1) * R2
t1 = U[:,:,2].unsqueeze(-1) # The third column
t2 = -U[:,:,2].unsqueeze(-1) # Inverse direction
iden = torch.cat([torch.eye(3), torch.zeros([3,1])], -1).unsqueeze(0).repeat(b,1,1).to(K.get_device()) # [b,3,4]
P1 = K.bmm(iden)
P2_1 = K.bmm(torch.cat([R1, t1], -1))
P2_2 = K.bmm(torch.cat([R2, t1], -1))
P2_3 = K.bmm(torch.cat([R1, t2], -1))
P2_4 = K.bmm(torch.cat([R2, t2], -1))
P2_c = [P2_1, P2_2, P2_3, P2_4]
flags = []
for i in range(4):
with torch.no_grad():
inlier_num = self.verifyRT(verify_match, K_inv, P1, P2_c[i])
flags.append(inlier_num)
P2_c = torch.stack(P2_c, dim=1) # [B, 4, 3, 4]
flags = torch.stack(flags, dim=1) # [B, 4]
idx = torch.argmax(flags, dim=-1, keepdim=True) # [b,1]
P2 = torch.gather(P2_c, index=idx.unsqueeze(-1).unsqueeze(-1).repeat(1,1,3,4), dim=1).squeeze(1) # [b,3,4]
#pdb.set_trace()
return P1, P2
def reproject(self, P, point3d):
# P: [b,3,4] point3d: [b,n,4]
point2d = P.bmm(point3d.transpose(1,2)) # [b,4,n]
point2d_coord = (point2d[:,:2,:] / (point2d[:,2,:].unsqueeze(1) + 1e-12)).transpose(1,2) # [b,n,2]
point2d_depth = point2d[:,2,:].unsqueeze(1).transpose(1,2) # [b,n,1]
return point2d_coord, point2d_depth
def scale_adapt(self, depth1, depth2, eps=1e-12):
with torch.no_grad():
A = torch.sum((depth1 ** 2) / (depth2 ** 2 + eps), dim=1) # [b,1]
C = torch.sum(depth1 / (depth2 + eps), dim=1) # [b,1]
a = C / (A + eps)
return a
def affine_adapt(self, depth1, depth2, use_translation=True, eps=1e-12):
a_scale = self.scale_adapt(depth1, depth2, eps=eps)
if not use_translation: # only fit the scale parameter
return a_scale, torch.zeros_like(a_scale)
else:
with torch.no_grad():
A = torch.sum((depth1 ** 2) / (depth2 ** 2 + eps), dim=1) # [b,1]
B = torch.sum(depth1 / (depth2 ** 2 + eps), dim=1) # [b,1]
C = torch.sum(depth1 / (depth2 + eps), dim=1) # [b,1]
D = torch.sum(1.0 / (depth2 ** 2 + eps), dim=1) # [b,1]
E = torch.sum(1.0 / (depth2 + eps), dim=1) # [b,1]
a = (B*E - D*C) / (B*B - A*D + 1e-12)
b = (B*C - A*E) / (B*B - A*D + 1e-12)
# check ill condition
cond = (B*B - A*D)
valid = (torch.abs(cond) > 1e-4).float()
a = a * valid + a_scale * (1 - valid)
b = b * valid
return a, b
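    # NOTE (derivation sketch): with u = depth1 / depth2 and v = 1 / depth2,
    # affine_adapt minimizes sum_i (a * u_i + b * v_i - 1)^2, i.e. it fits
    # depth2 ~ a * depth1 + b with residuals measured relative to depth2.
    # A..E above are the accumulated moments of u and v, and (a, b) is the
    # solution of the resulting 2x2 normal equations.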
def register_depth(self, depth_pred, coord_tri, depth_tri):
# depth_pred: [b, 1, h, w] coord_tri: [b,n,2] depth_tri: [b,n,1]
batch, _, h, w = depth_pred.shape[0], depth_pred.shape[1], depth_pred.shape[2], depth_pred.shape[3]
n = depth_tri.shape[1]
coord_tri_nor = torch.stack([2.0*coord_tri[:,:,0] / (w-1.0) - 1.0, 2.0*coord_tri[:,:,1] / (h-1.0) - 1.0], -1)
depth_inter = F.grid_sample(depth_pred, coord_tri_nor.view([batch,n,1,2]), padding_mode='reflection').squeeze(-1).transpose(1,2) # [b,n,1]
# Normalize
scale = torch.median(depth_inter, 1)[0] / (torch.median(depth_tri, 1)[0] + 1e-12)
scale = scale.detach() # [b,1]
scale_depth_inter = depth_inter / (scale.unsqueeze(-1) + 1e-12)
scale_depth_pred = depth_pred / (scale.unsqueeze(-1).unsqueeze(-1) + 1e-12)
# affine adapt
a, b = self.affine_adapt(scale_depth_inter, depth_tri, use_translation=False)
affine_depth_inter = a.unsqueeze(1) * scale_depth_inter + b.unsqueeze(1) # [b,n,1]
affine_depth_pred = a.unsqueeze(-1).unsqueeze(-1) * scale_depth_pred + b.unsqueeze(-1).unsqueeze(-1) # [b,1,h,w]
return affine_depth_pred, affine_depth_inter
def get_trian_loss(self, tri_depth, pred_tri_depth):
# depth: [b,n,1]
loss = torch.pow(1.0 - pred_tri_depth / (tri_depth + 1e-12), 2).mean((1,2))
return loss
def get_reproj_fdp_loss(self, pred1, pred2, P2, K, K_inv, valid_mask, rigid_mask, flow, visualizer=None):
# pred: [b,1,h,w] Rt: [b,3,4] K: [b,3,3] mask: [b,1,h,w] flow: [b,2,h,w]
b, h, w = pred1.shape[0], pred1.shape[2], pred1.shape[3]
xy = self.meshgrid(h,w).unsqueeze(0).repeat(b,1,1,1).float().to(flow.get_device()) # [b,2,h,w]
ones = torch.ones([b,1,h,w]).float().to(flow.get_device())
pts1_3d = K_inv.bmm(torch.cat([xy, ones], 1).view([b,3,-1])) * pred1.view([b,1,-1]) # [b,3,h*w]
pts2_coord, pts2_depth = self.reproject(P2, torch.cat([pts1_3d, ones.view([b,1,-1])], 1).transpose(1,2)) # [b,h*w, 2]
# TODO Here some of the reprojection coordinates are invalid. (<0 or >max)
reproj_valid_mask = (pts2_coord > torch.Tensor([0,0]).to(pred1.get_device())).all(-1, True).float() * \
(pts2_coord < torch.Tensor([w-1,h-1]).to(pred1.get_device())).all(-1, True).float() # [b,h*w, 1]
reproj_valid_mask = (valid_mask * reproj_valid_mask.view([b,h,w,1]).permute([0,3,1,2])).detach()
rigid_mask = rigid_mask.detach()
pts2_depth = pts2_depth.transpose(1,2).view([b,1,h,w])
# Get the interpolated depth prediction2
pts2_coord_nor = torch.cat([2.0 * pts2_coord[:,:,0].unsqueeze(-1) / (w - 1.0) - 1.0, 2.0 * pts2_coord[:,:,1].unsqueeze(-1) / (h - 1.0) - 1.0], -1)
inter_depth2 = F.grid_sample(pred2, pts2_coord_nor.view([b, h, w, 2]), padding_mode='reflection') # [b,1,h,w]
pj_loss_map = (torch.abs(1.0 - pts2_depth / (inter_depth2 + 1e-12)) * rigid_mask * reproj_valid_mask)
pj_loss = pj_loss_map.mean((1,2,3)) / ((reproj_valid_mask * rigid_mask).mean((1,2,3))+1e-12)
#pj_loss = (valid_mask * mask * torch.abs(pts2_depth - inter_depth2) / (torch.abs(pts2_depth + inter_depth2)+1e-12)).mean((1,2,3)) / ((valid_mask * mask).mean((1,2,3))+1e-12) # [b]
flow_loss = (rigid_mask * torch.abs(flow + xy - pts2_coord.detach().permute(0,2,1).view([b,2,h,w]))).mean((1,2,3)) / (rigid_mask.mean((1,2,3)) + 1e-12)
return pj_loss, flow_loss
def disp2depth(self, disp, min_depth=0.1, max_depth=100.0):
min_disp = 1 / max_depth
max_disp = 1 / min_depth
scaled_disp = min_disp + (max_disp - min_disp) * disp
depth = 1 / scaled_disp
return scaled_disp, depth
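    # Example with the defaults (min_depth=0.1, max_depth=100): a network
    # disparity of 0 maps to a depth of 100 m and a disparity of 1 maps to
    # 0.1 m, following the monodepth2-style disparity-to-depth convention.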
def get_smooth_loss(self, img, disp):
# img: [b,3,h,w] depth: [b,1,h,w]
"""Computes the smoothness loss for a disparity image
The color image is used for edge-aware smoothness
"""
grad_disp_x = torch.abs(disp[:, :, :, :-1] - disp[:, :, :, 1:])
grad_disp_y = torch.abs(disp[:, :, :-1, :] - disp[:, :, 1:, :])
grad_img_x = torch.mean(torch.abs(img[:, :, :, :-1] - img[:, :, :, 1:]), 1, keepdim=True)
grad_img_y = torch.mean(torch.abs(img[:, :, :-1, :] - img[:, :, 1:, :]), 1, keepdim=True)
grad_disp_x *= torch.exp(-grad_img_x)
grad_disp_y *= torch.exp(-grad_img_y)
return grad_disp_x.mean((1,2,3)) + grad_disp_y.mean((1,2,3))
def infer_depth(self, img):
disp_list = self.depth_net(img)
return disp_list[0]
def infer_vo(self, img1, img2, K, K_inv, match_num=6000):
b, img_h, img_w = img1.shape[0], img1.shape[2], img1.shape[3]
F_final, img1_valid_mask, img1_rigid_mask, fwd_flow, fwd_match = self.model_pose.inference(img1, img2, K, K_inv)
# infer depth
disp1_list = self.depth_net(img1) # Nscales * [B, 1, H, W]
disp2_list = self.depth_net(img2)
disp1, depth1 = self.disp2depth(disp1_list[0])
disp2, depth2 = self.disp2depth(disp2_list[0])
img1_depth_mask = img1_rigid_mask * img1_valid_mask
# [b, 4, match_num]
top_ratio_match, top_ratio_mask = self.top_ratio_sample(fwd_match.view([b,4,-1]), img1_depth_mask.view([b,1,-1]), ratio=0.30) # [b, 4, ratio*h*w]
depth_match, depth_match_num = self.robust_rand_sample(top_ratio_match, top_ratio_mask, num=match_num)
return depth_match, depth1, depth2
def check_rt(self, img1, img2, K, K_inv):
# initialization
b = img1.shape[0]
flag1, flag2, flag3 = 0, 0, 0
images = torch.cat([img1, img2], dim=2)
inputs = [images, K.unsqueeze(1), K_inv.unsqueeze(1)]
# Pose Network
#self.profiler.reset()
loss_pack, F_final, img1_valid_mask, img1_rigid_score, img1_inlier_mask, fwd_flow, fwd_match = self.model_pose(inputs, output_F=True)
# Get masks
img1_depth_mask = img1_rigid_score * img1_valid_mask
# Select top score matches to triangulate depth.
top_ratio_match, top_ratio_mask = self.top_ratio_sample(fwd_match.view([b,4,-1]), img1_depth_mask.view([b,1,-1]), ratio=0.20) # [b, 4, ratio*h*w]
depth_match, depth_match_num = self.robust_rand_sample(top_ratio_match, top_ratio_mask, num=self.depth_match_num)
P1, P2, flags = self.rt_from_fundamental_mat_nyu(F_final.detach(), K, depth_match)
P1 = P1.detach()
P2 = P2.detach()
flags = torch.from_numpy(np.stack(flags, axis=0)).float().to(K.get_device())
return flags
def inference(self, img1, img2, K, K_inv):
b, img_h, img_w = img1.shape[0], img1.shape[2], img1.shape[3]
visualizer = Visualizer_debug('./vis/', np.transpose(255*img1.detach().cpu().numpy(), [0,2,3,1]), \
np.transpose(255*img2.detach().cpu().numpy(), [0,2,3,1]))
F_final, img1_valid_mask, img1_rigid_mask, fwd_flow, fwd_match = self.model_pose.inference(img1, img2, K, K_inv)
# infer depth
disp1_list = self.depth_net(img1) # Nscales * [B, 1, H, W]
disp2_list = self.depth_net(img2)
disp1, _ = self.disp2depth(disp1_list[0])
disp2, _ = self.disp2depth(disp2_list[0])
# Get Camera Matrix
img1_depth_mask = img1_rigid_mask * img1_valid_mask
# [b, 4, match_num]
top_ratio_match, top_ratio_mask = self.top_ratio_sample(fwd_match.view([b,4,-1]), img1_depth_mask.view([b,1,-1]), ratio=0.20) # [b, 4, ratio*h*w]
depth_match, depth_match_num = self.robust_rand_sample(top_ratio_match, top_ratio_mask, num=self.depth_match_num)
if self.dataset == 'nyuv2':
P1, P2, _ = self.rt_from_fundamental_mat_nyu(F_final, K, depth_match)
else:
P1, P2 = self.rt_from_fundamental_mat(F_final, K, depth_match)
Rt = K_inv.bmm(P2)
filt_depth_match, flag1 = self.ray_angle_filter(depth_match, P1, P2) # [b, 4, filt_num]
point3d_1 = self.midpoint_triangulate(filt_depth_match, K_inv, P1, P2)
point2d_1_coord, point2d_1_depth = self.reproject(P1, point3d_1) # [b,n,2], [b,n,1]
point2d_2_coord, point2d_2_depth = self.reproject(P2, point3d_1)
        # Filter out some invalid triangulation results to stabilize training.
point2d_1_depth, point2d_2_depth, point2d_1_coord, point2d_2_coord, flag2 = self.filt_negative_depth(point2d_1_depth, \
point2d_2_depth, point2d_1_coord, point2d_2_coord)
point2d_1_depth, point2d_2_depth, point2d_1_coord, point2d_2_coord, flag3 = self.filt_invalid_coord(point2d_1_depth, \
point2d_2_depth, point2d_1_coord, point2d_2_coord, max_h=img_h, max_w=img_w)
return fwd_flow, disp1, disp2, Rt, point2d_1_coord, point2d_1_depth
def forward(self, inputs):
# initialization
images, K_ms, K_inv_ms = inputs
K, K_inv = K_ms[:,0,:,:], K_inv_ms[:,0,:,:]
assert (images.shape[1] == 3)
img_h, img_w = int(images.shape[2] / 2), images.shape[3]
img1, img2 = images[:,:,:img_h,:], images[:,:,img_h:,:]
b = img1.shape[0]
flag1, flag2, flag3 = 0, 0, 0
visualizer = Visualizer_debug('./vis/', img1=255*img1.permute([0,2,3,1]).detach().cpu().numpy(), \
img2=255*img2.permute([0,2,3,1]).detach().cpu().numpy())
# Pose Network
loss_pack, F_final, img1_valid_mask, img1_rigid_mask, fwd_flow, fwd_match = self.model_pose(inputs, output_F=True, visualizer=visualizer)
# infer depth
disp1_list = self.depth_net(img1) # Nscales * [B, 1, H, W]
disp2_list = self.depth_net(img2)
# Get masks
img1_depth_mask = img1_rigid_mask * img1_valid_mask
# Select top score matches to triangulate depth.
top_ratio_match, top_ratio_mask = self.top_ratio_sample(fwd_match.view([b,4,-1]), img1_depth_mask.view([b,1,-1]), ratio=self.depth_sample_ratio) # [b, 4, ratio*h*w]
depth_match, depth_match_num = self.robust_rand_sample(top_ratio_match, top_ratio_mask, num=self.depth_match_num)
if self.dataset == 'nyuv2':
P1, P2, flags = self.rt_from_fundamental_mat_nyu(F_final.detach(), K, depth_match)
flags = torch.from_numpy(np.stack(flags, axis=0)).float().to(K.get_device())
else:
P1, P2 = self.rt_from_fundamental_mat(F_final.detach(), K, depth_match)
P1 = P1.detach()
P2 = P2.detach()
# Get triangulated points
filt_depth_match, flag1 = self.ray_angle_filter(depth_match, P1, P2, return_angle=False) # [b, 4, filt_num]
point3d_1 = self.midpoint_triangulate(filt_depth_match, K_inv, P1, P2)
point2d_1_coord, point2d_1_depth = self.reproject(P1, point3d_1) # [b,n,2], [b,n,1]
point2d_2_coord, point2d_2_depth = self.reproject(P2, point3d_1)
        # Filter out some invalid triangulation results to stabilize training.
point2d_1_depth, point2d_2_depth, point2d_1_coord, point2d_2_coord, flag2 = self.filt_negative_depth(point2d_1_depth, \
point2d_2_depth, point2d_1_coord, point2d_2_coord)
point2d_1_depth, point2d_2_depth, point2d_1_coord, point2d_2_coord, flag3 = self.filt_invalid_coord(point2d_1_depth, \
point2d_2_depth, point2d_1_coord, point2d_2_coord, max_h=img_h, max_w=img_w)
if flag1 + flag2 + flag3 > 0:
loss_pack['pt_depth_loss'] = torch.zeros([2]).to(point3d_1.get_device()).requires_grad_()
loss_pack['pj_depth_loss'] = torch.zeros([2]).to(point3d_1.get_device()).requires_grad_()
loss_pack['flow_error'] = torch.zeros([2]).to(point3d_1.get_device()).requires_grad_()
loss_pack['depth_smooth_loss'] = torch.zeros([2]).to(point3d_1.get_device()).requires_grad_()
return loss_pack
pt_depth_loss = 0
pj_depth_loss = 0
flow_error = 0
depth_smooth_loss = 0
for s in range(self.depth_scale):
disp_pred1 = F.interpolate(disp1_list[s], size=(img_h, img_w), mode='bilinear') # [b,1,h,w]
disp_pred2 = F.interpolate(disp2_list[s], size=(img_h, img_w), mode='bilinear')
scaled_disp1, depth_pred1 = self.disp2depth(disp_pred1)
scaled_disp2, depth_pred2 = self.disp2depth(disp_pred2)
# Rescale predicted depth according to triangulated depth
# [b,1,h,w], [b,n,1]
rescaled_pred1, inter_pred1 = self.register_depth(depth_pred1, point2d_1_coord, point2d_1_depth)
rescaled_pred2, inter_pred2 = self.register_depth(depth_pred2, point2d_2_coord, point2d_2_depth)
# Get Losses
pt_depth_loss += self.get_trian_loss(point2d_1_depth, inter_pred1) + self.get_trian_loss(point2d_2_depth, inter_pred2)
pj_depth, flow_loss = self.get_reproj_fdp_loss(rescaled_pred1, rescaled_pred2, P2, K, K_inv, img1_valid_mask, img1_rigid_mask, fwd_flow, visualizer=visualizer)
depth_smooth_loss += self.get_smooth_loss(img1, disp_pred1 / (disp_pred1.mean((2,3), True) + 1e-12)) + \
self.get_smooth_loss(img2, disp_pred2 / (disp_pred2.mean((2,3), True) + 1e-12))
pj_depth_loss += pj_depth
flow_error += flow_loss
if self.dataset == 'nyuv2':
loss_pack['pt_depth_loss'] = pt_depth_loss * flags
loss_pack['pj_depth_loss'], loss_pack['flow_error'] = pj_depth_loss * flags, flow_error * flags
loss_pack['depth_smooth_loss'] = depth_smooth_loss * flags
else:
loss_pack['pt_depth_loss'] = pt_depth_loss
loss_pack['pj_depth_loss'], loss_pack['flow_error'] = pj_depth_loss, flow_error
loss_pack['depth_smooth_loss'] = depth_smooth_loss
return loss_pack
================================================
FILE: core/networks/model_flow.py
================================================
import os, sys
sys.path.append(os.path.dirname(os.path.abspath(__file__)))
from structures import *
from pytorch_ssim import SSIM
import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np
import pdb
import cv2
def transformerFwd(U,
flo,
out_size,
name='SpatialTransformerFwd'):
"""Forward Warping Layer described in
'Occlusion Aware Unsupervised Learning of Optical Flow by Yang Wang et al'
Parameters
----------
U : float
The output of a convolutional net should have the
shape [num_batch, height, width, num_channels].
flo: float
The optical flow used for forward warping
having the shape of [num_batch, height, width, 2].
out_size: tuple of two ints
The size of the output of the network (height, width)
"""
def _repeat(x, n_repeats):
rep = torch.ones(size=[n_repeats], dtype=torch.long).unsqueeze(1).transpose(1,0)
x = x.view([-1,1]).mm(rep)
return x.view([-1]).int()
def _interpolate(im, x, y, out_size):
# constants
num_batch, height, width, channels = im.shape[0], im.shape[1], im.shape[2], im.shape[3]
out_height = out_size[0]
out_width = out_size[1]
max_y = int(height - 1)
max_x = int(width - 1)
# scale indices from [-1, 1] to [0, width/height]
x = (x + 1.0) * (width - 1.0) / 2.0
y = (y + 1.0) * (height - 1.0) / 2.0
# do sampling
x0 = (torch.floor(x)).int()
x1 = x0 + 1
y0 = (torch.floor(y)).int()
y1 = y0 + 1
x0_c = torch.clamp(x0, 0, max_x)
x1_c = torch.clamp(x1, 0, max_x)
y0_c = torch.clamp(y0, 0, max_y)
y1_c = torch.clamp(y1, 0, max_y)
dim2 = width
dim1 = width * height
base = _repeat(torch.arange(0, num_batch) * dim1, out_height * out_width).to(im.get_device())
base_y0 = base + y0_c * dim2
base_y1 = base + y1_c * dim2
idx_a = base_y0 + x0_c
idx_b = base_y1 + x0_c
idx_c = base_y0 + x1_c
idx_d = base_y1 + x1_c
# use indices to lookup pixels in the flat image and restore
# channels dim
im_flat = im.view([-1, channels])
im_flat = im_flat.float()
# and finally calculate interpolated values
x0_f = x0.float()
x1_f = x1.float()
y0_f = y0.float()
y1_f = y1.float()
wa = ((x1_f - x) * (y1_f - y)).unsqueeze(1)
wb = ((x1_f - x) * (y - y0_f)).unsqueeze(1)
wc = ((x - x0_f) * (y1_f - y)).unsqueeze(1)
wd = ((x - x0_f) * (y - y0_f)).unsqueeze(1)
zerof = torch.zeros_like(wa)
wa = torch.where(
(torch.eq(x0_c, x0) & torch.eq(y0_c, y0)).unsqueeze(1), wa, zerof)
wb = torch.where(
(torch.eq(x0_c, x0) & torch.eq(y1_c, y1)).unsqueeze(1), wb, zerof)
wc = torch.where(
(torch.eq(x1_c, x1) & torch.eq(y0_c, y0)).unsqueeze(1), wc, zerof)
wd = torch.where(
(torch.eq(x1_c, x1) & torch.eq(y1_c, y1)).unsqueeze(1), wd, zerof)
zeros = torch.zeros(
size=[
int(num_batch) * int(height) *
int(width), int(channels)
],
dtype=torch.float)
output = zeros.to(im.get_device())
output = output.scatter_add(dim=0, index=idx_a.long().unsqueeze(1).repeat(1,channels), src=im_flat * wa)
output = output.scatter_add(dim=0, index=idx_b.long().unsqueeze(1).repeat(1,channels), src=im_flat * wb)
output = output.scatter_add(dim=0, index=idx_c.long().unsqueeze(1).repeat(1,channels), src=im_flat * wc)
output = output.scatter_add(dim=0, index=idx_d.long().unsqueeze(1).repeat(1,channels), src=im_flat * wd)
return output
def _meshgrid(height, width):
# This should be equivalent to:
x_t, y_t = np.meshgrid(np.linspace(-1, 1, width),
np.linspace(-1, 1, height))
# ones = np.ones(np.prod(x_t.shape))
# grid = np.vstack([x_t.flatten(), y_t.flatten(), ones])
return torch.from_numpy(x_t).float(), torch.from_numpy(y_t).float()
def _transform(flo, input_dim, out_size):
num_batch, height, width, num_channels = input_dim.shape[0:4]
# grid of (x_t, y_t, 1), eq (1) in ref [1]
height_f = float(height)
width_f = float(width)
out_height = out_size[0]
out_width = out_size[1]
x_s, y_s = _meshgrid(out_height, out_width)
x_s = x_s.to(flo.get_device()).unsqueeze(0)
x_s = x_s.repeat([num_batch, 1, 1])
y_s = y_s.to(flo.get_device()).unsqueeze(0)
y_s = y_s.repeat([num_batch, 1, 1])
x_t = x_s + flo[:, :, :, 0] / ((out_width - 1.0) / 2.0)
y_t = y_s + flo[:, :, :, 1] / ((out_height - 1.0) / 2.0)
x_t_flat = x_t.view([-1])
y_t_flat = y_t.view([-1])
input_transformed = _interpolate(input_dim, x_t_flat, y_t_flat,
out_size)
output = input_transformed.view([num_batch, out_height, out_width, num_channels])
return output
#out_size = int(out_size)
output = _transform(flo, U, out_size)
return output
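# Usage sketch (not part of the original file, shapes and values are
# illustrative): forward-warping a tensor of ones with transformerFwd gives
# the soft occlusion mask used by Model_flow below -- destination pixels that
# no source pixel splats onto stay zero. CUDA is assumed here because the
# layer looks up devices via get_device().
if __name__ == '__main__' and torch.cuda.is_available():
    ones = torch.ones(1, 8, 8, 1).cuda()   # [num_batch, height, width, channels]
    flo = torch.zeros(1, 8, 8, 2).cuda()
    flo[:, :, :, 0] = 2.0                  # shift every pixel 2 px to the right
    occ = transformerFwd(ones, flo, out_size=[8, 8])
    print(occ[0, :, :, 0])                 # leftmost two columns are ~0 (occluded)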
class Model_flow(nn.Module):
def __init__(self, cfg):
super(Model_flow, self).__init__()
self.fpyramid = FeaturePyramid()
self.pwc_model = PWC_tf()
if cfg.mode == 'depth' or cfg.mode == 'flowposenet':
# Stage 2 training
for param in self.fpyramid.parameters():
param.requires_grad = False
for param in self.pwc_model.parameters():
param.requires_grad = False
# hyperparameters
self.dataset = cfg.dataset
self.num_scales = cfg.num_scales
self.flow_consist_alpha = cfg.h_flow_consist_alpha
self.flow_consist_beta = cfg.h_flow_consist_beta
def get_occlusion_mask_from_flow(self, tensor_size, flow):
mask = torch.ones(tensor_size).to(flow.get_device())
h, w = mask.shape[2], mask.shape[3]
occ_mask = transformerFwd(mask.permute(0,2,3,1), flow.permute(0,2,3,1), out_size=[h,w]).permute(0,3,1,2)
with torch.no_grad():
occ_mask = torch.clamp(occ_mask, 0.0, 1.0)
return occ_mask
def get_flow_norm(self, flow, p=2):
'''
Inputs:
flow (bs, 2, H, W)
'''
flow_norm = torch.norm(flow, p=p, dim=1).unsqueeze(1) + 1e-12
return flow_norm
def get_visible_masks(self, optical_flows, optical_flows_rev):
# get occlusion masks
batch_size, _, img_h, img_w = optical_flows[0].shape
img2_visible_masks, img1_visible_masks = [], []
for s, (optical_flow, optical_flow_rev) in enumerate(zip(optical_flows, optical_flows_rev)):
shape = [batch_size, 1, int(img_h / (2**s)), int(img_w / (2**s))]
img2_visible_masks.append(self.get_occlusion_mask_from_flow(shape, optical_flow))
img1_visible_masks.append(self.get_occlusion_mask_from_flow(shape, optical_flow_rev))
return img2_visible_masks, img1_visible_masks
def get_consistent_masks(self, optical_flows, optical_flows_rev):
# get consist masks
batch_size, _, img_h, img_w = optical_flows[0].shape
img2_consis_masks, img1_consis_masks, fwd_flow_diff_pyramid, bwd_flow_diff_pyramid = [], [], [], []
for s, (optical_flow, optical_flow_rev) in enumerate(zip(optical_flows, optical_flows_rev)):
bwd2fwd_flow = warp_flow(optical_flow_rev, optical_flow)
fwd2bwd_flow = warp_flow(optical_flow, optical_flow_rev)
fwd_flow_diff = torch.abs(bwd2fwd_flow + optical_flow)
fwd_flow_diff_pyramid.append(fwd_flow_diff)
bwd_flow_diff = torch.abs(fwd2bwd_flow + optical_flow_rev)
bwd_flow_diff_pyramid.append(bwd_flow_diff)
# flow consistency condition
bwd_consist_bound = torch.max(self.flow_consist_beta * self.get_flow_norm(optical_flow_rev), torch.from_numpy(np.array([self.flow_consist_alpha])).float().to(optical_flow_rev.get_device()))
fwd_consist_bound = torch.max(self.flow_consist_beta * self.get_flow_norm(optical_flow), torch.from_numpy(np.array([self.flow_consist_alpha])).float().to(optical_flow.get_device()))
with torch.no_grad():
noc_masks_img2 = (self.get_flow_norm(bwd_flow_diff) < bwd_consist_bound).float()
noc_masks_img1 = (self.get_flow_norm(fwd_flow_diff) < fwd_consist_bound).float()
img2_consis_masks.append(noc_masks_img2)
img1_consis_masks.append(noc_masks_img1)
return img2_consis_masks, img1_consis_masks, fwd_flow_diff_pyramid, bwd_flow_diff_pyramid
def generate_img_pyramid(self, img, num_pyramid):
img_h, img_w = img.shape[2], img.shape[3]
img_pyramid = []
for s in range(num_pyramid):
img_new = F.adaptive_avg_pool2d(img, [int(img_h / (2**s)), int(img_w / (2**s))]).data
img_pyramid.append(img_new)
return img_pyramid
def warp_flow_pyramid(self, img_pyramid, flow_pyramid):
img_warped_pyramid = []
for img, flow in zip(img_pyramid, flow_pyramid):
img_warped_pyramid.append(warp_flow(img, flow))
return img_warped_pyramid
def compute_loss_pixel(self, img_pyramid, img_warped_pyramid, occ_mask_list):
loss_list = []
for scale in range(self.num_scales):
img, img_warped, occ_mask = img_pyramid[scale], img_warped_pyramid[scale], occ_mask_list[scale]
divider = occ_mask.mean((1,2,3))
img_diff = torch.abs((img - img_warped)) * occ_mask.repeat(1,3,1,1)
loss_pixel = img_diff.mean((1,2,3)) / (divider + 1e-12) # (B)
loss_list.append(loss_pixel[:,None])
loss = torch.cat(loss_list, 1).sum(1) # (B)
return loss
def compute_loss_ssim(self, img_pyramid, img_warped_pyramid, occ_mask_list):
loss_list = []
for scale in range(self.num_scales):
img, img_warped, occ_mask = img_pyramid[scale], img_warped_pyramid[scale], occ_mask_list[scale]
divider = occ_mask.mean((1,2,3))
occ_mask_pad = occ_mask.repeat(1,3,1,1)
ssim = SSIM(img * occ_mask_pad, img_warped * occ_mask_pad)
loss_ssim = torch.clamp((1.0 - ssim) / 2.0, 0, 1).mean((1,2,3))
loss_ssim = loss_ssim / (divider + 1e-12)
loss_list.append(loss_ssim[:,None])
loss = torch.cat(loss_list, 1).sum(1)
return loss
def gradients(self, img):
dy = img[:,:,1:,:] - img[:,:,:-1,:]
dx = img[:,:,:,1:] - img[:,:,:,:-1]
return dx, dy
def cal_grad2_error(self, flow, img):
img_grad_x, img_grad_y = self.gradients(img)
w_x = torch.exp(-10.0 * torch.abs(img_grad_x).mean(1).unsqueeze(1))
w_y = torch.exp(-10.0 * torch.abs(img_grad_y).mean(1).unsqueeze(1))
dx, dy = self.gradients(flow)
dx2, _ = self.gradients(dx)
_, dy2 = self.gradients(dy)
error = (w_x[:,:,:,1:] * torch.abs(dx2)).mean((1,2,3)) + (w_y[:,:,1:,:] * torch.abs(dy2)).mean((1,2,3))
return error / 2.0
def compute_loss_flow_smooth(self, optical_flows, img_pyramid):
loss_list = []
for scale in range(self.num_scales):
flow, img = optical_flows[scale], img_pyramid[scale]
error = self.cal_grad2_error(flow/20.0, img)
loss_list.append(error[:,None])
loss = torch.cat(loss_list, 1).sum(1)
return loss
def compute_loss_flow_consis(self, fwd_flow_diff_pyramid, occ_mask_list):
loss_list = []
for scale in range(self.num_scales):
fwd_flow_diff, occ_mask = fwd_flow_diff_pyramid[scale], occ_mask_list[scale]
divider = occ_mask.mean((1,2,3))
loss_consis = (fwd_flow_diff * occ_mask).mean((1,2,3))
loss_consis = loss_consis / (divider + 1e-12)
loss_list.append(loss_consis[:,None])
loss = torch.cat(loss_list, 1).sum(1)
return loss
def inference_flow(self, img1, img2):
img_hw = [img1.shape[2], img1.shape[3]]
feature_list_1, feature_list_2 = self.fpyramid(img1), self.fpyramid(img2)
optical_flow = self.pwc_model(feature_list_1, feature_list_2, img_hw)[0]
return optical_flow
def inference_corres(self, img1, img2):
batch_size, img_h, img_w = img1.shape[0], img1.shape[2], img1.shape[3]
# get the optical flows and reverse optical flows for each pair of adjacent images
feature_list_1, feature_list_2 = self.fpyramid(img1), self.fpyramid(img2)
optical_flows = self.pwc_model(feature_list_1, feature_list_2, [img_h, img_w])
optical_flows_rev = self.pwc_model(feature_list_2, feature_list_1, [img_h, img_w])
# get occlusion masks
img2_visible_masks, img1_visible_masks = self.get_visible_masks(optical_flows, optical_flows_rev)
# get consistent masks
img2_consis_masks, img1_consis_masks, fwd_flow_diff_pyramid, bwd_flow_diff_pyramid = self.get_consistent_masks(optical_flows, optical_flows_rev)
# get final valid masks
img2_valid_masks, img1_valid_masks = [], []
for i, (img2_visible_mask, img1_visible_mask, img2_consis_mask, img1_consis_mask) in enumerate(zip(img2_visible_masks, img1_visible_masks, img2_consis_masks, img1_consis_masks)):
img2_valid_masks.append(img2_visible_mask * img2_consis_mask)
img1_valid_masks.append(img1_visible_mask * img1_consis_mask)
return optical_flows[0], optical_flows_rev[0], img1_valid_masks[0], img2_valid_masks[0], fwd_flow_diff_pyramid[0], bwd_flow_diff_pyramid[0]
def forward(self, inputs, output_flow=False, use_flow_loss=True):
images, K_ms, K_inv_ms = inputs
assert (images.shape[1] == 3)
img_h, img_w = int(images.shape[2] / 2), images.shape[3]
img1, img2 = images[:,:,:img_h,:], images[:,:,img_h:,:]
batch_size = img1.shape[0]
#cv2.imwrite('./test1.png', np.transpose(255*img1[0].cpu().detach().numpy(), [1,2,0]).astype(np.uint8))
#cv2.imwrite('./test2.png', np.transpose(255*img2[0].cpu().detach().numpy(), [1,2,0]).astype(np.uint8))
#pdb.set_trace()
# get the optical flows and reverse optical flows for each pair of adjacent images
feature_list_1, feature_list_2 = self.fpyramid(img1), self.fpyramid(img2)
optical_flows = self.pwc_model(feature_list_1, feature_list_2, [img_h, img_w])
optical_flows_rev = self.pwc_model(feature_list_2, feature_list_1, [img_h, img_w])
# get occlusion masks
img2_visible_masks, img1_visible_masks = self.get_visible_masks(optical_flows, optical_flows_rev)
# get consistent masks
img2_consis_masks, img1_consis_masks, fwd_flow_diff_pyramid, bwd_flow_diff_pyramid = self.get_consistent_masks(optical_flows, optical_flows_rev)
# get final valid masks
img2_valid_masks, img1_valid_masks = [], []
for i, (img2_visible_mask, img1_visible_mask, img2_consis_mask, img1_consis_mask) in enumerate(zip(img2_visible_masks, img1_visible_masks, img2_consis_masks, img1_consis_masks)):
if self.dataset == 'nyuv2':
img2_valid_masks.append(img2_visible_mask)
img1_valid_masks.append(img1_visible_mask)
else:
img2_valid_masks.append(img2_visible_mask * img2_consis_mask)
img1_valid_masks.append(img1_visible_mask * img1_consis_mask)
loss_pack = {}
if not use_flow_loss:
loss_pack['loss_pixel'] = torch.zeros([2]).to(img1.get_device()).requires_grad_()
loss_pack['loss_ssim'] = torch.zeros([2]).to(img1.get_device()).requires_grad_()
loss_pack['loss_flow_smooth'] = torch.zeros([2]).to(img1.get_device()).requires_grad_()
loss_pack['loss_flow_consis'] = torch.zeros([2]).to(img1.get_device()).requires_grad_()
return loss_pack, optical_flows[0], optical_flows_rev[0], img1_valid_masks[0], img2_valid_masks[0], fwd_flow_diff_pyramid[0], bwd_flow_diff_pyramid[0]
# warp images
img1_pyramid = self.generate_img_pyramid(img1, len(optical_flows_rev))
img2_pyramid = self.generate_img_pyramid(img2, len(optical_flows))
img1_warped_pyramid = self.warp_flow_pyramid(img2_pyramid, optical_flows)
img2_warped_pyramid = self.warp_flow_pyramid(img1_pyramid, optical_flows_rev)
# compute loss
loss_pack['loss_pixel'] = self.compute_loss_pixel(img1_pyramid, img1_warped_pyramid, img1_valid_masks) + \
self.compute_loss_pixel(img2_pyramid, img2_warped_pyramid, img2_valid_masks)
loss_pack['loss_ssim'] = self.compute_loss_ssim(img1_pyramid, img1_warped_pyramid, img1_valid_masks) + \
self.compute_loss_ssim(img2_pyramid, img2_warped_pyramid, img2_valid_masks)
loss_pack['loss_flow_smooth'] = self.compute_loss_flow_smooth(optical_flows, img1_pyramid) + \
self.compute_loss_flow_smooth(optical_flows_rev, img2_pyramid)
#loss_pack['loss_flow_consis'] = self.compute_loss_flow_consis(fwd_flow_diff_pyramid, img1_valid_masks) + \
# self.compute_loss_flow_consis(bwd_flow_diff_pyramid, img2_valid_masks)
loss_pack['loss_flow_consis'] = torch.zeros([2]).to(img1.get_device()).requires_grad_()
if output_flow:
return loss_pack, optical_flows[0], optical_flows_rev[0], img1_valid_masks[0], img2_valid_masks[0], fwd_flow_diff_pyramid[0], bwd_flow_diff_pyramid[0]
else:
return loss_pack
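# Usage sketch (not part of the original file): flow-only inference with a
# stub config. The attribute names mirror what Model_flow.__init__ reads; the
# hyperparameter values below are placeholders, not the repository's settings.
if __name__ == '__main__':
    from types import SimpleNamespace
    cfg_stub = SimpleNamespace(mode='flow', dataset='kitti', num_scales=4,
                               h_flow_consist_alpha=3.0, h_flow_consist_beta=0.05)
    model = Model_flow(cfg_stub)
    img1 = torch.rand(1, 3, 128, 192)      # H and W divisible by 64 for the pyramid
    img2 = torch.rand(1, 3, 128, 192)
    flow = model.inference_flow(img1, img2)
    print(flow.shape)                      # torch.Size([1, 2, 128, 192])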
================================================
FILE: core/networks/model_flowposenet.py
================================================
import os, sys
sys.path.append(os.path.dirname(os.path.abspath(__file__)))
from structures import *
from pytorch_ssim import SSIM
from model_flow import Model_flow
sys.path.append(os.path.join(os.path.dirname(os.path.abspath(__file__)), '..', 'visualize'))
from visualizer import *
from profiler import Profiler
import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np
import pdb
import cv2
def mean_on_mask(diff, valid_mask):
mask = valid_mask.expand_as(diff)
mean_value = (diff * mask).sum() / mask.sum()
return mean_value
def edge_aware_smoothness_loss(pred_disp, img, max_scales):
def gradient_x(img):
gx = img[:, :, :-1, :] - img[:, :, 1:, :]
return gx
def gradient_y(img):
gy = img[:, :, :, :-1] - img[:, :, :, 1:]
return gy
def get_edge_smoothness(img, pred):
pred_gradients_x = gradient_x(pred)
pred_gradients_y = gradient_y(pred)
image_gradients_x = gradient_x(img)
image_gradients_y = gradient_y(img)
weights_x = torch.exp(-torch.mean(torch.abs(image_gradients_x),
1, keepdim=True))
weights_y = torch.exp(-torch.mean(torch.abs(image_gradients_y),
1, keepdim=True))
smoothness_x = torch.abs(pred_gradients_x) * weights_x
smoothness_y = torch.abs(pred_gradients_y) * weights_y
return torch.mean(smoothness_x) + torch.mean(smoothness_y)
loss = 0
weight = 1.
s = 0
for scaled_disp in pred_disp:
s += 1
if s > max_scales:
break
b, _, h, w = scaled_disp.size()
scaled_img = nn.functional.adaptive_avg_pool2d(img, (h, w))
loss += get_edge_smoothness(scaled_img, scaled_disp) * weight
weight /= 4.0
return loss
def compute_smooth_loss(tgt_depth, tgt_img, ref_depth, ref_img, max_scales=1):
# accumulate the edge-aware smoothness over both the target and reference views
loss = edge_aware_smoothness_loss(tgt_depth, tgt_img, max_scales)
loss += edge_aware_smoothness_loss(ref_depth, ref_img, max_scales)
return loss
class Model_flowposenet(nn.Module):
def __init__(self, cfg):
super(Model_flowposenet, self).__init__()
assert cfg.depth_scale == 1
self.pose_net = FlowPoseNet()
self.model_flow = Model_flow(cfg)
self.depth_net = Depth_Model(cfg.depth_scale)
def compute_pairwise_loss(self, tgt_img, ref_img, tgt_depth, ref_depth, pose, intrinsic):
ref_img_warped, valid_mask, projected_depth, computed_depth = inverse_warp2(ref_img, tgt_depth, ref_depth,
pose, intrinsic, 'zeros')
diff_img = (tgt_img - ref_img_warped).abs()
diff_depth = ((computed_depth - projected_depth).abs() /
(computed_depth + projected_depth).abs()).clamp(0, 1)
ssim_map = (0.5*(1 - SSIM(tgt_img, ref_img_warped))).clamp(0, 1)
diff_img = (0.15 * diff_img + 0.85 * ssim_map)
# Modified in 01.19.2020
#weight_mask = (1 - diff_depth)
#diff_img = diff_img * weight_mask
# compute loss
reconstruction_loss = diff_img.mean()
geometry_consistency_loss = diff_depth.mean()
#reconstruction_loss = mean_on_mask(diff_img, valid_mask)
#geometry_consistency_loss = mean_on_mask(diff_depth, valid_mask)
return reconstruction_loss, geometry_consistency_loss
def disp2depth(self, disp, min_depth=0.01, max_depth=80.0):
min_disp = 1 / max_depth
max_disp = 1 / min_depth
scaled_disp = min_disp + (max_disp - min_disp) * disp
depth = 1 / scaled_disp
return scaled_disp, depth
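# Note: disp2depth follows the monodepth2 convention. The sigmoid output disp
# in [0, 1] is mapped linearly between 1/max_depth and 1/min_depth in
# disparity space and then inverted:
#   scaled_disp = 1/max_depth + (1/min_depth - 1/max_depth) * disp
#   depth = 1 / scaled_disp
# so disp = 0 yields depth = 80.0 and disp = 1 yields depth = 0.01.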
def infer_depth(self, img):
b, img_h, img_w = img.shape[0], img.shape[2], img.shape[3]
disp_list = self.depth_net(img)
disp, depth = self.disp2depth(disp_list[0])
return disp_list[0]
def inference(self, img1, img2, K, K_inv):
flow = self.model_flow.inference_flow(img1, img2)
return flow, None, None, None, None, None
def inference_flow(self, img1, img2):
flow = self.model_flow.inference_flow(img1, img2)
return flow
def infer_pose(self, img1, img2, K, K_inv):
img_h, img_w = img1.shape[2], img1.shape[3]
flow = self.model_flow.inference_flow(img1, img2)
flow[:,0,:,:] /= img_w
flow[:,1,:,:] /= img_h
pose = self.pose_net(flow)
return pose
def forward(self, inputs):
# initialization
images, K_ms, K_inv_ms = inputs
K, K_inv = K_ms[:,0,:,:], K_inv_ms[:,0,:,:]
assert (images.shape[1] == 3)
img_h, img_w = int(images.shape[2] / 2), images.shape[3]
img1, img2 = images[:,:,:img_h,:], images[:,:,img_h:,:]
b = img1.shape[0]
visualizer = Visualizer_debug('./vis/', img1=255*img1.permute([0,2,3,1]).detach().cpu().numpy(), \
img2=255*img2.permute([0,2,3,1]).detach().cpu().numpy())
# Flow Network
loss_pack, fwd_flow, bwd_flow, img1_valid_mask, img2_valid_mask, img1_flow_diff_mask, img2_flow_diff_mask = self.model_flow(inputs, output_flow=True, use_flow_loss=False)
fwd_flow[:,0,:,:] /= img_w
fwd_flow[:,1,:,:] /= img_h
bwd_flow[:,0,:,:] /= img_w
bwd_flow[:,1,:,:] /= img_h
# Pose Network
pose = self.pose_net(fwd_flow)
pose_inv = self.pose_net(bwd_flow)
disp1_list = self.depth_net(img1) # Nscales * [B, 1, H, W]
disp2_list = self.depth_net(img2)
disp1, depth1 = self.disp2depth(disp1_list[0])
disp2, depth2 = self.disp2depth(disp2_list[0])
#pdb.set_trace()
loss_1, loss_3 = self.compute_pairwise_loss(img1, img2, depth1, depth2, pose, K)
loss_1_2, loss_3_2 = self.compute_pairwise_loss(img2, img1, depth2, depth1, pose_inv, K)
loss_ph = loss_1 + loss_1_2
loss_pj = loss_3 + loss_3_2
loss_2 = compute_smooth_loss([depth1], img1, [depth2], img2)
loss_pack['pt_depth_loss'] = torch.zeros([2]).to(loss_2.get_device()).requires_grad_()
loss_pack['pj_depth_loss'], loss_pack['flow_error'] = loss_pj, loss_ph
loss_pack['depth_smooth_loss'] = loss_2
#loss_pack['depth_smooth_loss'] = torch.zeros([2]).to(loss_2.get_device()).requires_grad_()
loss_pack['geo_loss'] = torch.zeros([2]).to(loss_2.get_device()).requires_grad_()
return loss_pack
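# Usage sketch (not part of the original file): pose regression from flow with
# a stub config. The attribute names follow what the constructors read; the
# numeric values and intrinsics are placeholders, not the trained settings.
if __name__ == '__main__':
    from types import SimpleNamespace
    cfg_stub = SimpleNamespace(mode='flowposenet', dataset='kitti', depth_scale=1,
                               num_scales=4, h_flow_consist_alpha=3.0,
                               h_flow_consist_beta=0.05)
    model = Model_flowposenet(cfg_stub)
    img1, img2 = torch.rand(1, 3, 128, 192), torch.rand(1, 3, 128, 192)
    K = torch.tensor([[[100.0, 0.0, 96.0], [0.0, 100.0, 64.0], [0.0, 0.0, 1.0]]])
    pose = model.infer_pose(img1, img2, K, K.inverse())
    print(pose.shape)                      # torch.Size([1, 6]): tx, ty, tz, rx, ry, rz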
================================================
FILE: core/networks/model_triangulate_pose.py
================================================
import os, sys
sys.path.append(os.path.dirname(os.path.abspath(__file__)))
import torch
import torch.nn as nn
import numpy as np
from structures import *
from model_flow import Model_flow
import pdb
import cv2
class Model_triangulate_pose(nn.Module):
def __init__(self, cfg):
super(Model_triangulate_pose, self).__init__()
self.model_flow = Model_flow(cfg)
self.mode = cfg.mode
if cfg.dataset == 'nyuv2':
self.inlier_thres = 0.1
self.rigid_thres = 1.0
else:
self.inlier_thres = 0.1
self.rigid_thres = 0.5
self.filter = reduced_ransac(check_num=cfg.ransac_points, thres=self.inlier_thres, dataset=cfg.dataset)
def meshgrid(self, h, w):
xx, yy = np.meshgrid(np.arange(0,w), np.arange(0,h))
meshgrid = np.transpose(np.stack([xx,yy], axis=-1), [2,0,1]) # [2,h,w]
meshgrid = torch.from_numpy(meshgrid)
return meshgrid
def compute_epipolar_loss(self, fmat, match, mask):
# fmat: [b, 3, 3] match: [b, 4, h*w] mask: [b,1,h*w]
num_batch = match.shape[0]
match_num = match.shape[-1]
points1 = match[:,:2,:]
points2 = match[:,2:,:]
ones = torch.ones(num_batch, 1, match_num).to(points1.get_device())
points1 = torch.cat([points1, ones], 1) # [b,3,n]
points2 = torch.cat([points2, ones], 1).transpose(1,2) # [b,n,3]
# compute fundamental matrix loss
fmat = fmat.unsqueeze(1)
fmat_tiles = fmat.view([-1,3,3])
epi_lines = fmat_tiles.bmm(points1) #[b,3,n] [b*n, 3, 1]
dist_p2l = torch.abs((epi_lines.permute([0, 2, 1]) * points2).sum(-1, keepdim=True)) # [b,n,1]
a = epi_lines[:,0,:].unsqueeze(1).transpose(1,2) # [b,n,1]
b = epi_lines[:,1,:].unsqueeze(1).transpose(1,2) # [b,n,1]
dist_div = torch.sqrt(a*a + b*b) + 1e-6
dist_map = dist_p2l / dist_div # [B, n, 1]
loss = (dist_map * mask.transpose(1,2)).mean([1,2]) / mask.mean([1,2])
return loss, dist_map
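# For a correspondence (p1, p2), l = F p1 = (a, b, c) is the epipolar line of
# p1 in the second image, and dist_map holds the point-to-line distance
#   d(p2, l) = |p2 . l| / sqrt(a^2 + b^2),
# masked and averaged so that invalid matches do not contribute to the loss.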
def get_rigid_mask(self, dist_map):
rigid_mask = (dist_map < self.rigid_thres).float()
inlier_mask = (dist_map < self.inlier_thres).float()
rigid_score = rigid_mask * 1.0 / (1.0 + dist_map)
return rigid_mask, inlier_mask, rigid_score
def inference(self, img1, img2, K, K_inv):
batch_size, img_h, img_w = img1.shape[0], img1.shape[2], img1.shape[3]
fwd_flow, bwd_flow, img1_valid_mask, img2_valid_mask, img1_flow_diff_mask, img2_flow_diff_mask = self.model_flow.inference_corres(img1, img2)
grid = self.meshgrid(img_h, img_w).float().to(img1.get_device()).unsqueeze(0).repeat(batch_size,1,1,1) #[b,2,h,w]
corres = torch.cat([(grid[:,0,:,:] + fwd_flow[:,0,:,:]).clamp(0,img_w-1.0).unsqueeze(1), \
(grid[:,1,:,:] + fwd_flow[:,1,:,:]).clamp(0,img_h-1.0).unsqueeze(1)], 1)
match = torch.cat([grid, corres], 1) # [b,4,h,w]
img1_score_mask = img1_valid_mask * 1.0 / (0.1 + img1_flow_diff_mask.mean(1).unsqueeze(1))
F_final = self.filter(match, img1_score_mask)
geo_loss, rigid_mask = self.compute_epipolar_loss(F_final, match.view([batch_size,4,-1]), img1_valid_mask.view([batch_size,1,-1]))
img1_rigid_mask = (rigid_mask.view([batch_size,img_h,img_w,1]) < self.inlier_thres).float()
return F_final, img1_valid_mask, img1_rigid_mask.permute(0,3,1,2), fwd_flow, match
def forward(self, inputs, output_F=False, visualizer=None):
images, K_ms, K_inv_ms = inputs
K, K_inv = K_ms[:,0,:,:], K_inv_ms[:,0,:,:]
assert (images.shape[1] == 3)
img_h, img_w = int(images.shape[2] / 2), images.shape[3]
img1, img2 = images[:,:,:img_h,:], images[:,:,img_h:,:]
batch_size = img1.shape[0]
if self.mode == 'depth':
loss_pack, fwd_flow, bwd_flow, img1_valid_mask, img2_valid_mask, img1_flow_diff_mask, img2_flow_diff_mask = self.model_flow(inputs, output_flow=True, use_flow_loss=False)
else:
loss_pack, fwd_flow, bwd_flow, img1_valid_mask, img2_valid_mask, img1_flow_diff_mask, img2_flow_diff_mask = self.model_flow(inputs, output_flow=True)
grid = self.meshgrid(img_h, img_w).float().to(img1.get_device()).unsqueeze(0).repeat(batch_size,1,1,1) #[b,2,h,w]
fwd_corres = torch.cat([(grid[:,0,:,:] + fwd_flow[:,0,:,:]).unsqueeze(1), (grid[:,1,:,:] + fwd_flow[:,1,:,:]).unsqueeze(1)], 1)
fwd_match = torch.cat([grid, fwd_corres], 1) # [b,4,h,w]
bwd_corres = torch.cat([(grid[:,0,:,:] + bwd_flow[:,0,:,:]).unsqueeze(1), (grid[:,1,:,:] + bwd_flow[:,1,:,:]).unsqueeze(1)], 1)
bwd_match = torch.cat([grid, bwd_corres], 1) # [b,4,h,w]
# Use fwd-bwd consistency map for filter
img1_score_mask = img1_valid_mask * 1.0 / (0.1+img1_flow_diff_mask.mean(1).unsqueeze(1))
img2_score_mask = img2_valid_mask * 1.0 / (0.1+img2_flow_diff_mask.mean(1).unsqueeze(1))
# img1_score_mask = img1_valid_mask
F_final_1 = self.filter(fwd_match, img1_score_mask, visualizer=visualizer)
_, dist_map_1 = self.compute_epipolar_loss(F_final_1, fwd_match.view([batch_size,4,-1]), img1_valid_mask.view([batch_size,1,-1]))
dist_map_1 = dist_map_1.view([batch_size, img_h, img_w, 1])
# Compute geo loss for regularize correspondence.
rigid_mask_1, inlier_mask_1, rigid_score_1 = self.get_rigid_mask(dist_map_1)
# We only use rigid mask to filter out the moving objects for computing geo loss.
geo_loss = (dist_map_1 * (rigid_mask_1 - inlier_mask_1)).mean((1,2,3)) / \
(rigid_mask_1 - inlier_mask_1).mean((1,2,3))
loss_pack['geo_loss'] = geo_loss
if output_F:
return loss_pack, F_final_1, img1_score_mask, rigid_score_1.permute(0,3,1,2), fwd_flow, fwd_match
else:
return loss_pack
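# Usage sketch (not part of the original file): shape check for the epipolar
# residual with a placeholder fundamental matrix and random matches. The cfg
# values are stand-ins; CUDA is assumed since the method calls get_device().
if __name__ == '__main__' and torch.cuda.is_available():
    from types import SimpleNamespace
    cfg_stub = SimpleNamespace(mode='flow', dataset='kitti', num_scales=4,
                               h_flow_consist_alpha=3.0, h_flow_consist_beta=0.05,
                               ransac_points=6000)
    model = Model_triangulate_pose(cfg_stub).cuda()
    fmat = torch.eye(3).unsqueeze(0).cuda()        # placeholder F, not a real estimate
    match = torch.rand(1, 4, 100).cuda() * 50.0    # [b, 4, n]: (x1, y1, x2, y2)
    mask = torch.ones(1, 1, 100).cuda()
    loss, dist_map = model.compute_epipolar_loss(fmat, match, mask)
    print(loss.shape, dist_map.shape)              # [1], [1, 100, 1]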
================================================
FILE: core/networks/pytorch_ssim/__init__.py
================================================
import os, sys
sys.path.append(os.path.dirname(os.path.abspath(__file__)))
from ssim import SSIM
================================================
FILE: core/networks/pytorch_ssim/ssim.py
================================================
import torch
import torch.nn as nn
def SSIM(x, y):
C1 = 0.01 ** 2
C2 = 0.03 ** 2
mu_x = nn.AvgPool2d(3, 1, padding=1)(x)
mu_y = nn.AvgPool2d(3, 1, padding=1)(y)
sigma_x = nn.AvgPool2d(3, 1, padding=1)(x**2) - mu_x**2
sigma_y = nn.AvgPool2d(3, 1, padding=1)(y**2) - mu_y**2
sigma_xy = nn.AvgPool2d(3, 1, padding=1)(x * y) - mu_x * mu_y
SSIM_n = (2 * mu_x * mu_y + C1) * (2 * sigma_xy + C2)
SSIM_d = (mu_x**2 + mu_y**2 + C1) * (sigma_x + sigma_y + C2)
SSIM = SSIM_n / SSIM_d
return SSIM
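# Usage sketch (not part of the original file): SSIM returns a per-pixel
# similarity map rather than a scalar, so callers such as
# Model_flow.compute_loss_ssim clamp (1 - ssim) / 2 into [0, 1] to get a loss.
if __name__ == '__main__':
    x = torch.rand(2, 3, 64, 64)
    y = torch.rand(2, 3, 64, 64)
    ssim_map = SSIM(x, y)                          # [2, 3, 64, 64]
    loss = torch.clamp((1.0 - ssim_map) / 2.0, 0, 1).mean()
    print(ssim_map.shape, float(loss))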
================================================
FILE: core/networks/structures/__init__.py
================================================
import os, sys
sys.path.append(os.path.dirname(os.path.abspath(__file__)))
from feature_pyramid import FeaturePyramid
from pwc_tf import PWC_tf
from ransac import reduced_ransac
from depth_model import Depth_Model
from net_utils import conv, deconv, warp_flow
from flowposenet import FlowPoseNet
from inverse_warp import inverse_warp2
================================================
FILE: core/networks/structures/depth_model.py
================================================
'''
This code was ported from existing repos
[LINK] https://github.com/nianticlabs/monodepth2
'''
from __future__ import absolute_import, division, print_function
import os, sys
sys.path.append(os.path.dirname(os.path.abspath(__file__)))
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as models
import torch.utils.model_zoo as model_zoo
from collections import OrderedDict
import pdb
class ResNetMultiImageInput(models.ResNet):
"""Constructs a resnet model with varying number of input images.
Adapted from https://github.com/pytorch/vision/blob/master/torchvision/models/resnet.py
"""
def __init__(self, block, layers, num_classes=1000, num_input_images=1):
super(ResNetMultiImageInput, self).__init__(block, layers)
self.inplanes = 64
self.conv1 = nn.Conv2d(
num_input_images * 3, 64, kernel_size=7, stride=2, padding=3, bias=False)
self.bn1 = nn.BatchNorm2d(64)
self.relu = nn.ReLU(inplace=True)
self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
self.layer1 = self._make_layer(block, 64, layers[0])
self.layer2 = self._make_layer(block, 128, layers[1], stride=2)
self.layer3 = self._make_layer(block, 256, layers[2], stride=2)
self.layer4 = self._make_layer(block, 512, layers[3], stride=2)
for m in self.modules():
if isinstance(m, nn.Conv2d):
nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
elif isinstance(m, nn.BatchNorm2d):
nn.init.constant_(m.weight, 1)
nn.init.constant_(m.bias, 0)
def resnet_multiimage_input(num_layers, pretrained=False, num_input_images=1):
"""Constructs a ResNet model.
Args:
num_layers (int): Number of resnet layers. Must be 18 or 50
pretrained (bool): If True, returns a model pre-trained on ImageNet
num_input_images (int): Number of frames stacked as input
"""
assert num_layers in [18, 50], "Can only run with 18 or 50 layer resnet"
blocks = {18: [2, 2, 2, 2], 50: [3, 4, 6, 3]}[num_layers]
block_type = {18: models.resnet.BasicBlock, 50: models.resnet.Bottleneck}[num_layers]
model = ResNetMultiImageInput(block_type, blocks, num_input_images=num_input_images)
if pretrained:
loaded = model_zoo.load_url(models.resnet.model_urls['resnet{}'.format(num_layers)])
loaded['conv1.weight'] = torch.cat(
[loaded['conv1.weight']] * num_input_images, 1) / num_input_images
model.load_state_dict(loaded)
return model
class ResnetEncoder(nn.Module):
"""Pytorch module for a resnet encoder
"""
def __init__(self, num_layers, pretrained, num_input_images=1):
super(ResnetEncoder, self).__init__()
self.num_ch_enc = np.array([64, 64, 128, 256, 512])
resnets = {18: models.resnet18,
34: models.resnet34,
50: models.resnet50,
101: models.resnet101,
152: models.resnet152}
if num_layers not in resnets:
raise ValueError("{} is not a valid number of resnet layers".format(num_layers))
if num_input_images > 1:
self.encoder = resnet_multiimage_input(num_layers, pretrained, num_input_images)
else:
self.encoder = resnets[num_layers](pretrained)
if num_layers > 34:
self.num_ch_enc[1:] *= 4
def forward(self, input_image):
self.features = []
x = (input_image - 0.45) / 0.225
x = self.encoder.conv1(x)
x = self.encoder.bn1(x)
self.features.append(self.encoder.relu(x))
self.features.append(self.encoder.layer1(self.encoder.maxpool(self.features[-1])))
self.features.append(self.encoder.layer2(self.features[-1]))
self.features.append(self.encoder.layer3(self.features[-1]))
self.features.append(self.encoder.layer4(self.features[-1]))
return self.features
class ConvBlock(nn.Module):
"""Layer to perform a convolution followed by ELU
"""
def __init__(self, in_channels, out_channels):
super(ConvBlock, self).__init__()
self.conv = Conv3x3(in_channels, out_channels)
self.nonlin = nn.ELU(inplace=True)
def forward(self, x):
out = self.conv(x)
out = self.nonlin(out)
return out
class Conv3x3(nn.Module):
"""Layer to pad and convolve input
"""
def __init__(self, in_channels, out_channels, use_refl=True):
super(Conv3x3, self).__init__()
if use_refl:
self.pad = nn.ReflectionPad2d(1)
else:
self.pad = nn.ZeroPad2d(1)
self.conv = nn.Conv2d(int(in_channels), int(out_channels), 3)
def forward(self, x):
out = self.pad(x)
out = self.conv(out)
return out
def upsample(x):
"""Upsample input tensor by a factor of 2
"""
#return F.interpolate(x, scale_factor=2, mode="nearest")
# TODO
return F.interpolate(x, scale_factor=2, mode='bilinear', align_corners=False)
class DepthDecoder(nn.Module):
def __init__(self, num_ch_enc, scales=range(4), num_output_channels=1, use_skips=True):
super(DepthDecoder, self).__init__()
self.num_output_channels = num_output_channels
self.use_skips = use_skips
self.scales = scales
self.num_ch_enc = num_ch_enc
self.num_ch_dec = np.array([16, 32, 64, 128, 256])
# decoder
self.init_decoder()
def init_decoder(self):
self.upconvs = nn.ModuleList()
for i in range(4, -1, -1):
upconvs_now = nn.ModuleList()
# upconv_0
num_ch_in = self.num_ch_enc[-1] if i == 4 else self.num_ch_dec[i + 1]
num_ch_out = self.num_ch_dec[i]
upconvs_now.append(ConvBlock(num_ch_in, num_ch_out))
# upconv_1
num_ch_in = self.num_ch_dec[i]
if self.use_skips and i > 0:
num_ch_in += self.num_ch_enc[i - 1]
num_ch_out = self.num_ch_dec[i]
upconvs_now.append(ConvBlock(num_ch_in, num_ch_out))
self.upconvs.append(upconvs_now)
self.dispconvs = nn.ModuleList()
for s in self.scales:
self.dispconvs.append(Conv3x3(self.num_ch_dec[s], self.num_output_channels))
# self.decoder = nn.ModuleList(list(self.convs.values()))
self.sigmoid = nn.Sigmoid()
def forward(self, input_features):
self.outputs = {}
# decoder
x = input_features[-1]
for scale in range(4, -1, -1): # [4, 3, 2, 1, 0]
idx = 4 - scale
x = self.upconvs[idx][0](x)
x = [upsample(x)]
if self.use_skips and scale > 0:
x += [input_features[scale - 1]]
x = torch.cat(x, 1)
x = self.upconvs[idx][1](x)
# get disp
if scale in self.scales:
scale_idx = self.scales.index(scale)
self.outputs[("disp", scale)] = self.sigmoid(self.dispconvs[scale_idx](x))
return self.outputs
class Depth_Model(nn.Module):
def __init__(self, depth_scale, num_layers=18):
super(Depth_Model, self).__init__()
self.depth_scale = depth_scale
self.encoder = ResnetEncoder(num_layers=num_layers, pretrained=False)
self.decoder = DepthDecoder(self.encoder.num_ch_enc, scales=range(depth_scale))
def forward(self, img):
features = self.encoder(img)
outputs = self.decoder(features)
depth_list = []
disp_list = []
for i in range(self.depth_scale):
disp = outputs['disp', i]
#s_disp, depth = self.disp2depth(disp, self.min_depth, self.max_depth)
#depth_list.append(depth)
#disp_list.append(s_disp)
disp_list.append(disp)
return disp_list
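# Usage sketch (not part of the original file): the model returns one sigmoid
# disparity map per scale in depth_scale; callers convert to metric depth via
# disp2depth in the model files. Input H and W should be divisible by 32 so
# the ResNet encoder features and decoder skip connections line up.
if __name__ == '__main__':
    model = Depth_Model(depth_scale=1)
    img = torch.rand(2, 3, 96, 320)
    disp_list = model(img)
    print(len(disp_list), disp_list[0].shape)      # 1 torch.Size([2, 1, 96, 320])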
================================================
FILE: core/networks/structures/feature_pyramid.py
================================================
import os, sys
sys.path.append(os.path.dirname(os.path.abspath(__file__)))
from net_utils import conv
import torch
import torch.nn as nn
class FeaturePyramid(nn.Module):
def __init__(self):
super(FeaturePyramid, self).__init__()
self.conv1 = conv(3, 16, kernel_size=3, stride=2)
self.conv2 = conv(16, 16, kernel_size=3, stride=1)
self.conv3 = conv(16, 32, kernel_size=3, stride=2)
self.conv4 = conv(32, 32, kernel_size=3, stride=1)
self.conv5 = conv(32, 64, kernel_size=3, stride=2)
self.conv6 = conv(64, 64, kernel_size=3, stride=1)
self.conv7 = conv(64, 96, kernel_size=3, stride=2)
self.conv8 = conv(96, 96, kernel_size=3, stride=1)
self.conv9 = conv(96, 128, kernel_size=3, stride=2)
self.conv10 = conv(128, 128, kernel_size=3, stride=1)
self.conv11 = conv(128, 196, kernel_size=3, stride=2)
self.conv12 = conv(196, 196, kernel_size=3, stride=1)
'''
for m in self.modules():
if isinstance(m, nn.Conv2d) or isinstance(m, nn.ConvTranspose2d):
nn.init.constant_(m.weight.data, 0.0)
if m.bias is not None:
m.bias.data.zero_()
'''
def forward(self, img):
cnv2 = self.conv2(self.conv1(img))
cnv4 = self.conv4(self.conv3(cnv2))
cnv6 = self.conv6(self.conv5(cnv4))
cnv8 = self.conv8(self.conv7(cnv6))
cnv10 = self.conv10(self.conv9(cnv8))
cnv12 = self.conv12(self.conv11(cnv10))
return cnv2, cnv4, cnv6, cnv8, cnv10, cnv12
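# Usage sketch (not part of the original file): six feature maps at strides
# 2/4/8/16/32/64 with 16/32/64/96/128/196 channels, consumed by PWC_tf.
if __name__ == '__main__':
    fpyramid = FeaturePyramid()
    feats = fpyramid(torch.rand(1, 3, 128, 192))
    print([f.shape for f in feats])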
================================================
FILE: core/networks/structures/flowposenet.py
================================================
import torch
import torch.nn as nn
from torch import sigmoid
from torch.nn.init import xavier_uniform_, zeros_
def conv(in_planes, out_planes, kernel_size=3):
return nn.Sequential(
nn.Conv2d(in_planes, out_planes, kernel_size=kernel_size, padding=(kernel_size-1)//2, stride=2),
nn.ReLU(inplace=True)
)
def upconv(in_planes, out_planes):
return nn.Sequential(
nn.ConvTranspose2d(in_planes, out_planes, kernel_size=4, stride=2, padding=1),
nn.ReLU(inplace=True)
)
class FlowPoseNet(nn.Module):
def __init__(self):
super(FlowPoseNet, self).__init__()
conv_planes = [16, 32, 64, 128, 256, 256, 256]
self.conv1 = conv(2, conv_planes[0], kernel_size=7)
self.conv2 = conv(conv_planes[0], conv_planes[1], kernel_size=5)
self.conv3 = conv(conv_planes[1], conv_planes[2])
self.conv4 = conv(conv_planes[2], conv_planes[3])
self.conv5 = conv(conv_planes[3], conv_planes[4])
self.conv6 = conv(conv_planes[4], conv_planes[5])
self.conv7 = conv(conv_planes[5], conv_planes[6])
self.pose_pred = nn.Conv2d(conv_planes[6], 6, kernel_size=1, padding=0)
def init_weights(self):
for m in self.modules():
if isinstance(m, nn.Conv2d) or isinstance(m, nn.ConvTranspose2d):
xavier_uniform_(m.weight.data)
if m.bias is not None:
zeros_(m.bias)
def forward(self, flow):
# [B, 2, H, W]
input = flow
out_conv1 = self.conv1(input)
out_conv2 = self.conv2(out_conv1)
out_conv3 = self.conv3(out_conv2)
out_conv4 = self.conv4(out_conv3)
out_conv5 = self.conv5(out_conv4)
out_conv6 = self.conv6(out_conv5)
out_conv7 = self.conv7(out_conv6)
pose = self.pose_pred(out_conv7)
pose = pose.mean(3).mean(2)
pose = 0.01 * pose.view(pose.size(0), 6)
return pose
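# Usage sketch (not part of the original file): a 2-channel normalized flow
# map in, a 6-DoF pose vector out; the 0.01 scaling in forward() keeps early
# predictions close to the identity pose.
if __name__ == '__main__':
    net = FlowPoseNet()
    net.init_weights()
    flow = torch.rand(2, 2, 128, 416)              # [B, 2, H, W]
    print(net(flow).shape)                         # torch.Size([2, 6])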
================================================
FILE: core/networks/structures/inverse_warp.py
================================================
from __future__ import division
import torch
import torch.nn.functional as F
pixel_coords = None
def set_id_grid(depth):
global pixel_coords
b, h, w = depth.size()
i_range = torch.arange(0, h).view(1, h, 1).expand(
1, h, w).type_as(depth) # [1, H, W]
j_range = torch.arange(0, w).view(1, 1, w).expand(
1, h, w).type_as(depth) # [1, H, W]
ones = torch.ones(1, h, w).type_as(depth)
pixel_coords = torch.stack((j_range, i_range, ones), dim=1) # [1, 3, H, W]
def check_sizes(input, input_name, expected):
condition = [input.ndimension() == len(expected)]
for i, size in enumerate(expected):
if size.isdigit():
condition.append(input.size(i) == int(size))
assert(all(condition)), "wrong size for {}, expected {}, got {}".format(
input_name, 'x'.join(expected), list(input.size()))
def pixel2cam(depth, intrinsics_inv):
"""Transform coordinates in the pixel frame to the camera frame.
Args:
depth: depth maps -- [B, H, W]
intrinsics_inv: intrinsics_inv matrix for each element of batch -- [B, 3, 3]
Returns:
array of (u,v,1) cam coordinates -- [B, 3, H, W]
"""
global pixel_coords
b, h, w = depth.size()
if (pixel_coords is None) or pixel_coords.size(2) < h:
set_id_grid(depth)
current_pixel_coords = pixel_coords[:, :, :h, :w].expand(
b, 3, h, w).reshape(b, 3, -1) # [B, 3, H*W]
cam_coords = (intrinsics_inv @ current_pixel_coords).reshape(b, 3, h, w)
return cam_coords * depth.unsqueeze(1)
def cam2pixel(cam_coords, proj_c2p_rot, proj_c2p_tr, padding_mode):
"""Transform coordinates in the camera frame to the pixel frame.
Args:
cam_coords: pixel coordinates defined in the first camera coordinates system -- [B, 4, H, W]
proj_c2p_rot: rotation matrix of cameras -- [B, 3, 4]
proj_c2p_tr: translation vectors of cameras -- [B, 3, 1]
Returns:
array of [-1,1] coordinates -- [B, 2, H, W]
"""
b, _, h, w = cam_coords.size()
cam_coords_flat = cam_coords.reshape(b, 3, -1) # [B, 3, H*W]
if proj_c2p_rot is not None:
pcoords = proj_c2p_rot @ cam_coords_flat
else:
pcoords = cam_coords_flat
if proj_c2p_tr is not None:
pcoords = pcoords + proj_c2p_tr # [B, 3, H*W]
X = pcoords[:, 0]
Y = pcoords[:, 1]
Z = pcoords[:, 2].clamp(min=1e-3)
# Normalized, -1 if on extreme left, 1 if on extreme right (x = w-1) [B, H*W]
X_norm = 2*(X / Z)/(w-1) - 1
Y_norm = 2*(Y / Z)/(h-1) - 1 # Idem [B, H*W]
pixel_coords = torch.stack([X_norm, Y_norm], dim=2) # [B, H*W, 2]
return pixel_coords.reshape(b, h, w, 2)
def euler2mat(angle):
"""Convert euler angles to rotation matrix.
Reference: https://github.com/pulkitag/pycaffe-utils/blob/master/rot_utils.py#L174
Args:
angle: rotation angle along 3 axis (in radians) -- size = [B, 3]
Returns:
Rotation matrix corresponding to the euler angles -- size = [B, 3, 3]
"""
B = angle.size(0)
x, y, z = angle[:, 0], angle[:, 1], angle[:, 2]
cosz = torch.cos(z)
sinz = torch.sin(z)
zeros = z.detach()*0
ones = zeros.detach()+1
zmat = torch.stack([cosz, -sinz, zeros,
sinz, cosz, zeros,
zeros, zeros, ones], dim=1).reshape(B, 3, 3)
cosy = torch.cos(y)
siny = torch.sin(y)
ymat = torch.stack([cosy, zeros, siny,
zeros, ones, zeros,
-siny, zeros, cosy], dim=1).reshape(B, 3, 3)
cosx = torch.cos(x)
sinx = torch.sin(x)
xmat = torch.stack([ones, zeros, zeros,
zeros, cosx, -sinx,
zeros, sinx, cosx], dim=1).reshape(B, 3, 3)
rotMat = xmat @ ymat @ zmat
return rotMat
def quat2mat(quat):
"""Convert quaternion coefficients to rotation matrix.
Args:
quat: last three coefficients (x, y, z) of the rotation quaternion; the scalar part w is set to 1 before normalization -- size = [B, 3]
Returns:
Rotation matrix corresponding to the quaternion -- size = [B, 3, 3]
"""
norm_quat = torch.cat([quat[:, :1].detach()*0 + 1, quat], dim=1)
norm_quat = norm_quat/norm_quat.norm(p=2, dim=1, keepdim=True)
w, x, y, z = norm_quat[:, 0], norm_quat[:,
1], norm_quat[:, 2], norm_quat[:, 3]
B = quat.size(0)
w2, x2, y2, z2 = w.pow(2), x.pow(2), y.pow(2), z.pow(2)
wx, wy, wz = w*x, w*y, w*z
xy, xz, yz = x*y, x*z, y*z
rotMat = torch.stack([w2 + x2 - y2 - z2, 2*xy - 2*wz, 2*wy + 2*xz,
2*wz + 2*xy, w2 - x2 + y2 - z2, 2*yz - 2*wx,
2*xz - 2*wy, 2*wx + 2*yz, w2 - x2 - y2 + z2], dim=1).reshape(B, 3, 3)
return rotMat
def pose_vec2mat(vec, rotation_mode='euler'):
"""
Convert 6DoF parameters to transformation matrix.
Args:
vec: 6DoF parameters in the order of tx, ty, tz, rx, ry, rz -- [B, 6]
Returns:
A transformation matrix -- [B, 3, 4]
"""
translation = vec[:, :3].unsqueeze(-1) # [B, 3, 1]
rot = vec[:, 3:]
if rotation_mode == 'euler':
rot_mat = euler2mat(rot) # [B, 3, 3]
elif rotation_mode == 'quat':
rot_mat = quat2mat(rot) # [B, 3, 3]
transform_mat = torch.cat([rot_mat, translation], dim=2) # [B, 3, 4]
return transform_mat
def inverse_warp(img, depth, pose, intrinsics, rotation_mode='euler', padding_mode='zeros'):
"""
Inverse warp a source image to the target image plane.
Args:
img: the source image (where to sample pixels) -- [B, 3, H, W]
depth: depth map of the target image -- [B, H, W]
pose: 6DoF pose parameters from target to source -- [B, 6]
intrinsics: camera intrinsic matrix -- [B, 3, 3]
Returns:
projected_img: Source image warped to the target image plane
valid_points: Boolean array indicating point validity
"""
check_sizes(img, 'img', 'B3HW')
check_sizes(depth, 'depth', 'BHW')
check_sizes(pose, 'pose', 'B6')
check_sizes(intrinsics, 'intrinsics', 'B33')
batch_size, _, img_height, img_width = img.size()
cam_coords = pixel2cam(depth, intrinsics.inverse()) # [B,3,H,W]
pose_mat = pose_vec2mat(pose, rotation_mode) # [B,3,4]
# Get projection matrix for tgt camera frame to source pixel frame
proj_cam_to_src_pixel = intrinsics @ pose_mat # [B, 3, 4]
rot, tr = proj_cam_to_src_pixel[:, :, :3], proj_cam_to_src_pixel[:, :, -1:]
src_pixel_coords = cam2pixel(
cam_coords, rot, tr, padding_mode) # [B,H,W,2]
projected_img = F.grid_sample(
img, src_pixel_coords, padding_mode=padding_mode)
valid_points = src_pixel_coords.abs().max(dim=-1)[0] <= 1
return projected_img, valid_points
def cam2pixel2(cam_coords, proj_c2p_rot, proj_c2p_tr, padding_mode):
"""Transform coordinates in the camera frame to the pixel frame.
Args:
cam_coords: pixel coordinates defined in the first camera coordinates system -- [B, 4, H, W]
proj_c2p_rot: rotation matrix of cameras -- [B, 3, 4]
proj_c2p_tr: translation vectors of cameras -- [B, 3, 1]
Returns:
array of [-1,1] coordinates -- [B, 2, H, W]
"""
b, _, h, w = cam_coords.size()
cam_coords_flat = cam_coords.reshape(b, 3, -1) # [B, 3, H*W]
if proj_c2p_rot is not None:
pcoords = proj_c2p_rot @ cam_coords_flat
else:
pcoords = cam_coords_flat
if proj_c2p_tr is not None:
pcoords = pcoords + proj_c2p_tr # [B, 3, H*W]
X = pcoords[:, 0]
Y = pcoords[:, 1]
Z = pcoords[:, 2].clamp(min=1e-3)
# Normalized, -1 if on extreme left, 1 if on extreme right (x = w-1) [B, H*W]
X_norm = 2*(X / Z)/(w-1) - 1
Y_norm = 2*(Y / Z)/(h-1) - 1 # Idem [B, H*W]
if padding_mode == 'zeros':
X_mask = ((X_norm > 1)+(X_norm < -1)).detach()
# make sure that no point in the warped image is a combination of im and gray
X_norm[X_mask] = 2
Y_mask = ((Y_norm > 1)+(Y_norm < -1)).detach()
Y_norm[Y_mask] = 2
pixel_coords = torch.stack([X_norm, Y_norm], dim=2) # [B, H*W, 2]
return pixel_coords.reshape(b, h, w, 2), Z.reshape(b, 1, h, w)
def inverse_warp2(img, depth, ref_depth, pose, intrinsics, padding_mode='zeros'):
"""
Inverse warp a source image to the target image plane.
Args:
img: the source image (where to sample pixels) -- [B, 3, H, W]
depth: depth map of the target image -- [B, 1, H, W]
ref_depth: the source depth map (where to sample depth) -- [B, 1, H, W]
pose: 6DoF pose parameters from target to source -- [B, 6]
intrinsics: camera intrinsic matrix -- [B, 3, 3]
Returns:
projected_img: Source image warped to the target image plane
valid_mask: Float array indicating point validity
"""
check_sizes(img, 'img', 'B3HW')
check_sizes(depth, 'depth', 'B1HW')
check_sizes(ref_depth, 'ref_depth', 'B1HW')
check_sizes(pose, 'pose', 'B6')
check_sizes(intrinsics, 'intrinsics', 'B33')
batch_size, _, img_height, img_width = img.size()
cam_coords = pixel2cam(depth.squeeze(1), intrinsics.inverse()) # [B,3,H,W]
pose_mat = pose_vec2mat(pose) # [B,3,4]
# Get projection matrix for tgt camera frame to source pixel frame
proj_cam_to_src_pixel = intrinsics @ pose_mat # [B, 3, 4]
rot, tr = proj_cam_to_src_pixel[:, :, :3], proj_cam_to_src_pixel[:, :, -1:]
src_pixel_coords, computed_depth = cam2pixel2(
cam_coords, rot, tr, padding_mode) # [B,H,W,2]
projected_img = F.grid_sample(
img, src_pixel_coords, padding_mode=padding_mode)
valid_points = src_pixel_coords.abs().max(dim=-1)[0] <= 1
valid_mask = valid_points.unsqueeze(1).float()
projected_depth = F.grid_sample(
ref_depth, src_pixel_coords, padding_mode=padding_mode).clamp(min=1e-3)
return projected_img, valid_mask, projected_depth, computed_depth
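# Usage sketch (not part of the original file): with an identity pose (an
# all-zero 6-DoF vector) inverse_warp2 reproduces the source image, and the
# projected/computed depth pair feeds the geometry-consistency term in
# Model_flowposenet.compute_pairwise_loss. The intrinsics here are made up.
if __name__ == '__main__':
    b, h, w = 2, 64, 80
    img = torch.rand(b, 3, h, w)
    depth = torch.ones(b, 1, h, w)
    ref_depth = torch.ones(b, 1, h, w)
    pose = torch.zeros(b, 6)                       # tx, ty, tz, rx, ry, rz
    K = torch.tensor([[50.0, 0.0, 40.0], [0.0, 50.0, 32.0],
                      [0.0, 0.0, 1.0]]).unsqueeze(0).repeat(b, 1, 1)
    warped, valid, proj_d, comp_d = inverse_warp2(img, depth, ref_depth, pose, K)
    print(warped.shape, valid.shape, proj_d.shape, comp_d.shape)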
================================================
FILE: core/networks/structures/net_utils.py
================================================
import torch
import torch.nn as nn
from torch.autograd import Variable
import pdb
import numpy as np
def conv(in_planes, out_planes, kernel_size=3, stride=1, padding=1, dilation=1):
return nn.Sequential(
nn.Conv2d(in_planes, out_planes, kernel_size=kernel_size, stride=stride,
padding=padding, dilation=dilation, bias=True),
nn.LeakyReLU(0.1))
def deconv(in_planes, out_planes, kernel_size=4, stride=2, padding=1):
return nn.ConvTranspose2d(in_planes, out_planes, kernel_size, stride, padding, bias=True)
def warp_flow(x, flow, use_mask=False):
"""
warp an image/tensor (im2) back to im1, according to the optical flow
Inputs:
x: [B, C, H, W] (im2)
flow: [B, 2, H, W] flow
Returns:
output: [B, C, H, W]
"""
B, C, H, W = x.size()
# mesh grid
xx = torch.arange(0, W).view(1,-1).repeat(H,1)
yy = torch.arange(0, H).view(-1,1).repeat(1,W)
xx = xx.view(1,1,H,W).repeat(B,1,1,1)
yy = yy.view(1,1,H,W).repeat(B,1,1,1)
grid = torch.cat((xx,yy),1).float()
if grid.shape != flow.shape:
raise ValueError('the shape of grid {0} is not equal to the shape of flow {1}.'.format(grid.shape, flow.shape))
if x.is_cuda:
grid = grid.to(x.get_device())
vgrid = grid + flow
# scale grid to [-1,1]
vgrid[:,0,:,:] = 2.0*vgrid[:,0,:,:].clone() / max(W-1,1)-1.0
vgrid[:,1,:,:] = 2.0*vgrid[:,1,:,:].clone() / max(H-1,1)-1.0
vgrid = vgrid.permute(0,2,3,1)
output = nn.functional.grid_sample(x, vgrid)
if use_mask:
mask = torch.autograd.Variable(torch.ones(x.size())).to(x.get_device())
mask = nn.functional.grid_sample(mask, vgrid)
mask[mask < 0.9999] = 0
mask[mask > 0] = 1
return output * mask
else:
return output
if __name__ == '__main__':
x = np.ones([1,1,10,10])
flow = np.stack([np.ones([1,10,10])*3.0, np.zeros([1,10,10])], axis=1)
y = warp_flow(torch.from_numpy(x).cuda().float(),torch.from_numpy(flow).cuda().float()).cpu().detach().numpy()
print(y)
================================================
FILE: core/networks/structures/pwc_tf.py
================================================
import os, sys
sys.path.append(os.path.dirname(os.path.abspath(__file__)))
from net_utils import conv, deconv, warp_flow
sys.path.append(os.path.join(os.path.dirname(os.path.abspath(__file__)), '..', '..', 'external'))
# from correlation_package.correlation import Correlation
# from spatial_correlation_sampler import SpatialCorrelationSampler as Correlation
import torch
import torch.nn as nn
from torch.autograd import Variable
import numpy as np
import pdb
import torch.nn.functional as F
#from spatial_correlation_sampler import spatial_correlation_sample
class PWC_tf(nn.Module):
def __init__(self, md=4):
super(PWC_tf, self).__init__()
self.corr = self.corr_naive
# self.corr = self.correlate
self.leakyRELU = nn.LeakyReLU(0.1)
nd = (2*md+1)**2
#dd = np.cumsum([128,128,96,64,32])
dd = np.array([128,128,96,64,32])
od = nd
self.conv6_0 = conv(od, 128, kernel_size=3, stride=1)
self.conv6_1 = conv(dd[0], 128, kernel_size=3, stride=1)
self.conv6_2 = conv(dd[0]+dd[1],96, kernel_size=3, stride=1)
self.conv6_3 = conv(dd[1]+dd[2],64, kernel_size=3, stride=1)
self.conv6_4 = conv(dd[2]+dd[3],32, kernel_size=3, stride=1)
self.predict_flow6 = self.predict_flow(dd[3]+dd[4])
#self.deconv6 = deconv(2, 2, kernel_size=4, stride=2, padding=1)
#self.upfeat6 = deconv(od+dd[4], 2, kernel_size=4, stride=2, padding=1)
od = nd+128+2
self.conv5_0 = conv(od, 128, kernel_size=3, stride=1)
self.conv5_1 = conv(dd[0], 128, kernel_size=3, stride=1)
self.conv5_2 = conv(dd[0]+dd[1],96, kernel_size=3, stride=1)
self.conv5_3 = conv(dd[1]+dd[2],64, kernel_size=3, stride=1)
self.conv5_4 = conv(dd[2]+dd[3],32, kernel_size=3, stride=1)
self.predict_flow5 = self.predict_flow(dd[3]+dd[4])
#self.deconv5 = deconv(2, 2, kernel_size=4, stride=2, padding=1)
#self.upfeat5 = deconv(od+dd[4], 2, kernel_size=4, stride=2, padding=1)
od = nd+96+2
self.conv4_0 = conv(od, 128, kernel_size=3, stride=1)
self.conv4_1 = conv(dd[0], 128, kernel_size=3, stride=1)
self.conv4_2 = conv(dd[0]+dd[1],96, kernel_size=3, stride=1)
self.conv4_3 = conv(dd[1]+dd[2],64, kernel_size=3, stride=1)
self.conv4_4 = conv(dd[2]+dd[3],32, kernel_size=3, stride=1)
self.predict_flow4 = self.predict_flow(dd[3]+dd[4])
#self.deconv4 = deconv(2, 2, kernel_size=4, stride=2, padding=1)
#self.upfeat4 = deconv(od+dd[4], 2, kernel_size=4, stride=2, padding=1)
od = nd+64+2
self.conv3_0 = conv(od, 128, kernel_size=3, stride=1)
self.conv3_1 = conv(dd[0], 128, kernel_size=3, stride=1)
self.conv3_2 = conv(dd[0]+dd[1],96, kernel_size=3, stride=1)
self.conv3_3 = conv(dd[1]+dd[2],64, kernel_size=3, stride=1)
self.conv3_4 = conv(dd[2]+dd[3],32, kernel_size=3, stride=1)
self.predict_flow3 = self.predict_flow(dd[3]+dd[4])
#self.deconv3 = deconv(2, 2, kernel_size=4, stride=2, padding=1)
#self.upfeat3 = deconv(od+dd[4], 2, kernel_size=4, stride=2, padding=1)
od = nd+32+2
self.conv2_0 = conv(od, 128, kernel_size=3, stride=1)
self.conv2_1 = conv(dd[0], 128, kernel_size=3, stride=1)
self.conv2_2 = conv(dd[0]+dd[1],96, kernel_size=3, stride=1)
self.conv2_3 = conv(dd[1]+dd[2],64, kernel_size=3, stride=1)
self.conv2_4 = conv(dd[2]+dd[3],32, kernel_size=3, stride=1)
self.predict_flow2 = self.predict_flow(dd[3]+dd[4])
#self.deconv2 = deconv(2, 2, kernel_size=4, stride=2, padding=1)
self.dc_conv1 = conv(dd[4]+2, 128, kernel_size=3, stride=1, padding=1, dilation=1)
self.dc_conv2 = conv(128, 128, kernel_size=3, stride=1, padding=2, dilation=2)
self.dc_conv3 = conv(128, 128, kernel_size=3, stride=1, padding=4, dilation=4)
self.dc_conv4 = conv(128, 96, kernel_size=3, stride=1, padding=8, dilation=8)
self.dc_conv5 = conv(96, 64, kernel_size=3, stride=1, padding=16, dilation=16)
self.dc_conv6 = conv(64, 32, kernel_size=3, stride=1, padding=1, dilation=1)
self.dc_conv7 = self.predict_flow(32)
'''
for m in self.modules():
if isinstance(m, nn.Conv2d) or isinstance(m, nn.ConvTranspose2d):
nn.init.kaiming_zeros_(m.weight.data)
if m.bias is not None:
m.bias.data.zero_()
'''
def predict_flow(self, in_planes):
return nn.Conv2d(in_planes,2,kernel_size=3,stride=1,padding=1,bias=True)
def warp(self, x, flow):
return warp_flow(x, flow, use_mask=False)
def corr_naive(self, input1, input2, d=4):
# naive pytorch implementation of the correlation layer.
assert (input1.shape == input2.shape)
batch_size, feature_num, H, W = input1.shape[0:4]
input2 = F.pad(input2, (d,d,d,d), value=0)
cv = []
for i in range(2 * d + 1):
for j in range(2 * d + 1):
cv.append((input1 * input2[:, :, i:(i + H), j:(j + W)]).mean(1).unsqueeze(1))
return torch.cat(cv, 1)
def forward(self, feature_list_1, feature_list_2, img_hw):
c11, c12, c13, c14, c15, c16 = feature_list_1
c21, c22, c23, c24, c25, c26 = feature_list_2
corr6 = self.corr(c16, c26)
x0 = self.conv6_0(corr6)
x1 = self.conv6_1(x0)
x2 = self.conv6_2(torch.cat((x0,x1),1))
x3 = self.conv6_3(torch.cat((x1,x2),1))
x4 = self.conv6_4(torch.cat((x2,x3),1))
flow6 = self.predict_flow6(torch.cat((x3,x4),1))
up_flow6 = F.interpolate(flow6, scale_factor=2.0, mode='bilinear')*2.0
warp5 = self.warp(c25, up_flow6)
corr5 = self.corr(c15, warp5)
x = torch.cat((corr5, c15, up_flow6), 1)
x0 = self.conv5_0(x)
x1 = self.conv5_1(x0)
x2 = self.conv5_2(torch.cat((x0,x1),1))
x3 = self.conv5_3(torch.cat((x1,x2),1))
x4 = self.conv5_4(torch.cat((x2,x3),1))
flow5 = self.predict_flow5(torch.cat((x3,x4),1))
flow5 = flow5 + up_flow6
up_flow5 = F.interpolate(flow5, scale_factor=2.0, mode='bilinear')*2.0
warp4 = self.warp(c24, up_flow5)
corr4 = self.corr(c14, warp4)
x = torch.cat((corr4, c14, up_flow5), 1)
x0 = self.conv4_0(x)
x1 = self.conv4_1(x0)
x2 = self.conv4_2(torch.cat((x0,x1),1))
x3 = self.conv4_3(torch.cat((x1,x2),1))
x4 = self.conv4_4(torch.cat((x2,x3),1))
flow4 = self.predict_flow4(torch.cat((x3,x4),1))
flow4 = flow4 + up_flow5
up_flow4 = F.interpolate(flow4, scale_factor=2.0, mode='bilinear')*2.0
warp3 = self.warp(c23, up_flow4)
corr3 = self.corr(c13, warp3)
x = torch.cat((corr3, c13, up_flow4), 1)
x0 = self.conv3_0(x)
x1 = self.conv3_1(x0)
x2 = self.conv3_2(torch.cat((x0,x1),1))
x3 = self.conv3_3(torch.cat((x1,x2),1))
x4 = self.conv3_4(torch.cat((x2,x3),1))
flow3 = self.predict_flow3(torch.cat((x3,x4),1))
flow3 = flow3 + up_flow4
up_flow3 = F.interpolate(flow3, scale_factor=2.0, mode='bilinear')*2.0
warp2 = self.warp(c22, up_flow3)
corr2 = self.corr(c12, warp2)
x = torch.cat((corr2, c12, up_flow3), 1)
x0 = self.conv2_0(x)
x1 = self.conv2_1(x0)
x2 = self.conv2_2(torch.cat((x0,x1),1))
x3 = self.conv2_3(torch.cat((x1,x2),1))
x4 = self.conv2_4(torch.cat((x2,x3),1))
flow2 = self.predict_flow2(torch.cat((x3,x4),1))
flow2 = flow2 + up_flow3
x = self.dc_conv4(self.dc_conv3(self.dc_conv2(self.dc_conv1(torch.cat([flow2, x4], 1)))))
flow2 = flow2 + self.dc_conv7(self.dc_conv6(self.dc_conv5(x)))
img_h, img_w = img_hw[0], img_hw[1]
flow2 = F.interpolate(flow2 * 4.0, [img_h, img_w], mode='bilinear')
flow3 = F.interpolate(flow3 * 4.0, [img_h // 2, img_w // 2], mode='bilinear')
flow4 = F.interpolate(flow4 * 4.0, [img_h // 4, img_w // 4], mode='bilinear')
flow5 = F.interpolate(flow5 * 4.0, [img_h // 8, img_w // 8], mode='bilinear')
return [flow2, flow3, flow4, flow5]
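# Usage sketch (not part of the original file): PWC_tf consumes the six
# pyramid levels produced by FeaturePyramid (channel counts 16/32/64/96/128/196
# at strides 2..64) and returns flows at full, 1/2, 1/4 and 1/8 resolution.
# Random features stand in for real ones here, just to check shapes.
if __name__ == '__main__':
    net = PWC_tf()
    h, w = 128, 192
    chans = [16, 32, 64, 96, 128, 196]
    feats1 = [torch.rand(1, c, h // 2 ** (i + 1), w // 2 ** (i + 1))
              for i, c in enumerate(chans)]
    feats2 = [torch.rand(1, c, h // 2 ** (i + 1), w // 2 ** (i + 1))
              for i, c in enumerate(chans)]
    flows = net(feats1, feats2, [h, w])
    print([f.shape for f in flows])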
================================================
FILE: core/networks/structures/ransac.py
================================================
import torch
import numpy as np
import os, sys
import torch.nn as nn
import pdb
import cv2
class reduced_ransac(nn.Module):
def __init__(self, check_num, thres, dataset):
super(reduced_ransac, self).__init__()
self.check_num = check_num
self.thres = thres
self.dataset = dataset
def robust_rand_sample(self, match, mask, num, robust=True):
# match: [b, 4, -1] mask: [b, 1, -1]
b, n = match.shape[0], match.shape[2]
nonzeros_num = torch.min(torch.sum(mask > 0, dim=-1)) # []
if nonzeros_num.detach().cpu().numpy() == n:
rand_int = torch.randint(0, n, [num])
select_match = match[:,:,rand_int]
else:
# If there is zero score in match, sample the non-zero matches.
select_idxs = []
if robust:
num = np.minimum(nonzeros_num.detach().cpu().numpy(), num)
for i in range(b):
nonzero_idx = torch.nonzero(mask[i,0,:]) # [nonzero_num,1]
rand_int = torch.randint(0, nonzero_idx.shape[0], [int(num)])
select_idx = nonzero_idx[rand_int, :] # [num, 1]
select_idxs.append(select_idx)
select_idxs = torch.stack(select_idxs, 0) # [b,num,1]
select_match = torch.gather(match.transpose(1,2), index=select_idxs.repeat(1,1,4), dim=1).transpose(1,2) # [b, 4, num]
return select_match, num
def top_ratio_sample(self, match, mask, ratio):
# match: [b, 4, -1] mask: [b, 1, -1]
b, total_num = match.shape[0], match.shape[-1]
scores, indices = torch.topk(mask, int(ratio*total_num), dim=-1) # [B, 1, ratio*tnum]
select_match = torch.gather(match.transpose(1,2), index=indices.squeeze(1).unsqueeze(-1).repeat(1,1,4), dim=1).transpose(1,2) # [b, 4, ratio*tnum]
return select_match, scores
def forward(self, match, mask, visualizer=None):
# match: [B, 4, H, W] mask: [B, 1, H, W]
b, h, w = match.shape[0], match.shape[2], match.shape[3]
match = match.view([b, 4, -1]).contiguous()
mask = mask.view([b, 1, -1]).contiguous()
# Sample matches for RANSAC 8-point and best F selection
top_ratio_match, top_ratio_mask = self.top_ratio_sample(match, mask, ratio=0.20) # [b, 4, ratio*H*W]
check_match, check_num = self.robust_rand_sample(top_ratio_match, top_ratio_mask, num=self.check_num) # [b, 4, check_num]
check_match = check_match.contiguous()
cv_f = []
for i in range(b):
if self.dataset == 'nyuv2':
f, m = cv2.findFundamentalMat(check_match[i,:2,:].transpose(0,1).detach().cpu().numpy(), check_match[i,2:,:].transpose(0,1).detach().cpu().numpy(), cv2.FM_LMEDS, 0.99)
else:
f, m = cv2.findFundamentalMat(check_match[i,:2,:].transpose(0,1).detach().cpu().numpy(), check_match[i,2:,:].transpose(0,1).detach().cpu().numpy(), cv2.FM_RANSAC, 0.1, 0.99)
cv_f.append(f)
cv_f = np.stack(cv_f, axis=0)
cv_f = torch.from_numpy(cv_f).float().to(match.get_device())
return cv_f
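# Usage sketch (not part of the original file): pure-translation
# correspondences (x2 = x1 + t) admit a valid fundamental matrix, so RANSAC
# has something to find. check_num=1000 is an arbitrary stand-in for
# cfg.ransac_points; CUDA is assumed because forward() calls get_device().
if __name__ == '__main__' and torch.cuda.is_available():
    filt = reduced_ransac(check_num=1000, thres=0.1, dataset='kitti')
    xy1 = torch.rand(1, 2, 64, 80).cuda() * 60.0
    match = torch.cat([xy1, xy1 + 2.0], 1)         # [B, 4, H, W]
    mask = torch.rand(1, 1, 64, 80).cuda()
    F_mat = filt(match, mask)
    print(F_mat.shape)                             # torch.Size([1, 3, 3])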
================================================
FILE: core/visualize/__init__.py
================================================
import os, sys
sys.path.append(os.path.dirname(os.path.abspath(__file__)))
from visualizer import Visualizer
from visualizer import Visualizer_debug
from profiler import Profiler
================================================
FILE: core/visualize/profiler.py
================================================
import os
import time
import torch
import pdb
class Profiler(object):
def __init__(self, silent=False):
self.silent = silent
torch.cuda.synchronize()
self.start = time.time()
self.cache_time = self.start
def reset(self, silent=None):
if silent is None:
silent = self.silent
self.__init__(silent=silent)
def report_process(self, process_name):
if self.silent:
return None
torch.cuda.synchronize()
now = time.time()
print('{0}\t: {1:.4f}'.format(process_name, now - self.cache_time))
self.cache_time = now
def report_all(self, whole_process_name):
if self.silent:
return None
torch.cuda.synchronize()
now = time.time()
print('{0}\t: {1:.4f}'.format(whole_process_name, now - self.start))
pdb.set_trace()
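# Usage sketch (not part of the original file): Profiler synchronizes CUDA
# around wall-clock timestamps so the reported spans include GPU work. Note
# that report_all() above drops into pdb after printing.
if __name__ == '__main__' and torch.cuda.is_available():
    profiler = Profiler()
    x = torch.rand(1024, 1024).cuda()
    y = x @ x
    profiler.report_process('matmul')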
================================================
FILE: core/visualize/visualizer.py
================================================
import os, sys
import numpy as np
import cv2
import pdb
import pickle
from mpl_toolkits import mplot3d
import matplotlib.pyplot as plt
import PIL.Image as pil
import matplotlib as mpl
import matplotlib.cm as cm
colorlib = [(0,0,255),(255,0,0),(0,255,0),(255,255,0),(0,255,255),(255,0,255),(0,0,0),(255,255,255)]
class Visualizer(object):
def __init__(self, loss_weights_dict, dump_dir=None):
self.loss_weights_dict = loss_weights_dict
self.use_flow_error = (self.loss_weights_dict['flow_error'] > 0)
self.dump_dir = dump_dir
self.log_list = []
def add_log_pack(self, log_pack):
self.log_list.append(log_pack)
def dump_log(self, fname=None):
if fname is None:
fname = self.dump_dir
with open(fname, 'wb') as f:
pickle.dump(self.log_list, f)
def print_loss(self, loss_pack, iter_=None):
loss_pixel = loss_pack['loss_pixel'].mean().detach().cpu().numpy()
loss_ssim = loss_pack['loss_ssim'].mean().detach().cpu().numpy()
loss_flow_smooth = loss_pack['loss_flow_smooth'].mean().detach().cpu().numpy()
loss_flow_consis = loss_pack['loss_flow_consis'].mean().detach().cpu().numpy()
if 'pt_depth_loss' in loss_pack.keys():
loss_pt_depth = loss_pack['pt_depth_loss'].mean().detach().cpu().numpy()
loss_pj_depth = loss_pack['pj_depth_loss'].mean().detach().cpu().numpy()
loss_depth_smooth = loss_pack['depth_smooth_loss'].mean().detach().cpu().numpy()
            # NOTE: truncated in the extraction; completed below from the variables
            # computed above (the rest of visualizer.py, including the Visualizer_debug
            # class indexed later, was also cut off).
            str_ = ('iter: {0}, loss_pixel: {1:.6f}, loss_ssim: {2:.6f}, loss_pt_depth: {3:.6f}, '
                    'loss_pj_depth: {4:.6f}, loss_depth_smooth: {5:.6f}').format(
                iter_, loss_pixel, loss_ssim, loss_pt_depth, loss_pj_depth, loss_depth_smooth)
            print(str_)
        else:
            str_ = ('iter: {0}, loss_pixel: {1:.6f}, loss_ssim: {2:.6f}, loss_flow_smooth: {3:.6f}, '
                    'loss_flow_consis: {4:.6f}').format(
                iter_, loss_pixel, loss_ssim, loss_flow_smooth, loss_flow_consis)
            print(str_)
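Visualizer is a thin logging helper: it accumulates arbitrary picklable log packs in memory, serializes them with pickle on dump_log(), and print_loss() formats the per-iteration losses shown above. A minimal usage sketch; the weight values and file path are placeholders (the constructor only reads loss_weights_dict['flow_error']):
# Hypothetical usage of Visualizer; weights and paths are placeholders.
from core.visualize import Visualizer
vis = Visualizer({'flow_error': 0.0}, dump_dir='./train_log.pkl')
vis.add_log_pack({'iter': 100, 'loss_pixel': 0.42})  # any picklable dict
vis.dump_log()  # pickles the accumulated log list to dump_dir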
================================================
SYMBOL INDEX (306 symbols across 32 files)
================================================
FILE: core/config/config_utils.py
function generate_loss_weights_dict (line 3) | def generate_loss_weights_dict(cfg):
FILE: core/dataset/kitti_2012.py
class KITTI_2012 (line 13) | class KITTI_2012(KITTI_Prepared):
method __init__ (line 14) | def __init__(self, data_dir, img_hw=(256, 832), init=True):
method get_data_list (line 21) | def get_data_list(self):
method __len__ (line 31) | def __len__(self):
method read_cam_intrinsic (line 34) | def read_cam_intrinsic(self, calib_file):
method __getitem__ (line 38) | def __getitem__(self, idx):
FILE: core/dataset/kitti_2015.py
class KITTI_2015 (line 5) | class KITTI_2015(KITTI_2012):
method __init__ (line 6) | def __init__(self, data_dir, img_hw=(256, 832)):
FILE: core/dataset/kitti_odo.py
function process_folder (line 7) | def process_folder(q, data_dir, output_dir, stride=1):
class KITTI_Odo (line 33) | class KITTI_Odo(object):
method __init__ (line 34) | def __init__(self, data_dir):
method __len__ (line 38) | def __len__(self):
method prepare_data_mp (line 41) | def prepare_data_mp(self, output_dir, stride=1):
method __getitem__ (line 76) | def __getitem__(self, idx):
FILE: core/dataset/kitti_prepared.py
class KITTI_Prepared (line 10) | class KITTI_Prepared(torch.utils.data.Dataset):
method __init__ (line 11) | def __init__(self, data_dir, num_scales=3, img_hw=(256, 832), num_iter...
method get_data_list (line 22) | def get_data_list(self, info_file):
method count (line 35) | def count(self):
method rand_num (line 38) | def rand_num(self, idx):
method __len__ (line 44) | def __len__(self):
method resize_img (line 50) | def resize_img(self, img, img_hw):
method random_flip_img (line 63) | def random_flip_img(self, img):
method preprocess_img (line 69) | def preprocess_img(self, img, img_hw=None, is_test=False):
method read_cam_intrinsic (line 78) | def read_cam_intrinsic(self, fname):
method rescale_intrinsics (line 87) | def rescale_intrinsics(self, K, img_hw_orig, img_hw_new):
method get_intrinsics_per_scale (line 92) | def get_intrinsics_per_scale(self, K, scale):
method get_multiscale_intrinsics (line 99) | def get_multiscale_intrinsics(self, K, num_scales):
method __getitem__ (line 109) | def __getitem__(self, idx):
FILE: core/dataset/kitti_raw.py
function process_folder (line 8) | def process_folder(q, static_frames, test_scenes, data_dir, output_dir, ...
class KITTI_RAW (line 45) | class KITTI_RAW(object):
method __init__ (line 46) | def __init__(self, data_dir, static_frames_txt, test_scenes_txt):
method __len__ (line 51) | def __len__(self):
method collect_static_frame (line 54) | def collect_static_frame(self):
method collect_test_scenes (line 66) | def collect_test_scenes(self):
method prepare_data_mp (line 74) | def prepare_data_mp(self, output_dir, stride=1):
method prepare_data (line 122) | def prepare_data(self, output_dir):
method __getitem__ (line 177) | def __getitem__(self, idx):
FILE: core/dataset/nyu_v2.py
function collect_image_list (line 14) | def collect_image_list(path):
function process_folder (line 24) | def process_folder(q, data_dir, output_dir, stride, train_scenes):
class NYU_Prepare (line 68) | class NYU_Prepare(object):
method __init__ (line 69) | def __init__(self, data_dir, test_dir):
method __len__ (line 78) | def __len__(self):
method get_all_scenes (line 81) | def get_all_scenes(self):
method get_test_scenes (line 90) | def get_test_scenes(self):
method get_train_scenes (line 104) | def get_train_scenes(self):
method prepare_data_mp (line 118) | def prepare_data_mp(self, output_dir, stride=1):
method __getitem__ (line 163) | def __getitem__(self, idx):
class NYU_v2 (line 168) | class NYU_v2(torch.utils.data.Dataset):
method __init__ (line 169) | def __init__(self, data_dir, num_scales=3, img_hw=(448, 576), num_iter...
method get_data_list (line 182) | def get_data_list(self, info_file):
method count (line 195) | def count(self):
method rand_num (line 198) | def rand_num(self, idx):
method __len__ (line 204) | def __len__(self):
method resize_img (line 210) | def resize_img(self, img, img_hw):
method random_flip_img (line 223) | def random_flip_img(self, img):
method undistort_img (line 229) | def undistort_img(self, img, K):
method preprocess_img (line 250) | def preprocess_img(self, img, K, img_hw=None, is_test=False):
method read_cam_intrinsic (line 262) | def read_cam_intrinsic(self, fname):
method rescale_intrinsics (line 271) | def rescale_intrinsics(self, K, img_hw_orig, img_hw_new):
method get_intrinsics_per_scale (line 277) | def get_intrinsics_per_scale(self, K, scale):
method get_multiscale_intrinsics (line 284) | def get_multiscale_intrinsics(self, K, num_scales):
method __getitem__ (line 294) | def __getitem__(self, idx):
FILE: core/evaluation/eval_odom.py
function scale_lse_solver (line 9) | def scale_lse_solver(X, Y):
function umeyama_alignment (line 22) | def umeyama_alignment(x, y, with_scale=False):
class KittiEvalOdom (line 72) | class KittiEvalOdom():
method __init__ (line 77) | def __init__(self):
method loadPoses (line 81) | def loadPoses(self, file_name):
method trajectory_distances (line 106) | def trajectory_distances(self, poses):
method rotation_error (line 123) | def rotation_error(self, pose_error):
method translation_error (line 131) | def translation_error(self, pose_error):
method last_frame_from_segment_length (line 137) | def last_frame_from_segment_length(self, dist, first_frame, len_):
method calc_sequence_errors (line 143) | def calc_sequence_errors(self, poses_gt, poses_result):
method save_sequence_errors (line 178) | def save_sequence_errors(self, err, file_name):
method compute_overall_err (line 185) | def compute_overall_err(self, seq_err):
method plotPath (line 198) | def plotPath(self, seq, poses_gt, poses_result):
method compute_segment_error (line 230) | def compute_segment_error(self, seq_errs):
method scale_optimization (line 259) | def scale_optimization(self, gt, pred):
method eval (line 282) | def eval(self, gt_txt, result_txt, seq=None):
FILE: core/evaluation/evaluate_depth.py
function process_depth (line 3) | def process_depth(gt_depth, pred_depth, min_depth, max_depth):
function eval_depth (line 13) | def eval_depth(gt_depths,
FILE: core/evaluation/evaluate_flow.py
function get_scaled_intrinsic_matrix (line 9) | def get_scaled_intrinsic_matrix(calib_file, zoom_x, zoom_y):
function load_intrinsics_raw (line 19) | def load_intrinsics_raw(calib_file):
function read_raw_calib_file (line 29) | def read_raw_calib_file(filepath):
function scale_intrinsics (line 45) | def scale_intrinsics(mat, sx, sy):
function read_flow_gt_worker (line 53) | def read_flow_gt_worker(dir_gt, i):
function load_gt_flow_kitti (line 60) | def load_gt_flow_kitti(gt_dataset_dir, mode):
function calculate_error_rate (line 85) | def calculate_error_rate(epe_map, gt_flow, mask):
function eval_flow_avg (line 93) | def eval_flow_avg(gt_flows,
FILE: core/evaluation/evaluate_mask.py
class EvalSegErr (line 12) | class EvalSegErr(Exception):
method __init__ (line 13) | def __init__(self, value):
method __str__ (line 16) | def __str__(self):
function pixel_accuracy (line 20) | def pixel_accuracy(eval_segm, gt_segm):
function mean_accuracy (line 48) | def mean_accuracy(eval_segm, gt_segm):
function mean_IU (line 74) | def mean_IU(eval_segm, gt_segm):
function frequency_weighted_IU (line 104) | def frequency_weighted_IU(eval_segm, gt_segm):
function get_pixel_area (line 140) | def get_pixel_area(segm):
function extract_both_masks (line 144) | def extract_both_masks(eval_segm, gt_segm, cl, n_cl):
function extract_classes (line 151) | def extract_classes(segm):
function union_classes (line 158) | def union_classes(eval_segm, gt_segm):
function extract_masks (line 168) | def extract_masks(segm, cl, n_cl):
function segm_size (line 178) | def segm_size(segm):
function check_size (line 188) | def check_size(eval_segm, gt_segm):
function read_mask_gt_worker (line 195) | def read_mask_gt_worker(gt_dataset_dir, idx):
function load_gt_mask (line 199) | def load_gt_mask(gt_dataset_dir):
function eval_mask (line 216) | def eval_mask(pred_masks, gt_masks, opt):
FILE: core/evaluation/evaluation_utils.py
function compute_errors (line 11) | def compute_errors(gt, pred, nyu=False):
FILE: core/evaluation/flowlib.py
function show_flow (line 29) | def show_flow(filename):
function visualize_flow (line 41) | def visualize_flow(flow, mode='Y'):
function read_flow (line 84) | def read_flow(filename):
function read_flow_png (line 107) | def read_flow_png(flow_file):
function write_flow_png (line 130) | def write_flow_png(flo, flow_file):
function write_flow (line 147) | def write_flow(flow, filename):
function segment_flow (line 166) | def segment_flow(flow):
function flow_error (line 203) | def flow_error(tu, tv, u, v):
function flow_to_image (line 258) | def flow_to_image(flow):
function evaluate_flow_file (line 299) | def evaluate_flow_file(gt, pred):
function evaluate_flow (line 315) | def evaluate_flow(gt_flow, pred_flow):
function read_disp_png (line 332) | def read_disp_png(file_name):
function disp_to_flowfile (line 350) | def disp_to_flowfile(disp, filename):
function read_image (line 378) | def read_image(filename):
function warp_image (line 389) | def warp_image(im, flow):
function scale_image (line 428) | def scale_image(image, new_range):
function compute_color (line 444) | def compute_color(u, v):
function make_color_wheel (line 488) | def make_color_wheel():
FILE: core/networks/__init__.py
function get_model (line 8) | def get_model(mode):
FILE: core/networks/model_depth_pose.py
class Model_depth_pose (line 14) | class Model_depth_pose(nn.Module):
method __init__ (line 15) | def __init__(self, cfg):
method meshgrid (line 26) | def meshgrid(self, h, w):
method robust_rand_sample (line 32) | def robust_rand_sample(self, match, mask, num):
method top_ratio_sample (line 52) | def top_ratio_sample(self, match, mask, ratio):
method rand_sample (line 59) | def rand_sample(self, match, num):
method filt_negative_depth (line 65) | def filt_negative_depth(self, point2d_1_depth, point2d_2_depth, point2...
method filt_invalid_coord (line 91) | def filt_invalid_coord(self, point2d_1_depth, point2d_2_depth, point2d...
method ray_angle_filter (line 123) | def ray_angle_filter(self, match, P1, P2, return_angle=False):
method midpoint_triangulate (line 168) | def midpoint_triangulate(self, match, K_inv, P1, P2):
method rt_from_fundamental_mat_nyu (line 199) | def rt_from_fundamental_mat_nyu(self, fmat, K, depth_match):
method verifyRT (line 228) | def verifyRT(self, match, K_inv, P1, P2):
method rt_from_fundamental_mat (line 239) | def rt_from_fundamental_mat(self, fmat, K, depth_match):
method reproject (line 277) | def reproject(self, P, point3d):
method scale_adapt (line 284) | def scale_adapt(self, depth1, depth2, eps=1e-12):
method affine_adapt (line 291) | def affine_adapt(self, depth1, depth2, use_translation=True, eps=1e-12):
method register_depth (line 312) | def register_depth(self, depth_pred, coord_tri, depth_tri):
method get_trian_loss (line 331) | def get_trian_loss(self, tri_depth, pred_tri_depth):
method get_reproj_fdp_loss (line 336) | def get_reproj_fdp_loss(self, pred1, pred2, P2, K, K_inv, valid_mask, ...
method disp2depth (line 359) | def disp2depth(self, disp, min_depth=0.1, max_depth=100.0):
method get_smooth_loss (line 366) | def get_smooth_loss(self, img, disp):
method infer_depth (line 382) | def infer_depth(self, img):
method infer_vo (line 387) | def infer_vo(self, img1, img2, K, K_inv, match_num=6000):
method check_rt (line 402) | def check_rt(self, img1, img2, K, K_inv):
method inference (line 426) | def inference(self, img1, img2, K, K_inv):
method forward (line 463) | def forward(self, inputs):
FILE: core/networks/model_flow.py
function transformerFwd (line 12) | def transformerFwd(U,
class Model_flow (line 151) | class Model_flow(nn.Module):
method __init__ (line 152) | def __init__(self, cfg):
method get_occlusion_mask_from_flow (line 169) | def get_occlusion_mask_from_flow(self, tensor_size, flow):
method get_flow_norm (line 177) | def get_flow_norm(self, flow, p=2):
method get_visible_masks (line 185) | def get_visible_masks(self, optical_flows, optical_flows_rev):
method get_consistent_masks (line 195) | def get_consistent_masks(self, optical_flows, optical_flows_rev):
method generate_img_pyramid (line 219) | def generate_img_pyramid(self, img, num_pyramid):
method warp_flow_pyramid (line 227) | def warp_flow_pyramid(self, img_pyramid, flow_pyramid):
method compute_loss_pixel (line 233) | def compute_loss_pixel(self, img_pyramid, img_warped_pyramid, occ_mask...
method compute_loss_ssim (line 244) | def compute_loss_ssim(self, img_pyramid, img_warped_pyramid, occ_mask_...
method gradients (line 257) | def gradients(self, img):
method cal_grad2_error (line 262) | def cal_grad2_error(self, flow, img):
method compute_loss_flow_smooth (line 273) | def compute_loss_flow_smooth(self, optical_flows, img_pyramid):
method compute_loss_flow_consis (line 282) | def compute_loss_flow_consis(self, fwd_flow_diff_pyramid, occ_mask_list):
method inference_flow (line 293) | def inference_flow(self, img1, img2):
method inference_corres (line 299) | def inference_corres(self, img1, img2):
method forward (line 319) | def forward(self, inputs, output_flow=False, use_flow_loss=True):
FILE: core/networks/model_flowposenet.py
function mean_on_mask (line 16) | def mean_on_mask(diff, valid_mask):
function edge_aware_smoothness_loss (line 22) | def edge_aware_smoothness_loss(pred_disp, img, max_scales):
function compute_smooth_loss (line 64) | def compute_smooth_loss(tgt_depth, tgt_img, ref_depth, ref_img, max_scal...
class Model_flowposenet (line 71) | class Model_flowposenet(nn.Module):
method __init__ (line 72) | def __init__(self, cfg):
method compute_pairwise_loss (line 79) | def compute_pairwise_loss(self, tgt_img, ref_img, tgt_depth, ref_depth...
method disp2depth (line 106) | def disp2depth(self, disp, min_depth=0.01, max_depth=80.0):
method infer_depth (line 113) | def infer_depth(self, img):
method inference (line 119) | def inference(self, img1, img2, K, K_inv):
method inference_flow (line 123) | def inference_flow(self, img1, img2):
method infer_pose (line 127) | def infer_pose(self, img1, img2, K, K_inv):
method forward (line 135) | def forward(self, inputs):
FILE: core/networks/model_triangulate_pose.py
class Model_triangulate_pose (line 11) | class Model_triangulate_pose(nn.Module):
method __init__ (line 12) | def __init__(self, cfg):
method meshgrid (line 24) | def meshgrid(self, h, w):
method compute_epipolar_loss (line 30) | def compute_epipolar_loss(self, fmat, match, mask):
method get_rigid_mask (line 53) | def get_rigid_mask(self, dist_map):
method inference (line 59) | def inference(self, img1, img2, K, K_inv):
method forward (line 76) | def forward(self, inputs, output_F=False, visualizer=None):
FILE: core/networks/pytorch_ssim/ssim.py
function SSIM (line 4) | def SSIM(x, y):
FILE: core/networks/structures/depth_model.py
class ResNetMultiImageInput (line 17) | class ResNetMultiImageInput(models.ResNet):
method __init__ (line 21) | def __init__(self, block, layers, num_classes=1000, num_input_images=1):
function resnet_multiimage_input (line 41) | def resnet_multiimage_input(num_layers, pretrained=False, num_input_imag...
class ResnetEncoder (line 60) | class ResnetEncoder(nn.Module):
method __init__ (line 63) | def __init__(self, num_layers, pretrained, num_input_images=1):
method forward (line 85) | def forward(self, input_image):
class ConvBlock (line 97) | class ConvBlock(nn.Module):
method __init__ (line 100) | def __init__(self, in_channels, out_channels):
method forward (line 106) | def forward(self, x):
class Conv3x3 (line 111) | class Conv3x3(nn.Module):
method __init__ (line 114) | def __init__(self, in_channels, out_channels, use_refl=True):
method forward (line 123) | def forward(self, x):
function upsample (line 128) | def upsample(x):
class DepthDecoder (line 135) | class DepthDecoder(nn.Module):
method __init__ (line 136) | def __init__(self, num_ch_enc, scales=range(4), num_output_channels=1,...
method init_decoder (line 149) | def init_decoder(self):
method forward (line 173) | def forward(self, input_features):
class Depth_Model (line 193) | class Depth_Model(nn.Module):
method __init__ (line 194) | def __init__(self, depth_scale, num_layers=18):
method forward (line 200) | def forward(self, img):
FILE: core/networks/structures/feature_pyramid.py
class FeaturePyramid (line 7) | class FeaturePyramid(nn.Module):
method __init__ (line 8) | def __init__(self):
method forward (line 29) | def forward(self, img):
FILE: core/networks/structures/flowposenet.py
function conv (line 7) | def conv(in_planes, out_planes, kernel_size=3):
function upconv (line 13) | def upconv(in_planes, out_planes):
class FlowPoseNet (line 19) | class FlowPoseNet(nn.Module):
method __init__ (line 21) | def __init__(self):
method init_weights (line 35) | def init_weights(self):
method forward (line 42) | def forward(self, flow):
FILE: core/networks/structures/inverse_warp.py
function set_id_grid (line 8) | def set_id_grid(depth):
function check_sizes (line 20) | def check_sizes(input, input_name, expected):
function pixel2cam (line 29) | def pixel2cam(depth, intrinsics_inv):
function cam2pixel (line 47) | def cam2pixel(cam_coords, proj_c2p_rot, proj_c2p_tr, padding_mode):
function euler2mat (line 77) | def euler2mat(angle):
function quat2mat (line 115) | def quat2mat(quat):
function pose_vec2mat (line 139) | def pose_vec2mat(vec, rotation_mode='euler'):
function inverse_warp (line 157) | def inverse_warp(img, depth, pose, intrinsics, rotation_mode='euler', pa...
function cam2pixel2 (line 194) | def cam2pixel2(cam_coords, proj_c2p_rot, proj_c2p_tr, padding_mode):
function inverse_warp2 (line 230) | def inverse_warp2(img, depth, ref_depth, pose, intrinsics, padding_mode=...
FILE: core/networks/structures/net_utils.py
function conv (line 7) | def conv(in_planes, out_planes, kernel_size=3, stride=1, padding=1, dila...
function deconv (line 13) | def deconv(in_planes, out_planes, kernel_size=4, stride=2, padding=1):
function warp_flow (line 16) | def warp_flow(x, flow, use_mask=False):
FILE: core/networks/structures/pwc_tf.py
class PWC_tf (line 16) | class PWC_tf(nn.Module):
method __init__ (line 17) | def __init__(self, md=4):
method predict_flow (line 91) | def predict_flow(self, in_planes):
method warp (line 94) | def warp(self, x, flow):
method corr_naive (line 97) | def corr_naive(self, input1, input2, d=4):
method forward (line 108) | def forward(self, feature_list_1, feature_list_2, img_hw):
FILE: core/networks/structures/ransac.py
class reduced_ransac (line 8) | class reduced_ransac(nn.Module):
method __init__ (line 9) | def __init__(self, check_num, thres, dataset):
method robust_rand_sample (line 15) | def robust_rand_sample(self, match, mask, num, robust=True):
method top_ratio_sample (line 36) | def top_ratio_sample(self, match, mask, ratio):
method forward (line 44) | def forward(self, match, mask, visualizer=None):
FILE: core/visualize/profiler.py
class Profiler (line 6) | class Profiler(object):
method __init__ (line 7) | def __init__(self, silent=False):
method reset (line 13) | def reset(self, silent=None):
method report_process (line 18) | def report_process(self, process_name):
method report_all (line 26) | def report_all(self, whole_process_name):
FILE: core/visualize/visualizer.py
class Visualizer (line 15) | class Visualizer(object):
method __init__ (line 16) | def __init__(self, loss_weights_dict, dump_dir=None):
method add_log_pack (line 23) | def add_log_pack(self, log_pack):
method dump_log (line 26) | def dump_log(self, fname=None):
method print_loss (line 32) | def print_loss(self, loss_pack, iter_=None):
class Visualizer_debug (line 50) | class Visualizer_debug():
method __init__ (line 51) | def __init__(self, dump_dir=None, img1=None, img2=None):
method draw_point_corres (line 56) | def draw_point_corres(self, batch_idx, match, name):
method draw_invalid_corres_ray (line 62) | def draw_invalid_corres_ray(self, img1, img2, depth_match, point2d_1_c...
method draw_epipolar_line (line 69) | def draw_epipolar_line(self, batch_idx, match, F, name):
method show_corres (line 76) | def show_corres(self, img1, img2, match, name):
method show_mask (line 93) | def show_mask(self, mask, name):
method save_img (line 98) | def save_img(self, img, name):
method save_depth_img (line 101) | def save_depth_img(self, depth, name):
method save_disp_color_img (line 109) | def save_disp_color_img(self, disp, name):
method drawlines (line 120) | def drawlines(self, img1, img2, lines, pts1, pts2):
method show_epipolar_line (line 133) | def show_epipolar_line(self, img1, img2, match, F, name):
method show_ray (line 153) | def show_ray(self, ax, K, RT, point2d, cmap='Greens'):
method visualize_points (line 166) | def visualize_points(self, ax, points, cmap=None):
method scatter_3d (line 171) | def scatter_3d(self, ax, point, scatter_color='r'):
method visualize_two_rays (line 174) | def visualize_two_rays(self, ax, match, P1, P2):
FILE: data/eigen/export_gt_depth.py
function load_velodyne_points (line 19) | def load_velodyne_points(filename):
function read_calib_file (line 28) | def read_calib_file(path):
function sub2ind (line 50) | def sub2ind(matrixSize, rowSub, colSub):
function generate_depth_map (line 57) | def generate_depth_map(calib_dir, velo_filename, cam=2, vel_depth=False):
function export_gt_depths_kitti (line 112) | def export_gt_depths_kitti():
FILE: infer_vo.py
function save_traj (line 19) | def save_traj(path, poses):
function projection (line 31) | def projection(xy, points, h_max, w_max):
function unprojection (line 47) | def unprojection(xy, depth, K):
function cv_triangulation (line 62) | def cv_triangulation(matches, pose):
class infer_vo (line 76) | class infer_vo():
method __init__ (line 77) | def __init__(self, seq_id, sequences_root_dir):
method read_rescale_camera_intrinsics (line 99) | def read_rescale_camera_intrinsics(self, path):
method load_images (line 114) | def load_images(self):
method get_prediction (line 129) | def get_prediction(self, img1, img2, model, K, K_inv, match_num):
method process_video (line 141) | def process_video(self, images, model):
method normalize_coord (line 175) | def normalize_coord(self, xy, K):
method align_to_depth (line 182) | def align_to_depth(self, xy1, xy2, pose, depth2):
method solve_pose_pnp (line 213) | def solve_pose_pnp(self, xy1, xy2, depth1):
method solve_pose_flow (line 255) | def solve_pose_flow(self, xy1, xy2):
class pObject (line 311) | class pObject(object):
method __init__ (line 312) | def __init__(self):
FILE: test.py
function test_kitti_2012 (line 16) | def test_kitti_2012(cfg, model, gt_flows, noc_masks):
function test_kitti_2015 (line 43) | def test_kitti_2015(cfg, model, gt_flows, noc_masks, gt_masks, depth_sav...
function disp2depth (line 78) | def disp2depth(disp, min_depth=0.001, max_depth=80.0):
function resize_depths (line 85) | def resize_depths(gt_depth_list, pred_disp_list):
function test_eigen_depth (line 99) | def test_eigen_depth(cfg, model):
function resize_disp (line 130) | def resize_disp(pred_disp_list, gt_depths):
function load_nyu_test_data (line 143) | def load_nyu_test_data(data_dir):
function test_nyu (line 153) | def test_nyu(cfg, model, test_images, test_gt_depths):
function test_single_image (line 185) | def test_single_image(img_path, model, training_hw, save_dir='./'):
class pObject (line 227) | class pObject(object):
method __init__ (line 228) | def __init__(self):
FILE: train.py
function save_model (line 19) | def save_model(iter_, model_dir, filename, model, optimizer):
function load_model (line 22) | def load_model(model_dir, filename, model, optimizer):
function train (line 29) | def train(cfg):
class pObject (line 204) | class pObject(object):
method __init__ (line 205) | def __init__(self):
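To orient readers in the index above: Model_depth_pose bundles the depth network with the flow-based correspondence and triangulation modules, and infer_vo.py drives it frame-by-frame for visual odometry. Below is a minimal glue sketch based only on the signatures listed in this index; the cfg attributes (which in the repository come from the YAML files under config/), the tensor shapes, and the intrinsics are illustrative assumptions.
# Hypothetical glue code; cfg attributes and tensor shapes are assumptions.
import torch
from core.networks.model_depth_pose import Model_depth_pose
class pObject(object):  # empty attribute holder, mirroring the one in infer_vo.py / test.py
    def __init__(self):
        pass
cfg = pObject()
# ... copy the key/value pairs of a config/*.yaml file onto cfg as attributes ...
model = Model_depth_pose(cfg).cuda().eval()
img1 = torch.rand(1, 3, 256, 832).cuda()  # a pair of consecutive frames
img2 = torch.rand(1, 3, 256, 832).cuda()
K = torch.eye(3).unsqueeze(0).cuda()      # camera intrinsics, [1, 3, 3]
pose = model.infer_vo(img1, img2, K, K.inverse(), match_num=6000)  # relative pose
depth = model.infer_depth(img1)           # per-pixel depth for the first frame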