Full Code of B1ueber2y/TrianFlow for AI

master af3f2bfc745e cached
54 files
655.9 KB
282.2k tokens
306 symbols
1 requests
Download .txt
Showing preview only (681K chars total). Download the full file or copy to clipboard to get everything.
Repository: B1ueber2y/TrianFlow
Branch: master
Commit: af3f2bfc745e
Files: 54
Total size: 655.9 KB

Directory structure:
gitextract_5bwpkhvi/

├── LICENSE
├── README.md
├── config/
│   ├── kitti.yaml
│   ├── kitti_3stage.yaml
│   ├── nyu.yaml
│   ├── nyu_192.yaml
│   ├── nyu_192_3stage.yaml
│   ├── nyu_3stage.yaml
│   ├── nyu_posenet_192.yaml
│   ├── odo.yaml
│   ├── odo_3stage.yaml
│   └── odo_posenet.yaml
├── core/
│   ├── config/
│   │   ├── __init__.py
│   │   └── config_utils.py
│   ├── dataset/
│   │   ├── __init__.py
│   │   ├── kitti_2012.py
│   │   ├── kitti_2015.py
│   │   ├── kitti_odo.py
│   │   ├── kitti_prepared.py
│   │   ├── kitti_raw.py
│   │   └── nyu_v2.py
│   ├── evaluation/
│   │   ├── __init__.py
│   │   ├── eval_odom.py
│   │   ├── evaluate_depth.py
│   │   ├── evaluate_flow.py
│   │   ├── evaluate_mask.py
│   │   ├── evaluation_utils.py
│   │   └── flowlib.py
│   ├── networks/
│   │   ├── __init__.py
│   │   ├── model_depth_pose.py
│   │   ├── model_flow.py
│   │   ├── model_flowposenet.py
│   │   ├── model_triangulate_pose.py
│   │   ├── pytorch_ssim/
│   │   │   ├── __init__.py
│   │   │   └── ssim.py
│   │   └── structures/
│   │       ├── __init__.py
│   │       ├── depth_model.py
│   │       ├── feature_pyramid.py
│   │       ├── flowposenet.py
│   │       ├── inverse_warp.py
│   │       ├── net_utils.py
│   │       ├── pwc_tf.py
│   │       └── ransac.py
│   └── visualize/
│       ├── __init__.py
│       ├── profiler.py
│       └── visualizer.py
├── data/
│   └── eigen/
│       ├── export_gt_depth.py
│       ├── static_frames.txt
│       ├── test_files.txt
│       └── test_scenes.txt
├── infer_vo.py
├── requirements.txt
├── test.py
└── train.py

================================================
FILE CONTENTS
================================================

================================================
FILE: LICENSE
================================================
MIT License

Copyright (c) 2021 Shaohui Liu

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.


================================================
FILE: README.md
================================================
## Towards Better Generalization: Joint Depth-Pose Learning without PoseNet
Created by <a href="https://github.com/thuzhaowang" target="_blank">Wang Zhao</a>, <a href="http://b1ueber2y.me" target="_blank">Shaohui Liu</a>, Yezhi Shu, <a href="https://cg.cs.tsinghua.edu.cn/people/~Yongjin/Yongjin.htm" target="_blank">Yong-Jin Liu</a>.
        
## Introduction
This implementation is based on our CVPR'2020 paper "Towards Better Generalization: Joint Depth-Pose Learning without PoseNet". You can find the arXiv version of the paper <a href="https://arxiv.org/abs/2004.01314">here</a>. In this repository we release code and pre-trained models for *TrianFlow* (our method) and a strong baseline *PoseNet-Flow* method.
![img](./data/pipeline.png)

## Installation
The code is based on Python3.6. You could use either virtualenv or conda to setup a specified environment. And then run:
```bash
pip install -r requirements.txt
```

## Run a demo
To run a depth prediction demo, you may need to first download the pretrained model from <a href="https://drive.google.com/drive/folders/1rPXlK9bJpjU0OQH5leDCvyb0FcL5jlUk?usp=sharing">here</a>.

```bash
python test.py --config_file ./config/default_1scale.yaml --gpu 0 --mode depth --task demo --image_path ./data/demo/kitti.png --pretrained_model ./models/pretrained/depth_pretrained.pth --result_dir ./data/demo
```
This will give you a predicted depth map for demo image.
![img](./data/demo/kitti.png)
![img](./data/demo/demo_depth.jpg)

## Run experiments

### Prepare training data:
1. For KITTI depth and flow tasks, download KITTI raw dataset using the <a href="http://www.cvlibs.net/download.php?file=raw_data_downloader.zip">script</a> provided on the official website. You also need to download <a href="http://www.cvlibs.net/datasets/kitti/eval_scene_flow.php?benchmark=flow">KITTI 2015 dataset</a> to evaluate the predicted optical flow. Run the following commands to generate groundtruth files for eigen test images. 
```
cd ./data/eigen
python export_gt_depth.py --data_path /path/to/your/kitti/root 
```
2. For KITTI Odometry task, download <a href="http://www.cvlibs.net/datasets/kitti/eval_odometry.php">KITTI Odometry dataset</a>.
3. For NYUv2 experiments, download <a href="https://cs.nyu.edu/~silberman/datasets/nyu_depth_v2.html">NYUv2</a> raw sequences and labeled data mat, also the traing test split mat from <a href="https://github.com/ankurhanda/nyuv2-meta-data">here</a>. Put the labeled data and splits file under the same directory. The data structure should be:
```
nyuv2
  | basements  
  | cafe
  | ...
nyuv2_test
  | nyu_depth_v2_labeled.mat 
  | splits.mat
```

### Training:
1. Modify the configuration file in the ./config directory to set up your path. The config file contains the important paths and default hyper-parameters used in the training process.
2. For KITTI depth, we have the three-stage training schedule.
```bash
1. python train.py --config_file ./config/kitti.yaml --gpu [gpu_id] --mode flow --prepared_save_dir [name_of_your_prepared_dataset] --model_dir [your/directory/to/save/training/models]
2. python train.py --config_file ./config/kitti.yaml --gpu [gpu_id] --mode depth --prepared_save_dir [name_of_your_prepared_dataset] --model_dir [your/directory/to/save/training/models] --flow_pretrained_model [path/to/your/stage1_flow_model]
3. python train.py --config_file ./config/kitti_3stage.yaml --gpu [gpu_id] --mode depth_pose --prepared_save_dir [name_of_your_prepared_dataset] --model_dir [your/directory/to/save/training/models] --depth_pretrained_model [path/to/your/stage2_depth_model]
```
If you are running experiments on the dataset for the first time, it would first process data and save in the [prepared_base_dir] path defined in your config file. 
For other datasets like KITTI Odometry and NYUv2 dataset, you could run with the same commands using the appropriate config file.

We also implement and release codes for the strong baseline *PoseNet-Flow* method, you could run it by two-stage training:
```bash
1. python train.py --config_file [path/to/your/config/file] --gpu [gpu_id] --mode flow --prepared_save_dir [name_of_your_prepared_dataset] --model_dir [your/directory/to/save/training/models]
2. python train.py --config_file [path/to/your/config/file] --gpu [gpu_id] --mode flowposenet --prepared_save_dir [name_of_your_prepared_dataset] --model_dir [your/directory/to/save/training/models] --flow_pretrained_model [path/to/your/stage1_flow_model]
```

### Evaluation:
We provide pretrained models <a href="https://drive.google.com/drive/folders/1rPXlK9bJpjU0OQH5leDCvyb0FcL5jlUk?usp=sharing">here</a> for different tasks. The performance could be slightly different with the paper due to randomness.

1. To evaluate the monocular depth estimation on kitti eigen test split, run:
```bash
python test.py --config_file ./config/kitti.yaml --gpu [gpu_id] --mode depth --task kitti_depth --pretrained_model [path/to/your/model] --result_dir [path/to/save/results]
```
2. To evaluate the monocular depth estimation on nyuv2 test split, run:
```bash
python test.py --config_file ./config/nyu.yaml --gpu [gpu_id] --mode depth --task nyuv2 --pretrained_model [path/to/your/model] --result_dir [path/to/save/results]
```
3. To evaluate the optical flow estimation on KITTI 2015, run:
```bash
python test.py --config_file ./config/kitti.yaml --gpu [gpu_id] --mode flow_3stage --task kitti_flow --pretrained_model [path/to/your/model] --result_dir [path/to/save/results]
```
4. To evaluate the visual odometry task on KITTI Odometry dataset, first get predictions on a single sequence and then evaluate:
```bash
python infer_vo.py --config_file ./config/odo.yaml --gpu [gpu_id] --traj_save_dir_txt [where/to/save/the/prediction/file] --sequences_root_dir [the/root/dir/of/your/image/sequences] --sequence [the sequence id] ----pretrained_model [path/to/your/model]
python ./core/evaluation/eval_odom.py --gt_txt [path/to/your/groundtruth/poses/txt] --result_txt [path/to/your/prediction/txt] --seq [sequence id to evaluate]
```
You could evaluate on the sampled KITTI odometry dataset by simply sampling the raw image sequences and gt-pose txt. Then run *infer_vo.py* on the sampled image sequence and *eval_odom.py* with predicted txt and sampled gt txt to get results.

### Citation
If you find our work useful in your research, please consider citing:
```
@inproceedings{zhao2020towards,
  title={Towards Better Generalization: Joint Depth-Pose Learning without PoseNet},
  author={Zhao, Wang and Liu, Shaohui and Shu, Yezhi and Liu, Yong-Jin},
  booktitle={Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2020}
}
```

### Related Projects
<a href="https://github.com/nianticlabs/monodepth2">Digging into Self-Supervised Monocular Depth Prediction.</a>

<a href="https://github.com/anuragranj/cc">Competitive Collaboration: Joint Unsupervised Learning of Depth, Camera Motion, Optical Flow and Motion Segmentation.</a>

<a href="https://github.com/Huangying-Zhan/DF-VO">Visual Odometry Revisited: What Should Be Learnt?</a>





================================================
FILE: config/kitti.yaml
================================================
cfg_name: 'default'

# dataset
raw_base_dir: '/home4/zhaow/data/kitti'
prepared_base_dir: '/home5/zhaow/data/kitti_release'
gt_2012_dir: '/home4/zhaow/data/kitti_stereo/kitti_2012/training'
gt_2015_dir: '/home4/zhaow/data/kitti_stereo/kitti_2015/training'
static_frames_txt: '/home5/zhaow/release/data/eigen/static_frames.txt'
test_scenes_txt: '/home5/zhaow/release/data/eigen/test_scenes.txt'
dataset: 'kitti_depth'
num_scales: 3

# training
num_iterations: 200000

# loss hyperparameters
w_ssim: 0.85 # w_pixel = 1 - w_ssim
w_flow_smooth: 10.0
w_flow_consis: 0.01
w_geo: 1.0
w_pt_depth: 1.0
w_pj_depth: 0.1
w_flow_error: 0.0
w_depth_smooth: 0.001


h_flow_consist_alpha: 3.0
h_flow_consist_beta: 0.05

ransac_iters: 100
ransac_points: 6000

# Depth Setting
depth_match_num: 6000
depth_sample_ratio: 0.20
depth_scale: 1

# basic info
img_hw: [256, 832]
use_svd_gpu: False



================================================
FILE: config/kitti_3stage.yaml
================================================
cfg_name: 'default'

# dataset
raw_base_dir: '/home/zhaow/data/kitti'
prepared_base_dir: '/home/zhaow/data/kitti_seq'
gt_2012_dir: '/home/zhaow/data/kitti_stereo/kitti_2012/training'
gt_2015_dir: '/home/zhaow/data/kitti_stereo/kitti_2015/training'
static_frames_txt: '/home/zhaow/data/kitti_seq/static_frames.txt'
test_scenes_txt: '/home5/zhaow/release/data/eigen/test_scenes.txt'
dataset: 'kitti_depth'
num_scales: 3

# training
num_iterations: 200000 # set -1 to use num_epochs
num_epochs: 0

# loss hyperparameters

w_ssim: 0.85 # w_pixel = 1 - w_ssim
w_flow_smooth: 10.0
w_flow_consis: 0.01
w_geo: 0.002 #0.002
w_pt_depth: 0.0
w_pj_depth: 0.000
w_flow_error: 0.01
w_depth_smooth: 0.000


h_flow_consist_alpha: 3.0
h_flow_consist_beta: 0.05

# dfe setting
dfe_depth: 3
dfe_depth_scales: 4
dfe_points: 500
ransac_iters: 100
ransac_points: 6000

# Depth Setting
depth_match_num: 6000
depth_sample_ratio: 0.20
depth_scale: 1

# basic info
img_hw: [256, 832]
use_svd_gpu: False



================================================
FILE: config/nyu.yaml
================================================
cfg_name: 'default'

# dataset
raw_base_dir: '/home5/zhaow/data/nyuv2'
prepared_base_dir: '/home5/zhaow/data/nyu_seq_release'
nyu_test_dir: '/home5/zhaow/data/nyuv2_test'
dataset: 'nyuv2'
num_scales: 3

# training
num_iterations: 400000

# loss hyperparameters
w_ssim: 0.85 # w_pixel = 1 - w_ssim
w_flow_smooth: 10.0
w_flow_consis: 0.01
w_geo: 0.0
w_pt_depth: 1.0
w_pj_depth: 0.1
w_flow_error: 0.00
w_depth_smooth: 0.0001


h_flow_consist_alpha: 3.0
h_flow_consist_beta: 0.05

ransac_iters: 100
ransac_points: 6000

# Depth Setting
depth_match_num: 6000
depth_sample_ratio: 0.20
depth_scale: 1

# basic info
img_hw: [448, 576]
#img_hw: [192, 256]
block_tri_grad: False



================================================
FILE: config/nyu_192.yaml
================================================
cfg_name: 'default'

# dataset
raw_base_dir: '/home5/zhaow/data/nyuv2'
prepared_base_dir: '/home5/zhaow/data/nyu_seq_release'
nyu_test_dir: '/home5/zhaow/data/nyuv2_test'
dataset: 'nyuv2'
num_scales: 3

# training
num_iterations: 400000

# loss hyperparameters
w_ssim: 0.85 # w_pixel = 1 - w_ssim
w_flow_smooth: 10.0
w_flow_consis: 0.01
w_geo: 0.0
w_pt_depth: 1.0
w_pj_depth: 0.1
w_flow_error: 0.00
w_depth_smooth: 0.0001


h_flow_consist_alpha: 3.0
h_flow_consist_beta: 0.05

ransac_iters: 100
ransac_points: 6000

# Depth Setting
depth_match_num: 6000
depth_sample_ratio: 0.20
depth_scale: 1

# basic info
#img_hw: [448, 576]
img_hw: [192, 256]
block_tri_grad: False



================================================
FILE: config/nyu_192_3stage.yaml
================================================
cfg_name: 'default'

# dataset
raw_base_dir: '/home5/zhaow/data/nyuv2'
prepared_base_dir: '/home5/zhaow/data/nyu_seq_release'
nyu_test_dir: '/home5/zhaow/data/nyuv2_test'
dataset: 'nyuv2'
num_scales: 3

# training
num_iterations: 400000 # set -1 to use num_epochs

# loss hyperparameters
w_ssim: 0.85 # w_pixel = 1 - w_ssim
w_flow_smooth: 10.0
w_flow_consis: 0.01
w_geo: 0.1
w_pt_depth: 0.1
w_pj_depth: 0.01
w_flow_error: 0.00
w_depth_smooth: 0.0001


h_flow_consist_alpha: 3.0
h_flow_consist_beta: 0.05

ransac_iters: 100
ransac_points: 6000

# Depth Setting
depth_match_num: 6000
depth_sample_ratio: 0.20
depth_scale: 1

# basic info
#img_hw: [448, 576]
img_hw: [192, 256]
block_tri_grad: False



================================================
FILE: config/nyu_3stage.yaml
================================================
cfg_name: 'default'

# dataset
raw_base_dir: '/home5/zhaow/data/nyuv2'
prepared_base_dir: '/home5/zhaow/data/nyu_seq_release'
nyu_test_dir: '/home5/zhaow/data/nyuv2_test'
dataset: 'nyuv2'
num_scales: 3

# training
num_iterations: 400000 # set -1 to use num_epochs

# loss hyperparameters
w_ssim: 0.85 # w_pixel = 1 - w_ssim
w_flow_smooth: 10.0
w_flow_consis: 0.01
w_geo: 0.1
w_pt_depth: 0.1
w_pj_depth: 0.01
w_flow_error: 0.00
w_depth_smooth: 0.0001


h_flow_consist_alpha: 3.0
h_flow_consist_beta: 0.05

ransac_iters: 100
ransac_points: 6000

# Depth Setting
depth_match_num: 6000
depth_sample_ratio: 0.20
depth_scale: 1

# basic info
img_hw: [448, 576]
#img_hw: [192, 256]
block_tri_grad: False



================================================
FILE: config/nyu_posenet_192.yaml
================================================
cfg_name: 'default'

# dataset
raw_base_dir: '/home5/zhaow/data/nyuv2_sub2'
prepared_base_dir: '/home5/zhaow/data/nyu_seq_release'
nyu_test_dir: '/home5/zhaow/data/nyuv2_test'
dataset: 'nyuv2'
num_scales: 3

# training
num_iterations: 500000 # set -1 to use num_epochs
num_epochs: 0

# loss hyperparameters
w_ssim: 0.85 # w_pixel = 1 - w_ssim
w_flow_smooth: 10.0
w_flow_consis: 0.01
w_geo: 0.0
w_pt_depth: 0.0
w_pj_depth: 0.5
w_flow_error: 10.0
w_depth_smooth: 0.00001


h_flow_consist_alpha: 3.0
h_flow_consist_beta: 0.05

ransac_iters: 100
ransac_points: 6000

# Depth Setting
depth_match_num: 6000
depth_scale: 1

# basic info
img_hw: [192,256]
block_tri_grad: False



================================================
FILE: config/odo.yaml
================================================
cfg_name: 'default'

# dataset
raw_base_dir: '/home4/zhaow/data/kitti_odometry/sequences'
prepared_base_dir: '/home5/zhaow/data/kitti_odo_release/'
gt_2012_dir: '/home4/zhaow/data/kitti_stereo/kitti_2012/training'
gt_2015_dir: '/home4/zhaow/data/kitti_stereo/kitti_2015/training'
dataset: 'kitti_odo'
num_scales: 3

# training
num_iterations: 200000


w_ssim: 0.85 # w_pixel = 1 - w_ssim
w_flow_smooth: 10.0
w_flow_consis: 0.01
w_geo: 0.1
w_pt_depth: 1.0
w_pj_depth: 0.1
w_flow_error: 0.0
w_depth_smooth: 0.0001


h_flow_consist_alpha: 3.0
h_flow_consist_beta: 0.05

ransac_iters: 100
ransac_points: 6000

# Depth Setting
depth_match_num: 6000
depth_sample_ratio: 0.20
depth_scale: 1

# basic info
img_hw: [256, 832]
block_tri_grad: False



================================================
FILE: config/odo_3stage.yaml
================================================
cfg_name: 'default'

# dataset
raw_base_dir: '/home4/zhaow/data/kitti_odometry/sequences'
prepared_base_dir: '/home5/zhaow/data/kitti_odo_release/'
gt_2012_dir: '/home4/zhaow/data/kitti_stereo/kitti_2012/training'
gt_2015_dir: '/home4/zhaow/data/kitti_stereo/kitti_2015/training'
dataset: 'kitti_odo'
num_scales: 3

# training
num_iterations: 200000


w_ssim: 0.85 # w_pixel = 1 - w_ssim
w_flow_smooth: 10.0
w_flow_consis: 0.01
w_geo: 0.002 #0.002
w_pt_depth: 0.0
w_pj_depth: 0.000
w_flow_error: 0.01
w_depth_smooth: 0.000


h_flow_consist_alpha: 3.0
h_flow_consist_beta: 0.05

ransac_iters: 100
ransac_points: 6000

# Depth Setting
depth_match_num: 6000
depth_sample_ratio: 0.20
depth_scale: 1

# basic info
img_hw: [256, 832]
block_tri_grad: False



================================================
FILE: config/odo_posenet.yaml
================================================
cfg_name: 'default'

# dataset
raw_base_dir: '/home4/zhaow/data/kitti_odometry/sequences'
prepared_base_dir: '/home5/zhaow/data/kitti_odo/'
gt_2012_dir: '/home4/zhaow/data/kitti_stereo/kitti_2012/training'
gt_2015_dir: '/home4/zhaow/data/kitti_stereo/kitti_2015/training'
dataset: 'kitti_odo'
num_scales: 3

# training
num_iterations: 200000 # set -1 to use num_epochs
num_epochs: 0


w_ssim: 0.85 # w_pixel = 1 - w_ssim
w_flow_smooth: 10.0
w_flow_consis: 0.01
w_geo: 0.0
w_pt_depth: 0.0
w_pj_depth: 0.1
w_flow_error: 0.5
w_depth_smooth: 0.0001


h_flow_consist_alpha: 3.0
h_flow_consist_beta: 0.05

ransac_iters: 100
ransac_points: 6000

# Depth Setting
depth_match_num: 6000
depth_scale: 1

# basic info
img_hw: [256, 832]
block_tri_grad: False



================================================
FILE: core/config/__init__.py
================================================
import os, sys
sys.path.append(os.path.dirname(os.path.abspath(__file__)))
from config_utils import generate_loss_weights_dict



================================================
FILE: core/config/config_utils.py
================================================
import os, sys

def generate_loss_weights_dict(cfg):
    weight_dict = {}
    weight_dict['loss_pixel'] = 1 - cfg.w_ssim
    weight_dict['loss_ssim'] = cfg.w_ssim
    weight_dict['loss_flow_smooth'] = cfg.w_flow_smooth
    weight_dict['loss_flow_consis'] = cfg.w_flow_consis
    weight_dict['geo_loss'] = cfg.w_geo
    weight_dict['pt_depth_loss'] = cfg.w_pt_depth
    weight_dict['pj_depth_loss'] = cfg.w_pj_depth
    weight_dict['depth_smooth_loss'] = cfg.w_depth_smooth
    weight_dict['flow_error'] = cfg.w_flow_error
    return weight_dict



================================================
FILE: core/dataset/__init__.py
================================================
import os, sys
sys.path.append(os.path.dirname(os.path.abspath(__file__)))
from kitti_raw import KITTI_RAW
from kitti_prepared import KITTI_Prepared
from kitti_2012 import KITTI_2012
from kitti_2015 import KITTI_2015
from nyu_v2 import NYU_Prepare, NYU_v2
from kitti_odo import KITTI_Odo

================================================
FILE: core/dataset/kitti_2012.py
================================================
import os, sys
sys.path.append(os.path.dirname(os.path.abspath(__file__)))
from kitti_prepared import KITTI_Prepared
sys.path.append(os.path.join(os.path.dirname(os.path.abspath(__file__)), '..', 'evaluation'))
from evaluate_flow import get_scaled_intrinsic_matrix, eval_flow_avg
import numpy as np
import cv2
import copy

import torch
import pdb

class KITTI_2012(KITTI_Prepared):
    def __init__(self, data_dir, img_hw=(256, 832), init=True):
        self.data_dir = data_dir
        self.img_hw = img_hw
        self.num_total = 194
        if init:
            self.data_list = self.get_data_list()

    def get_data_list(self):
        data_list = []
        for i in range(self.num_total):
            data = {}
            data['img1_dir'] = os.path.join(self.data_dir, 'image_2', str(i).zfill(6) + '_10.png')
            data['img2_dir'] = os.path.join(self.data_dir, 'image_2', str(i).zfill(6) + '_11.png')
            data['calib_file_dir'] = os.path.join(self.data_dir, 'calib_cam_to_cam', str(i).zfill(6) + '.txt')
            data_list.append(data)
        return data_list

    def __len__(self):
        return len(self.data_list)

    def read_cam_intrinsic(self, calib_file):
        input_intrinsic = get_scaled_intrinsic_matrix(calib_file, zoom_x=1.0, zoom_y=1.0)
        return input_intrinsic

    def __getitem__(self, idx):
        '''
        Returns:
        - img		torch.Tensor (N * H, W, 3)
        - K	torch.Tensor (num_scales, 3, 3)
        - K_inv	torch.Tensor (num_scales, 3, 3)
        '''
        data = self.data_list[idx]
        # load img
        img1 = cv2.imread(data['img1_dir'])
        img2 = cv2.imread(data['img2_dir'])
        img_hw_orig = (img1.shape[0], img1.shape[1])
        img = np.concatenate([img1, img2], 0)
        img = self.preprocess_img(img, self.img_hw, is_test=True)
        img  = img.transpose(2,0,1)

        # load intrinsic
        cam_intrinsic = self.read_cam_intrinsic(data['calib_file_dir'])
        cam_intrinsic = self.rescale_intrinsics(cam_intrinsic, img_hw_orig, self.img_hw)
        K, K_inv = self.get_intrinsics_per_scale(cam_intrinsic, scale=0) # (3, 3), (3, 3)
        return torch.from_numpy(img).float(), torch.from_numpy(K).float(), torch.from_numpy(K_inv).float()

if __name__ == '__main__':
    pass



================================================
FILE: core/dataset/kitti_2015.py
================================================
import os, sys
sys.path.append(os.path.dirname(os.path.abspath(__file__)))
from kitti_2012 import KITTI_2012

class KITTI_2015(KITTI_2012):
    def __init__(self, data_dir, img_hw=(256, 832)):
        super(KITTI_2015, self).__init__(data_dir, img_hw, init=False)
        self.num_total = 200

        self.data_list = self.get_data_list()

if __name__ == '__main__':
    pass



================================================
FILE: core/dataset/kitti_odo.py
================================================
import os, sys
import numpy as np
import imageio
from tqdm import tqdm
import torch.multiprocessing as mp

def process_folder(q, data_dir, output_dir, stride=1):
    while True:
        if q.empty():
            break
        folder = q.get()
        image_path = os.path.join(data_dir, folder, 'image_2/')
        dump_image_path = os.path.join(output_dir, folder)
        if not os.path.isdir(dump_image_path):
            os.makedirs(dump_image_path)
        f = open(os.path.join(dump_image_path, 'train.txt'), 'w')
        
        # Note. the os.listdir method returns arbitary order of list. We need correct order.
        numbers = len(os.listdir(image_path))
        for n in range(numbers - stride):
            s_idx = n
            e_idx = s_idx + stride
            curr_image = imageio.imread(os.path.join(image_path, '%.6d'%s_idx)+'.png')
            next_image = imageio.imread(os.path.join(image_path, '%.6d'%e_idx)+'.png')
            seq_images = np.concatenate([curr_image, next_image], axis=0)
            imageio.imsave(os.path.join(dump_image_path, '%.6d'%s_idx)+'.png', seq_images.astype('uint8'))

            # Write training files
            f.write('%s %s\n' % (os.path.join(folder, '%.6d'%s_idx)+'.png', os.path.join(folder, 'calib.txt')))
        print(folder)


class KITTI_Odo(object):
    def __init__(self, data_dir):
        self.data_dir = data_dir
        self.train_seqs = ['00','01','02','03','04','05','06','07','08']

    def __len__(self):
        raise NotImplementedError

    def prepare_data_mp(self, output_dir, stride=1):
        num_processes = 16
        processes = []
        q = mp.Queue()
        if not os.path.isfile(os.path.join(output_dir, 'train.txt')):
            os.makedirs(output_dir)
            #f = open(os.path.join(output_dir, 'train.txt'), 'w')
            print('Preparing sequence data....')
            if not os.path.isdir(self.data_dir):
                raise
            dirlist = os.listdir(self.data_dir)
            total_dirlist = []
            # Get the different folders of images
            for d in dirlist:
                if d in self.train_seqs:
                    q.put(d)
            # Process every folder
            for rank in range(num_processes):
                p = mp.Process(target=process_folder, args=(q, self.data_dir, output_dir, stride))
                p.start()
                processes.append(p)
            for p in processes:
                p.join()
        
        f = open(os.path.join(output_dir, 'train.txt'), 'w')
        for d in self.train_seqs:
            train_file = open(os.path.join(output_dir, d, 'train.txt'), 'r')
            for l in train_file.readlines():
                f.write(l)

            command = 'cp ' + os.path.join(self.data_dir, d, 'calib.txt') + ' ' + os.path.join(output_dir, d, 'calib.txt')
            os.system(command)
        
        print('Data Preparation Finished.')

    def __getitem__(self, idx):
        raise NotImplementedError


if __name__ == '__main__':
    data_dir = '/home4/zhaow/data/kitti'
    dirlist = os.listdir('/home4/zhaow/data/kitti')
    output_dir = '/home4/zhaow/data/kitti_seq/data_generated_s2'
    total_dirlist = []
    # Get the different folders of images
    for d in dirlist:
        seclist = os.listdir(os.path.join(data_dir, d))
        for s in seclist:
            if os.path.isdir(os.path.join(data_dir, d, s)):
                total_dirlist.append(os.path.join(d, s))
    
    F = open(os.path.join(output_dir, 'train.txt'), 'w')
    for p in total_dirlist:
        traintxt = os.path.join(os.path.join(output_dir, p), 'train.txt')
        f = open(traintxt, 'r')
        for line in f.readlines():
            F.write(line)
        print(traintxt)

================================================
FILE: core/dataset/kitti_prepared.py
================================================
import os, sys
import numpy as np
import cv2
import copy

import torch
import torch.utils.data
import pdb

class KITTI_Prepared(torch.utils.data.Dataset):
    def __init__(self, data_dir, num_scales=3, img_hw=(256, 832), num_iterations=None):
        super(KITTI_Prepared, self).__init__()
        self.data_dir = data_dir
        self.num_scales = num_scales
        self.img_hw = img_hw
        self.num_iterations = num_iterations

        info_file = os.path.join(self.data_dir, 'train.txt')
        #info_file = os.path.join(self.data_dir, 'train_flow.txt')
        self.data_list = self.get_data_list(info_file)

    def get_data_list(self, info_file):
        with open(info_file, 'r') as f:
            lines = f.readlines()
        data_list = []
        for line in lines:
            k = line.strip('\n').split()
            data = {}
            data['image_file'] = os.path.join(self.data_dir, k[0])
            data['cam_intrinsic_file'] = os.path.join(self.data_dir, k[1])
            data_list.append(data)
        print('A total of {} image pairs found'.format(len(data_list)))
        return data_list

    def count(self):
        return len(self.data_list)

    def rand_num(self, idx):
        num_total = self.count()
        np.random.seed(idx)
        num = np.random.randint(num_total)
        return num

    def __len__(self):
        if self.num_iterations is None:
            return self.count()
        else:
            return self.num_iterations

    def resize_img(self, img, img_hw):
        '''
        Input size (N*H, W, 3)
        Output size (N*H', W', 3), where (H', W') == self.img_hw
        '''
        img_h, img_w = img.shape[0], img.shape[1]
        img_hw_orig = (int(img_h / 2), img_w) 
        img1, img2 = img[:img_hw_orig[0], :, :], img[img_hw_orig[0]:, :, :]
        img1_new = cv2.resize(img1, (img_hw[1], img_hw[0]))
        img2_new = cv2.resize(img2, (img_hw[1], img_hw[0]))
        img_new = np.concatenate([img1_new, img2_new], 0)
        return img_new

    def random_flip_img(self, img):
        is_flip = (np.random.rand() > 0.5)
        if is_flip:
            img = cv2.flip(img, 1)
        return img

    def preprocess_img(self, img, img_hw=None, is_test=False):
        if img_hw is None:
            img_hw = self.img_hw
        img = self.resize_img(img, img_hw)
        if not is_test:
            img = self.random_flip_img(img)
        img = img / 255.0
        return img

    def read_cam_intrinsic(self, fname):
        with open(fname, 'r') as f:
            lines = f.readlines()
        data = lines[-1].strip('\n').split(' ')[1:]
        data = [float(k) for k in data]
        data = np.array(data).reshape(3,4)
        cam_intrinsics = data[:3,:3]
        return cam_intrinsics

    def rescale_intrinsics(self, K, img_hw_orig, img_hw_new):
        K[0,:] = K[0,:] * img_hw_new[1] / img_hw_orig[1]
        K[1,:] = K[1,:] * img_hw_new[0] / img_hw_orig[0]
        return K

    def get_intrinsics_per_scale(self, K, scale):
        K_new = copy.deepcopy(K)
        K_new[0,:] = K_new[0,:] / (2**scale)
        K_new[1,:] = K_new[1,:] / (2**scale)
        K_new_inv = np.linalg.inv(K_new)
        return K_new, K_new_inv

    def get_multiscale_intrinsics(self, K, num_scales):
        K_ms, K_inv_ms = [], []
        for s in range(num_scales):
            K_new, K_new_inv = self.get_intrinsics_per_scale(K, s)
            K_ms.append(K_new[None,:,:])
            K_inv_ms.append(K_new_inv[None,:,:])
        K_ms = np.concatenate(K_ms, 0)
        K_inv_ms = np.concatenate(K_inv_ms, 0)
        return K_ms, K_inv_ms

    def __getitem__(self, idx):
        '''
        Returns:
        - img		torch.Tensor (N * H, W, 3)
        - K	torch.Tensor (num_scales, 3, 3)
        - K_inv	torch.Tensor (num_scales, 3, 3)
        '''
        if self.num_iterations is not None:
            idx = self.rand_num(idx)
        data = self.data_list[idx]
        # load img
        img = cv2.imread(data['image_file'])
        img_hw_orig = (int(img.shape[0] / 2), img.shape[1])
        img = self.preprocess_img(img, self.img_hw) # (img_h * 2, img_w, 3)
        img = img.transpose(2,0,1)

        # load intrinsic
        cam_intrinsic = self.read_cam_intrinsic(data['cam_intrinsic_file'])
        cam_intrinsic = self.rescale_intrinsics(cam_intrinsic, img_hw_orig, self.img_hw)
        K_ms, K_inv_ms = self.get_multiscale_intrinsics(cam_intrinsic, self.num_scales) # (num_scales, 3, 3), (num_scales, 3, 3)
        return torch.from_numpy(img).float(), torch.from_numpy(K_ms).float(), torch.from_numpy(K_inv_ms).float()

if __name__ == '__main__':
    pass



================================================
FILE: core/dataset/kitti_raw.py
================================================
import os, sys
import numpy as np
import imageio
from tqdm import tqdm
import torch.multiprocessing as mp
import pdb

def process_folder(q, static_frames, test_scenes, data_dir, output_dir, stride=1):
    while True:
        if q.empty():
            break
        folder = q.get()
        if folder in static_frames.keys():
            static_ids = static_frames[folder]
        else:
            static_ids = []
        scene = folder.split('/')[1]
        if scene[:-5] in test_scenes:
            continue
        image_path = os.path.join(data_dir, folder, 'image_02/data')
        dump_image_path = os.path.join(output_dir, folder)
        if not os.path.isdir(dump_image_path):
            os.makedirs(dump_image_path)
        f = open(os.path.join(dump_image_path, 'train.txt'), 'w')
        
        # Note. the os.listdir method returns arbitary order of list. We need correct order.
        numbers = len(os.listdir(image_path))
        for n in range(numbers - stride):
            s_idx = n
            e_idx = s_idx + stride
            if '%.10d'%s_idx in static_ids or '%.10d'%e_idx in static_ids:
                #print('%.10d'%s_idx)
                continue
            curr_image = imageio.imread(os.path.join(image_path, '%.10d'%s_idx)+'.png')
            next_image = imageio.imread(os.path.join(image_path, '%.10d'%e_idx)+'.png')
            seq_images = np.concatenate([curr_image, next_image], axis=0)
            imageio.imsave(os.path.join(dump_image_path, '%.10d'%s_idx)+'.png', seq_images.astype('uint8'))

            # Write training files
            date = folder.split('/')[0]
            f.write('%s %s\n' % (os.path.join(folder, '%.10d'%s_idx)+'.png', os.path.join(date, 'calib_cam_to_cam.txt')))
        print(folder)


class KITTI_RAW(object):
    def __init__(self, data_dir, static_frames_txt, test_scenes_txt):
        self.data_dir = data_dir
        self.static_frames_txt = static_frames_txt
        self.test_scenes_txt = test_scenes_txt

    def __len__(self):
        raise NotImplementedError

    def collect_static_frame(self):
        f = open(self.static_frames_txt)
        static_frames = {}
        for line in f.readlines():
            line = line.strip()
            date, drive, frame_id = line.split(' ')
            curr_fid = '%.10d' % (np.int(frame_id))
            if os.path.join(date, drive) not in static_frames.keys():
                static_frames[os.path.join(date, drive)] = []
            static_frames[os.path.join(date, drive)].append(curr_fid)
        return static_frames
    
    def collect_test_scenes(self):
        f = open(self.test_scenes_txt)
        test_scenes = []
        for line in f.readlines():
            line = line.strip()
            test_scenes.append(line)
        return test_scenes

    def prepare_data_mp(self, output_dir, stride=1):
        num_processes = 16
        processes = []
        q = mp.Queue()
        static_frames = self.collect_static_frame()
        test_scenes = self.collect_test_scenes()
        if not os.path.isfile(os.path.join(output_dir, 'train.txt')):
            os.makedirs(output_dir)
            #f = open(os.path.join(output_dir, 'train.txt'), 'w')
            print('Preparing sequence data....')
            if not os.path.isdir(self.data_dir):
                raise
            dirlist = os.listdir(self.data_dir)
            total_dirlist = []
            # Get the different folders of images
            for d in dirlist:
                seclist = os.listdir(os.path.join(self.data_dir, d))
                for s in seclist:
                    if os.path.isdir(os.path.join(self.data_dir, d, s)):
                        total_dirlist.append(os.path.join(d, s))
                        q.put(os.path.join(d, s))
            # Process every folder
            for rank in range(num_processes):
                p = mp.Process(target=process_folder, args=(q, static_frames, test_scenes, self.data_dir, output_dir, stride))
                p.start()
                processes.append(p)
            for p in processes:
                p.join()
        
        # Collect the training frames.
        f = open(os.path.join(output_dir, 'train.txt'), 'w')
        for date in os.listdir(output_dir):
            if os.path.isdir(os.path.join(output_dir, date)):
                drives = os.listdir(os.path.join(output_dir, date))
                for d in drives:
                    train_file = open(os.path.join(output_dir, date, d, 'train.txt'), 'r')
                    for l in train_file.readlines():
                        f.write(l)
        
        # Get calib files
        for date in os.listdir(self.data_dir):
            command = 'cp ' + os.path.join(self.data_dir, date, 'calib_cam_to_cam.txt') + ' ' + os.path.join(output_dir, date, 'calib_cam_to_cam.txt')
            os.system(command)
        
        print('Data Preparation Finished.')



    def prepare_data(self, output_dir):
        static_frames = self.collect_static_frame()
        test_scenes = self.collect_test_scenes()
        if not os.path.isfile(os.path.join(output_dir, 'train.txt')):
            os.makedirs(output_dir)
            f = open(os.path.join(output_dir, 'train.txt'), 'w')
            print('Preparing sequence data....')
            if not os.path.isdir(self.data_dir):
                raise
            dirlist = os.listdir(self.data_dir)
            total_dirlist = []
            # Get the different folders of images
            for d in dirlist:
                seclist = os.listdir(os.path.join(self.data_dir, d))
                for s in seclist:
                    if os.path.isdir(os.path.join(self.data_dir, d, s)):
                        total_dirlist.append(os.path.join(d, s))
            # Process every folder
            for folder in tqdm(total_dirlist):
                if folder in static_frames.keys():
                    static_ids = static_frames[folder]
                else:
                    static_ids = []
                scene = folder.split('/')[1]
                if scene in test_scenes:
                    continue
                image_path = os.path.join(self.data_dir, folder, 'image_02/data')
                dump_image_path = os.path.join(output_dir, folder)
                if not os.path.isdir(dump_image_path):
                    os.makedirs(dump_image_path)
                # Note. the os.listdir method returns arbitary order of list. We need correct order.
                numbers = len(os.listdir(image_path))
                for n in range(numbers - 1):
                    s_idx = n
                    e_idx = s_idx + 1
                    if '%.10d'%s_idx in static_ids or '%.10d'%e_idx in static_ids:
                        print('%.10d'%s_idx)
                        continue
                    curr_image = imageio.imread(os.path.join(image_path, '%.10d'%s_idx)+'.png')
                    next_image = imageio.imread(os.path.join(image_path, '%.10d'%e_idx)+'.png')
                    seq_images = np.concatenate([curr_image, next_image], axis=0)
                    imageio.imsave(os.path.join(dump_image_path, '%.10d'%s_idx)+'.png', seq_images.astype('uint8'))
    
                    # Write training files
                    date = folder.split('/')[0]
                    f.write('%s %s\n' % (os.path.join(folder, '%.10d'%s_idx)+'.png', os.path.join(date, 'calib_cam_to_cam.txt')))
                print(folder)
        
        # Get calib files
        for date in os.listdir(self.data_dir):
            command = 'cp ' + os.path.join(self.data_dir, date, 'calib_cam_to_cam.txt') + ' ' + os.path.join(output_dir, date, 'calib_cam_to_cam.txt')
            os.system(command)
        
        return os.path.join(output_dir, 'train.txt')

    def __getitem__(self, idx):
        raise NotImplementedError


if __name__ == '__main__':
    data_dir = '/home4/zhaow/data/kitti'
    dirlist = os.listdir('/home4/zhaow/data/kitti')
    output_dir = '/home4/zhaow/data/kitti_seq/data_generated_s2'
    total_dirlist = []
    # Get the different folders of images
    for d in dirlist:
        seclist = os.listdir(os.path.join(data_dir, d))
        for s in seclist:
            if os.path.isdir(os.path.join(data_dir, d, s)):
                total_dirlist.append(os.path.join(d, s))
    
    F = open(os.path.join(output_dir, 'train.txt'), 'w')
    for p in total_dirlist:
        traintxt = os.path.join(os.path.join(output_dir, p), 'train.txt')
        f = open(traintxt, 'r')
        for line in f.readlines():
            F.write(line)
        print(traintxt)







================================================
FILE: core/dataset/nyu_v2.py
================================================
import os, sys
import numpy as np
import imageio
import cv2
import copy
import h5py
import scipy.io as sio
import torch
import torch.utils.data
import pdb
from tqdm import tqdm
import torch.multiprocessing as mp

def collect_image_list(path):
    # Get ppm images list of a folder.
    files = os.listdir(path)
    sorted_file = sorted([f for f in files])
    image_list = []
    for l in sorted_file:
        if l.split('.')[-1] == 'ppm':
            image_list.append(l)
    return image_list

def process_folder(q, data_dir, output_dir, stride, train_scenes):
    # Directly process the original nyu v2 depth dataset.
    while True:
        if q.empty():
            break
        folder = q.get()
        scene_name = folder.split('/')[-1]
        s1,s2 = scene_name.split('_')[:-1], scene_name.split('_')[-1]
        scene_name_full = ''
        for j in s1:
            scene_name_full = scene_name_full + j + '_'
        scene_name_full = scene_name_full + s2[:4]
        
        if scene_name_full not in train_scenes:
            continue
        image_path = os.path.join(data_dir, folder)
        dump_image_path = os.path.join(output_dir, folder)
        if not os.path.isdir(dump_image_path):
            os.makedirs(dump_image_path)
        f = open(os.path.join(dump_image_path, 'train.txt'), 'w')
        
        # Note. the os.listdir method returns arbitary order of list. We need correct order.
        image_list = collect_image_list(image_path)
        #image_list = open(os.path.join(image_path, 'index.txt')).readlines()
        numbers = len(image_list) - 1  # The last ppm file seems truncated.
        for n in range(numbers - stride):
            s_idx = n
            e_idx = s_idx + stride
            s_name = image_list[s_idx].strip()
            e_name = image_list[e_idx].strip()
            
            curr_image = imageio.imread(os.path.join(image_path, s_name))
            next_image = imageio.imread(os.path.join(image_path, e_name))
            #curr_image = cv2.imread(os.path.join(image_path, s_name))
            #next_image = cv2.imread(os.path.join(image_path, e_name))
            seq_images = np.concatenate([curr_image, next_image], axis=0)
            imageio.imsave(os.path.join(dump_image_path,  os.path.splitext(s_name)[0]+'.png'), seq_images.astype('uint8'))
            #cv2.imwrite(os.path.join(dump_image_path, os.path.splitext(s_name)[0]+'.png'), seq_images.astype('uint8'))

            # Write training files
            #date = folder.split('_')[2]
            f.write('%s %s\n' % (os.path.join(folder, os.path.splitext(s_name)[0]+'.png'), 'calib_cam_to_cam.txt'))
        print(folder)

class NYU_Prepare(object):
    def __init__(self, data_dir, test_dir):
        self.data_dir = data_dir
        self.test_data = os.path.join(test_dir, 'nyu_depth_v2_labeled.mat')
        self.splits = os.path.join(test_dir, 'splits.mat')
        self.get_all_scenes()
        self.get_test_scenes()
        self.get_train_scenes()
        

    def __len__(self):
        raise NotImplementedError

    def get_all_scenes(self):
        self.all_scenes = []
        paths = os.listdir(self.data_dir)
        for p in paths:
            if os.path.isdir(os.path.join(self.data_dir, p)):
                pp = os.listdir(os.path.join(self.data_dir, p))
                for path in pp:
                    self.all_scenes.append(path)

    def get_test_scenes(self):
        self.test_scenes = []
        test_data = h5py.File(self.test_data, 'r')
        test_split = sio.loadmat(self.splits)['testNdxs']
        test_split = np.array(test_split).squeeze(1)
        
        test_scenes = test_data['scenes'][0][test_split-1]
        for i in range(len(test_scenes)):
            obj = test_data[test_scenes[i]]
            name = "".join(chr(j) for j in obj[:])
            if name not in self.test_scenes:
                self.test_scenes.append(name)
        #pdb.set_trace()
    
    def get_train_scenes(self):
        self.train_scenes = []
        train_data = h5py.File(self.test_data, 'r')
        train_split = sio.loadmat(self.splits)['trainNdxs']
        train_split = np.array(train_split).squeeze(1)
        
        train_scenes = train_data['scenes'][0][train_split-1]
        for i in range(len(train_scenes)):
            obj = train_data[train_scenes[i]]
            name = "".join(chr(j) for j in obj[:])
            if name not in self.train_scenes:
                self.train_scenes.append(name)


    def prepare_data_mp(self, output_dir, stride=1):
        num_processes = 32
        processes = []
        q = mp.Queue()
        if not os.path.isfile(os.path.join(output_dir, 'train.txt')):
            os.makedirs(output_dir)
            #f = open(os.path.join(output_dir, 'train.txt'), 'w')
            print('Preparing sequence data....')
            if not os.path.isdir(self.data_dir):
                raise
            dirlist = os.listdir(self.data_dir)
            total_dirlist = []
            # Get the different folders of images
            for d in dirlist:
                if not os.path.isdir(os.path.join(self.data_dir, d)):
                    continue
                seclist = os.listdir(os.path.join(self.data_dir, d))
                for s in seclist:
                    if os.path.isdir(os.path.join(self.data_dir, d, s)):
                        total_dirlist.append(os.path.join(d, s))
                        q.put(os.path.join(d, s))
            # Process every folder
            for rank in range(num_processes):
                p = mp.Process(target=process_folder, args=(q, self.data_dir, output_dir, stride, self.train_scenes))
                p.start()
                processes.append(p)
            for p in processes:
                p.join()
        
        # Collect the training frames.
        f = open(os.path.join(output_dir, 'train.txt'), 'w')
        for dirlist in os.listdir(output_dir):
            if os.path.isdir(os.path.join(output_dir, dirlist)):
                seclists = os.listdir(os.path.join(output_dir, dirlist))
                for s in seclists:
                    train_file = open(os.path.join(output_dir, dirlist, s, 'train.txt'), 'r')
                    for l in train_file.readlines():
                        f.write(l)
        f.close()
        
        f = open(os.path.join(output_dir, 'calib_cam_to_cam.txt'), 'w')
        f.write('P_rect: 5.1885790117450188e+02 0.0 3.2558244941119034e+02 0.0 0.0 5.1946961112127485e+02 2.5373616633400465e+02 0.0 0.0 0.0 1.0 0.0')
        f.close()
        print('Data Preparation Finished.')

    def __getitem__(self, idx):
        raise NotImplementedError



class NYU_v2(torch.utils.data.Dataset):
    def __init__(self, data_dir, num_scales=3, img_hw=(448, 576), num_iterations=None):
        super(NYU_v2, self).__init__()
        self.data_dir = data_dir
        self.num_scales = num_scales
        self.img_hw = img_hw
        self.num_iterations = num_iterations
        self.undist_coeff = np.array([2.07966153e-01, -5.8613825e-01, 7.223136313e-04, 1.047962719e-03, 4.98569866e-01])
        self.mapx, self.mapy = None, None
        self.roi = None

        info_file = os.path.join(self.data_dir, 'train.txt')
        self.data_list = self.get_data_list(info_file)

    def get_data_list(self, info_file):
        with open(info_file, 'r') as f:
            lines = f.readlines()
        data_list = []
        for line in lines:
            k = line.strip('\n').split()
            data = {}
            data['image_file'] = os.path.join(self.data_dir, k[0])
            data['cam_intrinsic_file'] = os.path.join(self.data_dir, k[1])
            data_list.append(data)
        print('A total of {} image pairs found'.format(len(data_list)))
        return data_list

    def count(self):
        return len(self.data_list)

    def rand_num(self, idx):
        num_total = self.count()
        np.random.seed(idx)
        num = np.random.randint(num_total)
        return num

    def __len__(self):
        if self.num_iterations is None:
            return self.count()
        else:
            return self.num_iterations

    def resize_img(self, img, img_hw):
        '''
        Input size (N*H, W, 3)
        Output size (N*H', W', 3), where (H', W') == self.img_hw
        '''
        img_h, img_w = img.shape[0], img.shape[1]
        img_hw_orig = (int(img_h / 2), img_w) 
        img1, img2 = img[:img_hw_orig[0], :, :], img[img_hw_orig[0]:, :, :]
        img1_new = cv2.resize(img1, (img_hw[1], img_hw[0]))
        img2_new = cv2.resize(img2, (img_hw[1], img_hw[0]))
        img_new = np.concatenate([img1_new, img2_new], 0)
        return img_new

    def random_flip_img(self, img):
        is_flip = (np.random.rand() > 0.5)
        if is_flip:
            img = cv2.flip(img, 1)
        return img
    
    def undistort_img(self, img, K):
        img_h, img_w = img.shape[0], img.shape[1]
        img_hw_orig = (int(img_h / 2), img_w) 
        img1, img2 = img[:img_hw_orig[0], :, :], img[img_hw_orig[0]:, :, :]
        
        h, w = img_hw_orig
        if self.mapx is None:
            newcameramtx, self.roi = cv2.getOptimalNewCameraMatrix(K, self.undist_coeff, (w,h), 1, (w,h))
            self.mapx, self.mapy = cv2.initUndistortRectifyMap(K, self.undist_coeff, None, newcameramtx, (w,h), 5)
        
        img1_undist = cv2.remap(img1, self.mapx, self.mapy, cv2.INTER_LINEAR)
        img2_undist = cv2.remap(img2, self.mapx, self.mapy, cv2.INTER_LINEAR)
        x,y,w,h = self.roi
        img1_undist = img1_undist[y:y+h, x:x+w]
        img2_undist = img2_undist[y:y+h, x:x+w]
        img_undist = np.concatenate([img1_undist, img2_undist], 0)
        #cv2.imwrite('./test.png', img)
        #cv2.imwrite('./test_undist.png', img_undist)
        #pdb.set_trace()
        return img_undist

    def preprocess_img(self, img, K, img_hw=None, is_test=False):
        if img_hw is None:
            img_hw = self.img_hw
        if not is_test:
            #img = img
            img = self.undistort_img(img, K)
            #img = self.random_flip_img(img)
            
        img = self.resize_img(img, img_hw)
        img = img / 255.0
        return img

    def read_cam_intrinsic(self, fname):
        with open(fname, 'r') as f:
            lines = f.readlines()
        data = lines[-1].strip('\n').split(' ')[1:]
        data = [float(k) for k in data]
        data = np.array(data).reshape(3,4)
        cam_intrinsics = data[:3,:3]
        return cam_intrinsics
    
    def rescale_intrinsics(self, K, img_hw_orig, img_hw_new):
        K_new = copy.deepcopy(K)
        K_new[0,:] = K_new[0,:] * img_hw_new[0] / img_hw_orig[0]
        K_new[1,:] = K_new[1,:] * img_hw_new[1] / img_hw_orig[1]
        return K_new

    def get_intrinsics_per_scale(self, K, scale):
        K_new = copy.deepcopy(K)
        K_new[0,:] = K_new[0,:] / (2**scale)
        K_new[1,:] = K_new[1,:] / (2**scale)
        K_new_inv = np.linalg.inv(K_new)
        return K_new, K_new_inv

    def get_multiscale_intrinsics(self, K, num_scales):
        K_ms, K_inv_ms = [], []
        for s in range(num_scales):
            K_new, K_new_inv = self.get_intrinsics_per_scale(K, s)
            K_ms.append(K_new[None,:,:])
            K_inv_ms.append(K_new_inv[None,:,:])
        K_ms = np.concatenate(K_ms, 0)
        K_inv_ms = np.concatenate(K_inv_ms, 0)
        return K_ms, K_inv_ms

    def __getitem__(self, idx):
        '''
        Returns:
        - img		torch.Tensor (N * H, W, 3)
        - K	torch.Tensor (num_scales, 3, 3)
        - K_inv	torch.Tensor (num_scales, 3, 3)
        '''
        if idx >= self.num_iterations:
            raise IndexError
        if self.num_iterations is not None:
            idx = self.rand_num(idx)
        data = self.data_list[idx]
        # load img
        img = cv2.imread(data['image_file'])
        img_hw_orig = (int(img.shape[0] / 2), img.shape[1])
        
        # load intrinsic
        cam_intrinsic_orig = self.read_cam_intrinsic(data['cam_intrinsic_file'])
        cam_intrinsic = self.rescale_intrinsics(cam_intrinsic_orig, img_hw_orig, self.img_hw)
        K_ms, K_inv_ms = self.get_multiscale_intrinsics(cam_intrinsic, self.num_scales) # (num_scales, 3, 3), (num_scales, 3, 3)
        
        # image preprocessing
        img = self.preprocess_img(img, cam_intrinsic_orig, self.img_hw) # (img_h * 2, img_w, 3)
        img = img.transpose(2,0,1)

        
        return torch.from_numpy(img).float(), torch.from_numpy(K_ms).float(), torch.from_numpy(K_inv_ms).float()





if __name__ == '__main__':
    data_dir = '/home4/zhaow/data/kitti'
    dirlist = os.listdir('/home4/zhaow/data/kitti')
    output_dir = '/home4/zhaow/data/kitti_seq/data_generated_s2'
    total_dirlist = []
    # Get the different folders of images
    for d in dirlist:
        seclist = os.listdir(os.path.join(data_dir, d))
        for s in seclist:
            if os.path.isdir(os.path.join(data_dir, d, s)):
                total_dirlist.append(os.path.join(d, s))
    
    F = open(os.path.join(output_dir, 'train.txt'), 'w')
    for p in total_dirlist:
        traintxt = os.path.join(os.path.join(output_dir, p), 'train.txt')
        f = open(traintxt, 'r')
        for line in f.readlines():
            F.write(line)
        print(traintxt)







================================================
FILE: core/evaluation/__init__.py
================================================
import os, sys
sys.path.append(os.path.dirname(os.path.abspath(__file__)))
from evaluate_flow import eval_flow_avg, load_gt_flow_kitti
from evaluate_mask import load_gt_mask
from evaluate_depth import eval_depth


================================================
FILE: core/evaluation/eval_odom.py
================================================
import copy
from matplotlib import pyplot as plt
import numpy as np
import os
from glob import glob
import pdb


def scale_lse_solver(X, Y):
    """Least-sqaure-error solver
    Compute optimal scaling factor so that s(X)-Y is minimum
    Args:
        X (KxN array): current data
        Y (KxN array): reference data
    Returns:
        scale (float): scaling factor
    """
    scale = np.sum(X * Y)/np.sum(X ** 2)
    return scale


def umeyama_alignment(x, y, with_scale=False):
    """
    Computes the least squares solution parameters of an Sim(m) matrix
    that minimizes the distance between a set of registered points.
    Umeyama, Shinji: Least-squares estimation of transformation parameters
                     between two point patterns. IEEE PAMI, 1991
    :param x: mxn matrix of points, m = dimension, n = nr. of data points
    :param y: mxn matrix of points, m = dimension, n = nr. of data points
    :param with_scale: set to True to align also the scale (default: 1.0 scale)
    :return: r, t, c - rotation matrix, translation vector and scale factor
    """
    if x.shape != y.shape:
        assert False, "x.shape not equal to y.shape"

    # m = dimension, n = nr. of data points
    m, n = x.shape

    # means, eq. 34 and 35
    mean_x = x.mean(axis=1)
    mean_y = y.mean(axis=1)

    # variance, eq. 36
    # "transpose" for column subtraction
    sigma_x = 1.0 / n * (np.linalg.norm(x - mean_x[:, np.newaxis])**2)

    # covariance matrix, eq. 38
    outer_sum = np.zeros((m, m))
    for i in range(n):
        outer_sum += np.outer((y[:, i] - mean_y), (x[:, i] - mean_x))
    cov_xy = np.multiply(1.0 / n, outer_sum)

    # SVD (text betw. eq. 38 and 39)
    u, d, v = np.linalg.svd(cov_xy)

    # S matrix, eq. 43
    s = np.eye(m)
    if np.linalg.det(u) * np.linalg.det(v) < 0.0:
        # Ensure a RHS coordinate system (Kabsch algorithm).
        s[m - 1, m - 1] = -1

    # rotation, eq. 40
    r = u.dot(s).dot(v)

    # scale & translation, eq. 42 and 41
    c = 1 / sigma_x * np.trace(np.diag(d).dot(s)) if with_scale else 1.0
    t = mean_y - np.multiply(c, r.dot(mean_x))

    return r, t, c


class KittiEvalOdom():
    # ----------------------------------------------------------------------
	# poses: N,4,4
	# pose: 4,4
	# ----------------------------------------------------------------------
    def __init__(self):
        self.lengths = [100, 200, 300, 400, 500, 600, 700, 800]
        self.num_lengths = len(self.lengths)

    def loadPoses(self, file_name):
        # ----------------------------------------------------------------------
		# Each line in the file should follow one of the following structures
		# (1) idx pose(3x4 matrix in terms of 12 numbers)
		# (2) pose(3x4 matrix in terms of 12 numbers)
		# ----------------------------------------------------------------------
        f = open(file_name, 'r')
        s = f.readlines()
        f.close()
        file_len = len(s)
        poses = {}
        for cnt, line in enumerate(s):
            P = np.eye(4)
            line_split = [float(i) for i in line.split(" ")]
            withIdx = int(len(line_split) == 13)
            for row in range(3):
                for col in range(4):
                    P[row, col] = line_split[row*4 + col + withIdx]
            if withIdx:
                frame_idx = line_split[0]
            else:
                frame_idx = cnt
            poses[frame_idx] = P
        return poses

    def trajectory_distances(self, poses):
        # ----------------------------------------------------------------------
		# poses: dictionary: [frame_idx: pose]
		# ----------------------------------------------------------------------
        dist = [0]
        sort_frame_idx = sorted(poses.keys())
        for i in range(len(sort_frame_idx)-1):
            cur_frame_idx = sort_frame_idx[i]   
            next_frame_idx = sort_frame_idx[i+1]
            P1 = poses[cur_frame_idx]
            P2 = poses[next_frame_idx]
            dx = P1[0, 3] - P2[0, 3]
            dy = P1[1, 3] - P2[1, 3]
            dz = P1[2, 3] - P2[2, 3]
            dist.append(dist[i]+np.sqrt(dx**2+dy**2+dz**2))
        return dist

    def rotation_error(self, pose_error):
        a = pose_error[0, 0]
        b = pose_error[1, 1]
        c = pose_error[2, 2]
        d = 0.5*(a+b+c-1.0)
        rot_error = np.arccos(max(min(d, 1.0), -1.0))
        return rot_error

    def translation_error(self, pose_error):
        dx = pose_error[0, 3]
        dy = pose_error[1, 3]
        dz = pose_error[2, 3]
        return np.sqrt(dx**2+dy**2+dz**2)

    def last_frame_from_segment_length(self, dist, first_frame, len_):
        for i in range(first_frame, len(dist), 1):
            if dist[i] > (dist[first_frame] + len_):
                return i
        return -1

    def calc_sequence_errors(self, poses_gt, poses_result):
        err = []
        dist = self.trajectory_distances(poses_gt)
        self.step_size = 10

        for first_frame in range(0, len(poses_gt), self.step_size):
            for i in range(self.num_lengths):
                len_ = self.lengths[i]
                last_frame = self.last_frame_from_segment_length(dist, first_frame, len_)

                # ----------------------------------------------------------------------
				# Continue if sequence not long enough
				# ----------------------------------------------------------------------
                if last_frame == -1 or not(last_frame in poses_result.keys()) or not(first_frame in poses_result.keys()):
                    continue

                # ----------------------------------------------------------------------
				# compute rotational and translational errors
				# ----------------------------------------------------------------------
                pose_delta_gt = np.dot(np.linalg.inv(poses_gt[first_frame]), poses_gt[last_frame])
                pose_delta_result = np.dot(np.linalg.inv(poses_result[first_frame]), poses_result[last_frame])
                pose_error = np.dot(np.linalg.inv(pose_delta_result), pose_delta_gt)

                r_err = self.rotation_error(pose_error)
                t_err = self.translation_error(pose_error)

                # ----------------------------------------------------------------------
				# compute speed 
				# ----------------------------------------------------------------------
                num_frames = last_frame - first_frame + 1.0
                speed = len_/(0.1*num_frames)

                err.append([first_frame, r_err/len_, t_err/len_, len_, speed])
        return err
        
    def save_sequence_errors(self, err, file_name):
        fp = open(file_name, 'w')
        for i in err:
            line_to_write = " ".join([str(j) for j in i])
            fp.writelines(line_to_write+"\n")
        fp.close()

    def compute_overall_err(self, seq_err):
        t_err = 0
        r_err = 0

        seq_len = len(seq_err)

        for item in seq_err:
            r_err += item[1]
            t_err += item[2]
        ave_t_err = t_err / seq_len
        ave_r_err = r_err / seq_len
        return ave_t_err, ave_r_err

    def plotPath(self, seq, poses_gt, poses_result):
        plot_keys = ["Ground Truth", "Ours"]
        fontsize_ = 20
        plot_num =-1

        poses_dict = {}
        poses_dict["Ground Truth"] = poses_gt
        poses_dict["Ours"] = poses_result

        fig = plt.figure()
        ax = plt.gca()
        ax.set_aspect('equal')

        for key in plot_keys:
            pos_xz = []
            # for pose in poses_dict[key]:
            for frame_idx in sorted(poses_dict[key].keys()):
                pose = poses_dict[key][frame_idx]
                pos_xz.append([pose[0,3],  pose[2,3]])
            pos_xz = np.asarray(pos_xz)
            plt.plot(pos_xz[:,0],  pos_xz[:,1], label = key)

        plt.legend(loc="upper right", prop={'size': fontsize_})
        plt.xticks(fontsize=fontsize_)
        plt.yticks(fontsize=fontsize_)
        plt.xlabel('x (m)', fontsize=fontsize_)
        plt.ylabel('z (m)', fontsize=fontsize_)
        fig.set_size_inches(10, 10)
        png_title = "sequence_"+(seq)
        plt.savefig(self.plot_path_dir + "/" + png_title + ".pdf", bbox_inches='tight', pad_inches=0)
        # plt.show()

    def compute_segment_error(self, seq_errs):
        # ----------------------------------------------------------------------
		# This function calculates average errors for different segment.
		# ----------------------------------------------------------------------

        segment_errs = {}
        avg_segment_errs = {}
        for len_ in self.lengths:
            segment_errs[len_] = []
        # ----------------------------------------------------------------------
		# Get errors
		# ----------------------------------------------------------------------
        for err in seq_errs:
            len_ = err[3]
            t_err = err[2]
            r_err = err[1]
            segment_errs[len_].append([t_err, r_err])
        # ----------------------------------------------------------------------
		# Compute average
		# ----------------------------------------------------------------------
        for len_ in self.lengths:
            if segment_errs[len_] != []:
                avg_t_err = np.mean(np.asarray(segment_errs[len_])[:, 0])
                avg_r_err = np.mean(np.asarray(segment_errs[len_])[:, 1])
                avg_segment_errs[len_] = [avg_t_err, avg_r_err]
            else:
                avg_segment_errs[len_] = []
        return avg_segment_errs

    def scale_optimization(self, gt, pred):
        """ Optimize scaling factor
        Args:
            gt (4x4 array dict): ground-truth poses
            pred (4x4 array dict): predicted poses
        Returns:
            new_pred (4x4 array dict): predicted poses after optimization
        """
        pred_updated = copy.deepcopy(pred)
        xyz_pred = []
        xyz_ref = []
        for i in pred:
            pose_pred = pred[i]
            pose_ref = gt[i]
            xyz_pred.append(pose_pred[:3, 3])
            xyz_ref.append(pose_ref[:3, 3])
        xyz_pred = np.asarray(xyz_pred)
        xyz_ref = np.asarray(xyz_ref)
        scale = scale_lse_solver(xyz_pred, xyz_ref)
        for i in pred_updated:
            pred_updated[i][:3, 3] *= scale
        return pred_updated

    def eval(self, gt_txt, result_txt, seq=None):
        # gt_dir: the directory of groundtruth poses txt
        # results_dir: the directory of predicted poses txt
        self.plot_path_dir = os.path.dirname(result_txt) + "/plot_path"
        if not os.path.exists(self.plot_path_dir):
            os.makedirs(self.plot_path_dir)
        
        self.gt_txt = gt_txt

        ave_t_errs = []
        ave_r_errs = []

        poses_result = self.loadPoses(result_txt)
        poses_gt = self.loadPoses(self.gt_txt)

        # Pose alignment to first frame
        idx_0 = sorted(list(poses_result.keys()))[0]
        pred_0 = poses_result[idx_0]
        gt_0 = poses_gt[idx_0]
        for cnt in poses_result:
            poses_result[cnt] = np.linalg.inv(pred_0) @ poses_result[cnt]
            poses_gt[cnt] = np.linalg.inv(gt_0) @ poses_gt[cnt]

        # get XYZ
        xyz_gt = []
        xyz_result = []
        for cnt in poses_result:
            xyz_gt.append([poses_gt[cnt][0, 3], poses_gt[cnt][1, 3], poses_gt[cnt][2, 3]])
            xyz_result.append([poses_result[cnt][0, 3], poses_result[cnt][1, 3], poses_result[cnt][2, 3]])
        xyz_gt = np.asarray(xyz_gt).transpose(1, 0)
        xyz_result = np.asarray(xyz_result).transpose(1, 0)

        r, t, scale = umeyama_alignment(xyz_result, xyz_gt, True)

        align_transformation = np.eye(4)
        align_transformation[:3:, :3] = r
        align_transformation[:3, 3] = t
        
        for cnt in poses_result:
            poses_result[cnt][:3, 3] *= scale
            poses_result[cnt] = align_transformation @ poses_result[cnt]

        # ----------------------------------------------------------------------
        # compute sequence errors
        # ----------------------------------------------------------------------
        seq_err = self.calc_sequence_errors(poses_gt, poses_result)

        # ----------------------------------------------------------------------
        # Compute segment errors
        # ----------------------------------------------------------------------
        avg_segment_errs = self.compute_segment_error(seq_err)

        # ----------------------------------------------------------------------
        # compute overall error
        # ----------------------------------------------------------------------
        ave_t_err, ave_r_err = self.compute_overall_err(seq_err)
        print("Sequence: " + seq)
        print("Translational error (%): ", ave_t_err*100)
        print("Rotational error (deg/100m): ", ave_r_err/np.pi*180*100)
        ave_t_errs.append(ave_t_err)
        ave_r_errs.append(ave_r_err)

        # Plotting
        self.plotPath(seq, poses_gt, poses_result)    

        print("-------------------- For Copying ------------------------------")
        for i in range(len(ave_t_errs)):
            print("{0:.2f}".format(ave_t_errs[i]*100))
            print("{0:.2f}".format(ave_r_errs[i]/np.pi*180*100))



if __name__ == '__main__':
    import argparse
    parser = argparse.ArgumentParser(description='KITTI evaluation')
    parser.add_argument('--gt_txt', type=str, required=True, help="Groundtruth directory")
    parser.add_argument('--result_txt', type=str, required=True, help="Result directory")
    parser.add_argument('--seq', type=str, help="sequences to be evaluated", default='09')
    args = parser.parse_args()

    eval_tool = KittiEvalOdom()
    eval_tool.eval(args.gt_txt, args.result_txt, seq=args.seq)


================================================
FILE: core/evaluation/evaluate_depth.py
================================================
from evaluation_utils import *

def process_depth(gt_depth, pred_depth, min_depth, max_depth):
    mask = gt_depth > 0
    pred_depth[pred_depth < min_depth] = min_depth
    pred_depth[pred_depth > max_depth] = max_depth
    gt_depth[gt_depth < min_depth] = min_depth
    gt_depth[gt_depth > max_depth] = max_depth

    return gt_depth, pred_depth, mask


def eval_depth(gt_depths,
               pred_depths,
               min_depth=1e-3,
               max_depth=80, nyu=False):
    num_samples = len(pred_depths)
    rms = np.zeros(num_samples, np.float32)
    log_rms = np.zeros(num_samples, np.float32)
    abs_rel = np.zeros(num_samples, np.float32)
    sq_rel = np.zeros(num_samples, np.float32)
    d1_all = np.zeros(num_samples, np.float32)
    a1 = np.zeros(num_samples, np.float32)
    a2 = np.zeros(num_samples, np.float32)
    a3 = np.zeros(num_samples, np.float32)

    for i in range(num_samples):
        gt_depth = gt_depths[i]
        pred_depth = pred_depths[i]
        mask = np.logical_and(gt_depth > min_depth, gt_depth < max_depth)
        
        if not nyu:
            gt_height, gt_width = gt_depth.shape
            crop = np.array([0.40810811 * gt_height, 0.99189189 * gt_height,
                            0.03594771 * gt_width, 0.96405229 * gt_width]).astype(np.int32)
            crop_mask = np.zeros(mask.shape)
            crop_mask[crop[0]:crop[1], crop[2]:crop[3]] = 1
            mask = np.logical_and(mask, crop_mask)
        
        gt_depth = gt_depth[mask]
        pred_depth = pred_depth[mask]
        scale = np.median(gt_depth) / np.median(pred_depth)
        pred_depth *= scale

        gt_depth, pred_depth, mask = process_depth(
            gt_depth, pred_depth, min_depth, max_depth)

        abs_rel[i], sq_rel[i], rms[i], log_rms[i], a1[i], a2[i], a3[
            i] = compute_errors(gt_depth, pred_depth, nyu=nyu)


    return [abs_rel.mean(), sq_rel.mean(), rms.mean(), log_rms.mean(), a1.mean(), a2.mean(), a3.mean()]



================================================
FILE: core/evaluation/evaluate_flow.py
================================================
import os, sys
sys.path.append(os.path.dirname(os.path.abspath(__file__)))
import numpy as np
from flowlib import read_flow_png, flow_to_image
import cv2
import multiprocessing
import functools

def get_scaled_intrinsic_matrix(calib_file, zoom_x, zoom_y):
    intrinsics = load_intrinsics_raw(calib_file)
    intrinsics = scale_intrinsics(intrinsics, zoom_x, zoom_y)

    intrinsics[0, 1] = 0.0
    intrinsics[1, 0] = 0.0
    intrinsics[2, 0] = 0.0
    intrinsics[2, 1] = 0.0
    return intrinsics

def load_intrinsics_raw(calib_file):
    filedata = read_raw_calib_file(calib_file)
    if "P_rect_02" in filedata:
        P_rect = filedata['P_rect_02']
    else:
        P_rect = filedata['P2']
    P_rect = np.reshape(P_rect, (3, 4))
    intrinsics = P_rect[:3, :3]
    return intrinsics

def read_raw_calib_file(filepath):
    # From https://github.com/utiasSTARS/pykitti/blob/master/pykitti/utils.py
    """Read in a calibration file and parse into a dictionary."""
    data = {}

    with open(filepath, 'r') as f:
        for line in f.readlines():
            key, value = line.split(':', 1)
            # The only non-float values in these files are dates, which
            # we don't care about anyway
            try:
                data[key] = np.array([float(x) for x in value.split()])
            except ValueError:
                pass
    return data

def scale_intrinsics(mat, sx, sy):
    out = np.copy(mat)
    out[0, 0] *= sx
    out[0, 2] *= sx
    out[1, 1] *= sy
    out[1, 2] *= sy
    return out

def read_flow_gt_worker(dir_gt, i):
    flow_true = read_flow_png(
        os.path.join(dir_gt, "flow_occ", str(i).zfill(6) + "_10.png"))
    flow_noc_true = read_flow_png(
        os.path.join(dir_gt, "flow_noc", str(i).zfill(6) + "_10.png"))
    return flow_true, flow_noc_true[:, :, 2]

def load_gt_flow_kitti(gt_dataset_dir, mode):
    gt_flows = []
    noc_masks = []
    if mode == "kitti_2012":
        num_gt = 194
        dir_gt = gt_dataset_dir
    elif mode == "kitti_2015":
        num_gt = 200
        dir_gt = gt_dataset_dir
    else:
        num_gt = None
        dir_gt = None
        raise ValueError('Mode {} not found.'.format(mode))

    fun = functools.partial(read_flow_gt_worker, dir_gt)
    pool = multiprocessing.Pool(5)
    results = pool.imap(fun, range(num_gt), chunksize=10)
    pool.close()
    pool.join()

    for result in results:
        gt_flows.append(result[0])
        noc_masks.append(result[1])
    return gt_flows, noc_masks

def calculate_error_rate(epe_map, gt_flow, mask):
    bad_pixels = np.logical_and(
        epe_map * mask > 3,
        epe_map * mask / np.maximum(
            np.sqrt(np.sum(np.square(gt_flow), axis=2)), 1e-10) > 0.05)
    return bad_pixels.sum() / mask.sum()


def eval_flow_avg(gt_flows,
                  noc_masks,
                  pred_flows,
                  cfg,
                  moving_masks=None,
                  write_img=False):
    error, error_noc, error_occ, error_move, error_static, error_rate = 0.0, 0.0, 0.0, 0.0, 0.0, 0.0
    error_move_rate, error_static_rate = 0.0, 0.0

    num = len(gt_flows)
    for gt_flow, noc_mask, pred_flow, i in zip(gt_flows, noc_masks, pred_flows,
                                               range(len(gt_flows))):
        H, W = gt_flow.shape[0:2]

        pred_flow = np.copy(pred_flow)
        pred_flow[:, :, 0] = pred_flow[:, :, 0] / cfg.img_hw[1] * W
        pred_flow[:, :, 1] = pred_flow[:, :, 1] / cfg.img_hw[0] * H

        flo_pred = cv2.resize(
            pred_flow, (W, H), interpolation=cv2.INTER_LINEAR)

        if write_img:
            if not os.path.exists(os.path.join(cfg.model_dir, "pred_flow")):
                os.mkdir(os.path.join(cfg.model_dir, "pred_flow"))
            cv2.imwrite(
                os.path.join(cfg.model_dir, "pred_flow",
                             str(i).zfill(6) + "_10.png"),
                flow_to_image(flo_pred))
            cv2.imwrite(
                os.path.join(cfg.model_dir, "pred_flow",
                             str(i).zfill(6) + "_10_gt.png"),
                flow_to_image(gt_flow[:, :, 0:2]))
            cv2.imwrite(
                os.path.join(cfg.model_dir, "pred_flow",
                             str(i).zfill(6) + "_10_err.png"),
                flow_to_image(
                    (flo_pred - gt_flow[:, :, 0:2]) * gt_flow[:, :, 2:3]))

        epe_map = np.sqrt(
            np.sum(np.square(flo_pred[:, :, 0:2] - gt_flow[:, :, 0:2]),
                   axis=2))
        error += np.sum(epe_map * gt_flow[:, :, 2]) / np.sum(gt_flow[:, :, 2])

        error_noc += np.sum(epe_map * noc_mask) / np.sum(noc_mask)

        error_occ += np.sum(epe_map * (gt_flow[:, :, 2] - noc_mask)) / max(
            np.sum(gt_flow[:, :, 2] - noc_mask), 1.0)

        error_rate += calculate_error_rate(epe_map, gt_flow[:, :, 0:2],
                                           gt_flow[:, :, 2])

        if moving_masks:
            move_mask = moving_masks[i]

            error_move_rate += calculate_error_rate(
                epe_map, gt_flow[:, :, 0:2], gt_flow[:, :, 2] * move_mask)
            error_static_rate += calculate_error_rate(
                epe_map, gt_flow[:, :, 0:2],
                gt_flow[:, :, 2] * (1.0 - move_mask))

            error_move += np.sum(epe_map * gt_flow[:, :, 2] *
                                 move_mask) / np.sum(gt_flow[:, :, 2] *
                                                     move_mask)
            error_static += np.sum(epe_map * gt_flow[:, :, 2] * (
                1.0 - move_mask)) / np.sum(gt_flow[:, :, 2] *
                                           (1.0 - move_mask))

    if moving_masks:
        result = "{:>10}, {:>10}, {:>10}, {:>10}, {:>10}, {:>10}, {:>10}, {:>10} \n".format(
            'epe', 'epe_noc', 'epe_occ', 'epe_move', 'epe_static',
            'move_err_rate', 'static_err_rate', 'err_rate')
        result += "{:10.4f}, {:10.4f}, {:10.4f}, {:10.4f}, {:10.4f}, {:10.4f}, {:10.4f}, {:10.4f} \n".format(
            error / num, error_noc / num, error_occ / num, error_move / num,
            error_static / num, error_move_rate / num, error_static_rate / num,
            error_rate / num)
        return result
    else:
        result = "{:>10}, {:>10}, {:>10}, {:>10} \n".format(
            'epe', 'epe_noc', 'epe_occ', 'err_rate')
        result += "{:10.4f}, {:10.4f}, {:10.4f}, {:10.4f} \n".format(
            error / num, error_noc / num, error_occ / num, error_rate / num)
        return result


================================================
FILE: core/evaluation/evaluate_mask.py
================================================
import os
import numpy as np
import cv2
import functools
import matplotlib.pyplot as plt
import multiprocessing

"""
Adopted from https://github.com/martinkersner/py_img_seg_eval
"""

class EvalSegErr(Exception):
    def __init__(self, value):
        self.value = value

    def __str__(self):
        return repr(self.value)


def pixel_accuracy(eval_segm, gt_segm):
    '''
    sum_i(n_ii) / sum_i(t_i)
    '''

    check_size(eval_segm, gt_segm)

    cl, n_cl = extract_classes(gt_segm)
    eval_mask, gt_mask = extract_both_masks(eval_segm, gt_segm, cl, n_cl)

    sum_n_ii = 0
    sum_t_i = 0

    for i, c in enumerate(cl):
        curr_eval_mask = eval_mask[i, :, :]
        curr_gt_mask = gt_mask[i, :, :]

        sum_n_ii += np.sum(np.logical_and(curr_eval_mask, curr_gt_mask))
        sum_t_i += np.sum(curr_gt_mask)

    if (sum_t_i == 0):
        pixel_accuracy_ = 0
    else:
        pixel_accuracy_ = sum_n_ii / sum_t_i

    return pixel_accuracy_


def mean_accuracy(eval_segm, gt_segm):
    '''
    (1/n_cl) sum_i(n_ii/t_i)
    '''

    check_size(eval_segm, gt_segm)

    cl, n_cl = extract_classes(gt_segm)
    eval_mask, gt_mask = extract_both_masks(eval_segm, gt_segm, cl, n_cl)

    accuracy = list([0]) * n_cl

    for i, c in enumerate(cl):
        curr_eval_mask = eval_mask[i, :, :]
        curr_gt_mask = gt_mask[i, :, :]

        n_ii = np.sum(np.logical_and(curr_eval_mask, curr_gt_mask))
        t_i = np.sum(curr_gt_mask)

        if (t_i != 0):
            accuracy[i] = n_ii / t_i

    mean_accuracy_ = np.mean(accuracy)
    return mean_accuracy_


def mean_IU(eval_segm, gt_segm):
    '''
    (1/n_cl) * sum_i(n_ii / (t_i + sum_j(n_ji) - n_ii))
    '''

    check_size(eval_segm, gt_segm)

    cl, n_cl = union_classes(eval_segm, gt_segm)
    _, n_cl_gt = extract_classes(gt_segm)
    eval_mask, gt_mask = extract_both_masks(eval_segm, gt_segm, cl, n_cl)

    IU = list([0]) * n_cl

    for i, c in enumerate(cl):
        curr_eval_mask = eval_mask[i, :, :]
        curr_gt_mask = gt_mask[i, :, :]

        if (np.sum(curr_eval_mask) == 0) or (np.sum(curr_gt_mask) == 0):
            continue

        n_ii = np.sum(np.logical_and(curr_eval_mask, curr_gt_mask))
        t_i = np.sum(curr_gt_mask)
        n_ij = np.sum(curr_eval_mask)

        IU[i] = n_ii / (t_i + n_ij - n_ii)

    mean_IU_ = np.sum(IU) / n_cl_gt
    return mean_IU_, np.array(IU)


def frequency_weighted_IU(eval_segm, gt_segm):
    '''
    sum_k(t_k)^(-1) * sum_i((t_i*n_ii)/(t_i + sum_j(n_ji) - n_ii))
    '''

    check_size(eval_segm, gt_segm)

    cl, n_cl = union_classes(eval_segm, gt_segm)
    eval_mask, gt_mask = extract_both_masks(eval_segm, gt_segm, cl, n_cl)

    frequency_weighted_IU_ = list([0]) * n_cl

    for i, c in enumerate(cl):
        curr_eval_mask = eval_mask[i, :, :]
        curr_gt_mask = gt_mask[i, :, :]

        if (np.sum(curr_eval_mask) == 0) or (np.sum(curr_gt_mask) == 0):
            continue

        n_ii = np.sum(np.logical_and(curr_eval_mask, curr_gt_mask))
        t_i = np.sum(curr_gt_mask)
        n_ij = np.sum(curr_eval_mask)

        frequency_weighted_IU_[i] = (t_i * n_ii) / (t_i + n_ij - n_ii)

    sum_k_t_k = get_pixel_area(eval_segm)

    frequency_weighted_IU_ = np.sum(frequency_weighted_IU_) / sum_k_t_k
    return frequency_weighted_IU_


'''
Auxiliary functions used during evaluation.
'''


def get_pixel_area(segm):
    return segm.shape[0] * segm.shape[1]


def extract_both_masks(eval_segm, gt_segm, cl, n_cl):
    eval_mask = extract_masks(eval_segm, cl, n_cl)
    gt_mask = extract_masks(gt_segm, cl, n_cl)

    return eval_mask, gt_mask


def extract_classes(segm):
    cl = np.unique(segm)
    n_cl = len(cl)

    return cl, n_cl


def union_classes(eval_segm, gt_segm):
    eval_cl, _ = extract_classes(eval_segm)
    gt_cl, _ = extract_classes(gt_segm)

    cl = np.union1d(eval_cl, gt_cl)
    n_cl = len(cl)

    return cl, n_cl


def extract_masks(segm, cl, n_cl):
    h, w = segm_size(segm)
    masks = np.zeros((n_cl, h, w))

    for i, c in enumerate(cl):
        masks[i, :, :] = segm == c

    return masks


def segm_size(segm):
    try:
        height = segm.shape[0]
        width = segm.shape[1]
    except IndexError:
        raise

    return height, width


def check_size(eval_segm, gt_segm):
    h_e, w_e = segm_size(eval_segm)
    h_g, w_g = segm_size(gt_segm)

    if (h_e != h_g) or (w_e != w_g):
        raise EvalSegErr("DiffDim: Different dimensions of matrices!")

def read_mask_gt_worker(gt_dataset_dir, idx):
    return cv2.imread(
        gt_dataset_dir + "/obj_map/" + str(idx).zfill(6) + "_10.png", -1)

def load_gt_mask(gt_dataset_dir):
    num_gt = 200
    
    # the dataset dir should be the directory of kitti-2015.
    fun = functools.partial(read_mask_gt_worker, gt_dataset_dir)
    pool = multiprocessing.Pool(5)
    results = pool.imap(fun, range(num_gt), chunksize=10)
    pool.close()
    pool.join()

    gt_masks = []
    for m in results:
        m[m > 0.0] = 1.0
        gt_masks.append(m)
    return gt_masks


def eval_mask(pred_masks, gt_masks, opt):
    grey_cmap = plt.get_cmap("Greys")
    if not os.path.exists(os.path.join(opt.trace, "pred_mask")):
        os.mkdir(os.path.join(opt.trace, "pred_mask"))

    pa_res, ma_res, mIU_res, fwIU_res = 0.0, 0.0, 0.0, 0.0
    IU_res = np.array([0.0, 0.0])

    num_total = len(gt_masks)
    for i in range(num_total):
        gt_mask = gt_masks[i]
        H, W = gt_mask.shape[0:2]

        pred_mask = cv2.resize(
            pred_masks[i], (W, H), interpolation=cv2.INTER_LINEAR)

        pred_mask[pred_mask >= 0.5] = 1.0
        pred_mask[pred_mask < 0.5] = 0.0

        cv2.imwrite(
            os.path.join(opt.trace, "pred_mask",
                         str(i).zfill(6) + "_10_plot.png"),
            grey_cmap(pred_mask))
        cv2.imwrite(
            os.path.join(opt.trace, "pred_mask", str(i).zfill(6) + "_10.png"),
            pred_mask)

        pa_res += pixel_accuracy(pred_mask, gt_mask)
        ma_res += mean_accuracy(pred_mask, gt_mask)

        mIU, IU = mean_IU(pred_mask, gt_mask)
        mIU_res += mIU
        IU_res += IU

        fwIU_res += frequency_weighted_IU(pred_mask, gt_mask)

    return pa_res / 200., ma_res / 200., mIU_res / 200., fwIU_res / 200., IU_res / 200.


================================================
FILE: core/evaluation/evaluation_utils.py
================================================
import numpy as np
import os, sys
sys.path.append(os.path.dirname(os.path.abspath(__file__)))
import cv2, skimage
import skimage.io
#import scipy.misc as sm
import imageio as sm


# Adopted from https://github.com/mrharicot/monodepth
def compute_errors(gt, pred, nyu=False):
    thresh = np.maximum((gt / pred), (pred / gt))
    a1 = (thresh < 1.25).mean()
    a2 = (thresh < 1.25**2).mean()
    a3 = (thresh < 1.25**3).mean()

    rmse = (gt - pred)**2
    rmse = np.sqrt(rmse.mean())

    rmse_log = (np.log(gt) - np.log(pred))**2
    rmse_log = np.sqrt(rmse_log.mean())

    log10 = np.mean(np.abs((np.log10(gt) - np.log10(pred))))

    abs_rel = np.mean(np.abs(gt - pred) / (gt))

    sq_rel = np.mean(((gt - pred)**2) / (gt))

    if nyu:
        return abs_rel, sq_rel, rmse, log10, a1, a2, a3
    else:
        return abs_rel, sq_rel, rmse, rmse_log, a1, a2, a3



================================================
FILE: core/evaluation/flowlib.py
================================================
#!/usr/bin/python
"""
Adopted from https://github.com/liruoteng/OpticalFlowToolkit
# ==============================
# flowlib.py
# library for optical flow processing
# Author: Ruoteng Li
# Date: 6th Aug 2016
# ==============================
"""
import png
import scipy
import numpy as np
import matplotlib.colors as cl
import matplotlib.pyplot as plt
from PIL import Image
#import cv2

UNKNOWN_FLOW_THRESH = 1e7
SMALLFLOW = 0.0
LARGEFLOW = 1e8
"""
=============
Flow Section
=============
"""


def show_flow(filename):
    """
    visualize optical flow map using matplotlib
    :param filename: optical flow file
    :return: None
    """
    flow = read_flow(filename)
    img = flow_to_image(flow)
    plt.imshow(img)
    plt.show()


def visualize_flow(flow, mode='Y'):
    """
    this function visualize the input flow
    :param flow: input flow in array
    :param mode: choose which color mode to visualize the flow (Y: Ccbcr, RGB: RGB color)
    :return: None
    """
    if mode == 'Y':
        # Ccbcr color wheel
        img = flow_to_image(flow)
        plt.imshow(img)
        plt.show()
    elif mode == 'RGB':
        (h, w) = flow.shape[0:2]
        du = flow[:, :, 0]
        dv = flow[:, :, 1]
        valid = flow[:, :, 2]
        max_flow = max(np.max(du), np.max(dv))
        img = np.zeros((h, w, 3), dtype=np.float64)
        # angle layer
        img[:, :, 0] = np.arctan2(dv, du) / (2 * np.pi)
        # magnitude layer, normalized to 1
        img[:, :, 1] = np.sqrt(du * du + dv * dv) * 8 / max_flow
        # phase layer
        img[:, :, 2] = 8 - img[:, :, 1]
        # clip to [0,1]
        small_idx = img[:, :, 0:3] < 0
        large_idx = img[:, :, 0:3] > 1
        img[small_idx] = 0
        img[large_idx] = 1
        # convert to rgb
        img = cl.hsv_to_rgb(img)
        # remove invalid point
        img[:, :, 0] = img[:, :, 0] * valid
        img[:, :, 1] = img[:, :, 1] * valid
        img[:, :, 2] = img[:, :, 2] * valid
        # show
        plt.imshow(img)
        plt.show()

    return None


def read_flow(filename):
    """
    read optical flow from Middlebury .flo file
    :param filename: name of the flow file
    :return: optical flow data in matrix
    """
    f = open(filename, 'rb')
    magic = np.fromfile(f, np.float32, count=1)
    data2d = None

    if 202021.25 != magic:
        print('Magic number incorrect. Invalid .flo file')
    else:
        w = np.fromfile(f, np.int32, count=1)[0]
        h = np.fromfile(f, np.int32, count=1)[0]
        print("Reading %d x %d flo file" % (h, w))
        data2d = np.fromfile(f, np.float32, count=2 * w * h)
        # reshape data into 3D array (columns, rows, channels)
        data2d = np.resize(data2d, (h, w, 2))
    f.close()
    return data2d


def read_flow_png(flow_file):
    """
    Read optical flow from KITTI .png file
    :param flow_file: name of the flow file
    :return: optical flow data in matrix
    """
    flow_object = png.Reader(filename=flow_file)
    flow_direct = flow_object.asDirect()
    flow_data = list(flow_direct[2])
    (w, h) = flow_direct[3]['size']
    flow = np.zeros((h, w, 3), dtype=np.float64)
    for i in range(len(flow_data)):
        flow[i, :, 0] = flow_data[i][0::3]
        flow[i, :, 1] = flow_data[i][1::3]
        flow[i, :, 2] = flow_data[i][2::3]

    invalid_idx = (flow[:, :, 2] == 0)
    flow[:, :, 0:2] = (flow[:, :, 0:2] - 2**15) / 64.0
    flow[invalid_idx, 0] = 0
    flow[invalid_idx, 1] = 0
    return flow


def write_flow_png(flo, flow_file):
    h, w, _ = flo.shape
    out_flo = np.ones((h, w, 3), dtype=np.float32)
    out_flo[:, :, 0] = np.maximum(
        np.minimum(flo[:, :, 0] * 64.0 + 2**15, 2**16 - 1), 0)
    out_flo[:, :, 1] = np.maximum(
        np.minimum(flo[:, :, 1] * 64.0 + 2**15, 2**16 - 1), 0)
    out_flo = out_flo.astype(np.uint16)

    with open(flow_file, 'wb') as f:
        writer = png.Writer(width=w, height=h, bitdepth=16)
        # Convert z to the Python list of lists expected by
        # the png writer.
        z2list = out_flo.reshape(-1, w * 3).tolist()
        writer.write(f, z2list)


def write_flow(flow, filename):
    """
    write optical flow in Middlebury .flo format
    :param flow: optical flow map
    :param filename: optical flow file path to be saved
    :return: None
    """
    f = open(filename, 'wb')
    magic = np.array([202021.25], dtype=np.float32)
    (height, width) = flow.shape[0:2]
    w = np.array([width], dtype=np.int32)
    h = np.array([height], dtype=np.int32)
    magic.tofile(f)
    w.tofile(f)
    h.tofile(f)
    flow.tofile(f)
    f.close()


def segment_flow(flow):
    h = flow.shape[0]
    w = flow.shape[1]
    u = flow[:, :, 0]
    v = flow[:, :, 1]

    idx = ((abs(u) > LARGEFLOW) | (abs(v) > LARGEFLOW))
    idx2 = (abs(u) == SMALLFLOW)
    class0 = (v == 0) & (u == 0)
    u[idx2] = 0.00001
    tan_value = v / u

    class1 = (tan_value < 1) & (tan_value >= 0) & (u > 0) & (v >= 0)
    class2 = (tan_value >= 1) & (u >= 0) & (v >= 0)
    class3 = (tan_value < -1) & (u <= 0) & (v >= 0)
    class4 = (tan_value < 0) & (tan_value >= -1) & (u < 0) & (v >= 0)
    class8 = (tan_value >= -1) & (tan_value < 0) & (u > 0) & (v <= 0)
    class7 = (tan_value < -1) & (u >= 0) & (v <= 0)
    class6 = (tan_value >= 1) & (u <= 0) & (v <= 0)
    class5 = (tan_value >= 0) & (tan_value < 1) & (u < 0) & (v <= 0)

    seg = np.zeros((h, w))

    seg[class1] = 1
    seg[class2] = 2
    seg[class3] = 3
    seg[class4] = 4
    seg[class5] = 5
    seg[class6] = 6
    seg[class7] = 7
    seg[class8] = 8
    seg[class0] = 0
    seg[idx] = 0

    return seg


def flow_error(tu, tv, u, v):
    """
    Calculate average end point error
    :param tu: ground-truth horizontal flow map
    :param tv: ground-truth vertical flow map
    :param u:  estimated horizontal flow map
    :param v:  estimated vertical flow map
    :return: End point error of the estimated flow
    """
    smallflow = 0.0
    '''
    stu = tu[bord+1:end-bord,bord+1:end-bord]
    stv = tv[bord+1:end-bord,bord+1:end-bord]
    su = u[bord+1:end-bord,bord+1:end-bord]
    sv = v[bord+1:end-bord,bord+1:end-bord]
    '''
    stu = tu[:]
    stv = tv[:]
    su = u[:]
    sv = v[:]

    idxUnknow = (abs(stu) > UNKNOWN_FLOW_THRESH) | (
        abs(stv) > UNKNOWN_FLOW_THRESH)
    stu[idxUnknow] = 0
    stv[idxUnknow] = 0
    su[idxUnknow] = 0
    sv[idxUnknow] = 0

    ind2 = [(np.absolute(stu) > smallflow) | (np.absolute(stv) > smallflow)]
    index_su = su[ind2]
    index_sv = sv[ind2]
    an = 1.0 / np.sqrt(index_su**2 + index_sv**2 + 1)
    un = index_su * an
    vn = index_sv * an

    index_stu = stu[ind2]
    index_stv = stv[ind2]
    tn = 1.0 / np.sqrt(index_stu**2 + index_stv**2 + 1)
    tun = index_stu * tn
    tvn = index_stv * tn
    '''
    angle = un * tun + vn * tvn + (an * tn)
    index = [angle == 1.0]
    angle[index] = 0.999
    ang = np.arccos(angle)
    mang = np.mean(ang)
    mang = mang * 180 / np.pi
    '''

    epe = np.sqrt((stu - su)**2 + (stv - sv)**2)
    epe = epe[ind2]
    mepe = np.mean(epe)
    return mepe


def flow_to_image(flow):
    """
    Convert flow into middlebury color code image
    :param flow: optical flow map
    :return: optical flow image in middlebury color
    """
    u = flow[:, :, 0]
    v = flow[:, :, 1]

    maxu = -999.
    maxv = -999.
    minu = 999.
    minv = 999.

    idxUnknow = (abs(u) > UNKNOWN_FLOW_THRESH) | (abs(v) > UNKNOWN_FLOW_THRESH)
    u[idxUnknow] = 0
    v[idxUnknow] = 0

    maxu = max(maxu, np.max(u))
    minu = min(minu, np.min(u))

    maxv = max(maxv, np.max(v))
    minv = min(minv, np.min(v))

    rad = np.sqrt(u**2 + v**2)
    maxrad = max(-1, np.max(rad))

    print("max flow: %.4f\nflow range:\nu = %.3f .. %.3f\nv = %.3f .. %.3f" % (
        maxrad, minu, maxu, minv, maxv))

    u = u / (maxrad + np.finfo(float).eps)
    v = v / (maxrad + np.finfo(float).eps)

    img = compute_color(u, v)

    idx = np.repeat(idxUnknow[:, :, np.newaxis], 3, axis=2)
    img[idx] = 0

    return np.uint8(img)


def evaluate_flow_file(gt, pred):
    """
    evaluate the estimated optical flow end point error according to ground truth provided
    :param gt: ground truth file path
    :param pred: estimated optical flow file path
    :return: end point error, float32
    """
    # Read flow files and calculate the errors
    gt_flow = read_flow(gt)  # ground truth flow
    eva_flow = read_flow(pred)  # predicted flow
    # Calculate errors
    average_pe = flow_error(gt_flow[:, :, 0], gt_flow[:, :, 1],
                            eva_flow[:, :, 0], eva_flow[:, :, 1])
    return average_pe


def evaluate_flow(gt_flow, pred_flow):
    """
    gt: ground-truth flow
    pred: estimated flow
    """
    average_pe = flow_error(gt_flow[:, :, 0], gt_flow[:, :, 1],
                            pred_flow[:, :, 0], pred_flow[:, :, 1])
    return average_pe


"""
==============
Disparity Section
==============
"""


def read_disp_png(file_name):
    """
    Read optical flow from KITTI .png file
    :param file_name: name of the flow file
    :return: optical flow data in matrix
    """
    image_object = png.Reader(filename=file_name)
    image_direct = image_object.asDirect()
    image_data = list(image_direct[2])
    (w, h) = image_direct[3]['size']
    channel = len(image_data[0]) / w
    flow = np.zeros((h, w, channel), dtype=np.uint16)
    for i in range(len(image_data)):
        for j in range(channel):
            flow[i, :, j] = image_data[i][j::channel]
    return flow[:, :, 0] / 256


def disp_to_flowfile(disp, filename):
    """
    Read KITTI disparity file in png format
    :param disp: disparity matrix
    :param filename: the flow file name to save
    :return: None
    """
    f = open(filename, 'wb')
    magic = np.array([202021.25], dtype=np.float32)
    (height, width) = disp.shape[0:2]
    w = np.array([width], dtype=np.int32)
    h = np.array([height], dtype=np.int32)
    empty_map = np.zeros((height, width), dtype=np.float32)
    data = np.dstack((disp, empty_map))
    magic.tofile(f)
    w.tofile(f)
    h.tofile(f)
    data.tofile(f)
    f.close()


"""
==============
Image Section
==============
"""


def read_image(filename):
    """
    Read normal image of any format
    :param filename: name of the image file
    :return: image data in matrix uint8 type
    """
    img = Image.open(filename)
    im = np.array(img)
    return im


def warp_image(im, flow):
    """
    Use optical flow to warp image to the next
    :param im: image to warp
    :param flow: optical flow
    :return: warped image
    """
    image_height = im.shape[0]
    image_width = im.shape[1]
    flow_height = flow.shape[0]
    flow_width = flow.shape[1]
    n = image_height * image_width
    (iy, ix) = np.mgrid[0:image_height, 0:image_width]
    (fy, fx) = np.mgrid[0:flow_height, 0:flow_width]
    fx += flow[:, :, 0]
    fy += flow[:, :, 1]
    mask = fx < 0 | fx > flow_width | fy < 0 | fy > flow_height
    fx = np.min(np.max(fx, 0), flow_width)
    fy = np.min(np.max(fy, 0), flow_height)
    points = np.concatenate((ix.reshape(n, 1), iy.reshape(n, 1)), axis=1)
    xi = np.concatenate((fx.reshape(n, 1), fy.reshape(n, 1)), axis=1)
    warp = np.zeros((image_height, image_width, im.shape[2]))
    for i in range(im.shape[2]):
        channel = im[:, :, i]
        values = channel.reshape(n, 1)
        new_channel = scipy.interpolate.griddata(
            points, values, xi, method='cubic')
        new_channel[mask] = 1
        warp[:, :, i] = new_channel
    return warp


"""
==============
Others
==============
"""


def scale_image(image, new_range):
    """
    Linearly scale the image into desired range
    :param image: input image
    :param new_range: the new range to be aligned
    :return: image normalized in new range
    """
    min_val = np.min(image).astype(np.float32)
    max_val = np.max(image).astype(np.float32)
    min_val_new = np.array(min(new_range), dtype=np.float32)
    max_val_new = np.array(max(new_range), dtype=np.float32)
    scaled_image = (image - min_val) / (max_val - min_val) * (
        max_val_new - min_val_new) + min_val_new
    return scaled_image.astype(np.uint8)


def compute_color(u, v):
    """
    compute optical flow color map
    :param u: optical flow horizontal map
    :param v: optical flow vertical map
    :return: optical flow in color code
    """
    [h, w] = u.shape
    img = np.zeros([h, w, 3])
    nanIdx = np.isnan(u) | np.isnan(v)
    u[nanIdx] = 0
    v[nanIdx] = 0

    colorwheel = make_color_wheel()
    ncols = np.size(colorwheel, 0)

    rad = np.sqrt(u**2 + v**2)

    a = np.arctan2(-v, -u) / np.pi

    fk = (a + 1) / 2 * (ncols - 1) + 1

    k0 = np.floor(fk).astype(int)

    k1 = k0 + 1
    k1[k1 == ncols + 1] = 1
    f = fk - k0

    for i in range(0, np.size(colorwheel, 1)):
        tmp = colorwheel[:, i]
        col0 = tmp[k0 - 1] / 255
        col1 = tmp[k1 - 1] / 255
        col = (1 - f) * col0 + f * col1

        idx = rad <= 1
        col[idx] = 1 - rad[idx] * (1 - col[idx])
        notidx = np.logical_not(idx)

        col[notidx] *= 0.75
        img[:, :, i] = np.uint8(np.floor(255 * col * (1 - nanIdx)))

    return img


def make_color_wheel():
    """
    Generate color wheel according Middlebury color code
    :return: Color wheel
    """
    RY = 15
    YG = 6
    GC = 4
    CB = 11
    BM = 13
    MR = 6

    ncols = RY + YG + GC + CB + BM + MR

    colorwheel = np.zeros([ncols, 3])

    col = 0

    # RY
    colorwheel[0:RY, 0] = 255
    colorwheel[0:RY, 1] = np.transpose(np.floor(255 * np.arange(0, RY) / RY))
    col += RY

    # YG
    colorwheel[col:col + YG, 0] = 255 - np.transpose(
        np.floor(255 * np.arange(0, YG) / YG))
    colorwheel[col:col + YG, 1] = 255
    col += YG

    # GC
    colorwheel[col:col + GC, 1] = 255
    colorwheel[col:col + GC, 2] = np.transpose(
        np.floor(255 * np.arange(0, GC) / GC))
    col += GC

    # CB
    colorwheel[col:col + CB, 1] = 255 - np.transpose(
        np.floor(255 * np.arange(0, CB) / CB))
    colorwheel[col:col + CB, 2] = 255
    col += CB

    # BM
    colorwheel[col:col + BM, 2] = 255
    colorwheel[col:col + BM, 0] = np.transpose(
        np.floor(255 * np.arange(0, BM) / BM))
    col += +BM

    # MR
    colorwheel[col:col + MR, 2] = 255 - np.transpose(
        np.floor(255 * np.arange(0, MR) / MR))
    colorwheel[col:col + MR, 0] = 255

    return colorwheel


================================================
FILE: core/networks/__init__.py
================================================
import os, sys
sys.path.append(os.path.dirname(os.path.abspath(__file__)))
from model_flow import Model_flow
from model_triangulate_pose import Model_triangulate_pose
from model_depth_pose import Model_depth_pose
from model_flowposenet import Model_flowposenet

def get_model(mode):
    if mode == 'flow':
        return Model_flow
    elif mode == 'pose' or mode == 'pose_flow':
        return Model_triangulate_pose
    elif mode == 'depth' or mode == 'depth_pose':
        return Model_depth_pose
    elif mode == 'flowposenet':
        return Model_flowposenet
    else:
        raise ValueError('Mode {} not found.'.format(mode))


================================================
FILE: core/networks/model_depth_pose.py
================================================
import os, sys
sys.path.append(os.path.dirname(os.path.abspath(__file__)))
from structures import *
from model_triangulate_pose import Model_triangulate_pose
from pytorch_ssim import SSIM
sys.path.append(os.path.join(os.path.dirname(os.path.abspath(__file__)), '..', 'visualize'))
from visualizer import *
import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np
import pdb

class Model_depth_pose(nn.Module):
    def __init__(self, cfg):
        super(Model_depth_pose, self).__init__()
        self.depth_match_num = cfg.depth_match_num
        self.depth_sample_ratio = cfg.depth_sample_ratio
        self.depth_scale = cfg.depth_scale
        self.w_flow_error = cfg.w_flow_error
        self.dataset = cfg.dataset

        self.depth_net = Depth_Model(cfg.depth_scale)
        self.model_pose = Model_triangulate_pose(cfg)

    def meshgrid(self, h, w):
        xx, yy = np.meshgrid(np.arange(0,w), np.arange(0,h))
        meshgrid = np.transpose(np.stack([xx,yy], axis=-1), [2,0,1]) # [2,h,w]
        meshgrid = torch.from_numpy(meshgrid)
        return meshgrid
    
    def robust_rand_sample(self, match, mask, num):
        # match: [b, 4, -1] mask: [b, 1, -1]
        b, n = match.shape[0], match.shape[2]
        nonzeros_num = torch.min(torch.sum(mask > 0, dim=-1)) # []
        if nonzeros_num.detach().cpu().numpy() == n:
            rand_int = torch.randint(0, n, [num])
            select_match = match[:,:,rand_int]
        else:
            # If there is zero score in match, sample the non-zero matches.
            num = np.minimum(nonzeros_num.detach().cpu().numpy(), num)
            select_idxs = []
            for i in range(b):
                nonzero_idx = torch.nonzero(mask[i,0,:]) # [nonzero_num,1]
                rand_int = torch.randint(0, nonzero_idx.shape[0], [int(num)])
                select_idx = nonzero_idx[rand_int, :] # [num, 1]
                select_idxs.append(select_idx)
            select_idxs = torch.stack(select_idxs, 0) # [b,num,1]
            select_match = torch.gather(match.transpose(1,2), index=select_idxs.repeat(1,1,4), dim=1).transpose(1,2) # [b, 4, num]
        return select_match, num

    def top_ratio_sample(self, match, mask, ratio):
        # match: [b, 4, -1] mask: [b, 1, -1]
        b, total_num = match.shape[0], match.shape[-1]
        scores, indices = torch.topk(mask, int(ratio*total_num), dim=-1) # [B, 1, ratio*tnum]
        select_match = torch.gather(match.transpose(1,2), index=indices.squeeze(1).unsqueeze(-1).repeat(1,1,4), dim=1).transpose(1,2) # [b, 4, ratio*tnum]
        return select_match, scores
    
    def rand_sample(self, match, num):
        b, c, n = match.shape[0], match.shape[1], match.shape[2]
        rand_int = torch.randint(0, match.shape[-1], size=[num])
        select_pts = match[:,:,rand_int]
        return select_pts
    
    def filt_negative_depth(self, point2d_1_depth, point2d_2_depth, point2d_1_coord, point2d_2_coord):
        # Filter out the negative projection depth.
        # point2d_1_depth: [b, n, 1]
        b, n = point2d_1_depth.shape[0], point2d_1_depth.shape[1]
        mask = (point2d_1_depth > 0.01).float() * (point2d_2_depth > 0.01).float()
        select_idxs = []
        flag = 0
        for i in range(b):
            if torch.sum(mask[i,:,0]) == n:
                idx = torch.arange(n).to(mask.get_device())
            else:
                nonzero_idx = torch.nonzero(mask[i,:,0]).squeeze(1) # [k]
                if nonzero_idx.shape[0] < 0.1*n:
                    idx = torch.arange(n).to(mask.get_device())
                    flag = 1
                else:
                    res = torch.randint(0, nonzero_idx.shape[0], size=[n-nonzero_idx.shape[0]]).to(mask.get_device()) # [n-nz]
                    idx = torch.cat([nonzero_idx, nonzero_idx[res]], 0)
            select_idxs.append(idx)
        select_idxs = torch.stack(select_idxs, dim=0) # [b,n]
        point2d_1_depth = torch.gather(point2d_1_depth, index=select_idxs.unsqueeze(-1), dim=1) # [b,n,1]
        point2d_2_depth = torch.gather(point2d_2_depth, index=select_idxs.unsqueeze(-1), dim=1) # [b,n,1]
        point2d_1_coord = torch.gather(point2d_1_coord, index=select_idxs.unsqueeze(-1).repeat(1,1,2), dim=1) # [b,n,2]
        point2d_2_coord = torch.gather(point2d_2_coord, index=select_idxs.unsqueeze(-1).repeat(1,1,2), dim=1) # [b,n,2]
        return point2d_1_depth, point2d_2_depth, point2d_1_coord, point2d_2_coord, flag
    
    def filt_invalid_coord(self, point2d_1_depth, point2d_2_depth, point2d_1_coord, point2d_2_coord, max_h, max_w):
        # Filter out the negative projection depth.
        # point2d_1_depth: [b, n, 1]
        b, n = point2d_1_coord.shape[0], point2d_1_coord.shape[1]
        max_coord = torch.Tensor([max_w, max_h]).to(point2d_1_coord.get_device())
        mask = (point2d_1_coord > 0).all(dim=-1, keepdim=True).float() * (point2d_2_coord > 0).all(dim=-1, keepdim=True).float() * \
            (point2d_1_coord < max_coord).all(dim=-1, keepdim=True).float() * (point2d_2_coord < max_coord).all(dim=-1, keepdim=True).float()
        
        flag = 0
        if torch.sum(1.0-mask) == 0:
            return point2d_1_depth, point2d_2_depth, point2d_1_coord, point2d_2_coord, flag

        select_idxs = []
        for i in range(b):
            if torch.sum(mask[i,:,0]) == n:
                idx = torch.arange(n).to(mask.get_device())
            else:
                nonzero_idx = torch.nonzero(mask[i,:,0]).squeeze(1) # [k]
                if nonzero_idx.shape[0] < 0.1*n:
                    idx = torch.arange(n).to(mask.get_device())
                    flag = 1
                else:
                    res = torch.randint(0, nonzero_idx.shape[0], size=[n-nonzero_idx.shape[0]]).to(mask.get_device())
                    idx = torch.cat([nonzero_idx, nonzero_idx[res]], 0)
            select_idxs.append(idx)
        select_idxs = torch.stack(select_idxs, dim=0) # [b,n]
        point2d_1_depth = torch.gather(point2d_1_depth, index=select_idxs.unsqueeze(-1), dim=1) # [b,n,1]
        point2d_2_depth = torch.gather(point2d_2_depth, index=select_idxs.unsqueeze(-1), dim=1) # [b,n,1]
        point2d_1_coord = torch.gather(point2d_1_coord, index=select_idxs.unsqueeze(-1).repeat(1,1,2), dim=1) # [b,n,2]
        point2d_2_coord = torch.gather(point2d_2_coord, index=select_idxs.unsqueeze(-1).repeat(1,1,2), dim=1) # [b,n,2]
        return point2d_1_depth, point2d_2_depth, point2d_1_coord, point2d_2_coord, flag
    
    def ray_angle_filter(self, match, P1, P2, return_angle=False):
        # match: [b, 4, n] P: [B, 3, 4]
        b, n = match.shape[0], match.shape[2]
        K = P1[:,:,:3] # P1 with identity rotation and zero translation
        K_inv = torch.inverse(K)
        RT1 = K_inv.bmm(P1) # [b, 3, 4]
        RT2 = K_inv.bmm(P2)
        ones = torch.ones([b,1,n]).to(match.get_device())
        pts1 = torch.cat([match[:,:2,:], ones], 1)
        pts2 = torch.cat([match[:,2:,:], ones], 1)
        
        ray1_dir = (RT1[:,:,:3].transpose(1,2)).bmm(K_inv).bmm(pts1)# [b,3,n]
        ray1_dir = ray1_dir / (torch.norm(ray1_dir, dim=1, keepdim=True, p=2) + 1e-12)
        ray1_origin = (-1) * RT1[:,:,:3].transpose(1,2).bmm(RT1[:,:,3].unsqueeze(-1)) # [b, 3, 1]
        ray2_dir = (RT2[:,:,:3].transpose(1,2)).bmm(K_inv).bmm(pts2) # [b,3,n]
        ray2_dir = ray2_dir / (torch.norm(ray2_dir, dim=1, keepdim=True, p=2) + 1e-12)
        ray2_origin = (-1) * RT2[:,:,:3].transpose(1,2).bmm(RT2[:,:,3].unsqueeze(-1)) # [b, 3, 1]

        # We compute the angle betwwen vertical line from ray1 origin to ray2 and ray1.
        p1p2 = (ray1_origin - ray2_origin).repeat(1,1,n)
        verline = ray2_origin.repeat(1,1,n) + torch.sum(p1p2 * ray2_dir, dim=1, keepdim=True) * ray2_dir - ray1_origin.repeat(1,1,n) # [b,3,n]
        cosvalue = torch.sum(ray1_dir * verline, dim=1, keepdim=True)  / \
            ((torch.norm(ray1_dir, dim=1, keepdim=True, p=2) + 1e-12) * (torch.norm(verline, dim=1, keepdim=True, p=2) + 1e-12))# [b,1,n]

        mask = (cosvalue > 0.001).float() # we drop out angles less than 1' [b,1,n]
        flag = 0
        num = torch.min(torch.sum(mask, -1)).int()
        if num.cpu().detach().numpy() == 0:
            flag = 1
            filt_match = match[:,:,:100]
            if return_angle:
                return filt_match, flag, torch.zeros_like(mask).to(filt_match.get_device())
            else:
                return filt_match, flag
        nonzero_idx = []
        for i in range(b):
            idx = torch.nonzero(mask[i,0,:])[:num] # [num,1]
            nonzero_idx.append(idx)
        nonzero_idx = torch.stack(nonzero_idx, 0) # [b,num,1]
        filt_match = torch.gather(match.transpose(1,2), index=nonzero_idx.repeat(1,1,4), dim=1).transpose(1,2) # [b,4,num]
        if return_angle:
            return filt_match, flag, mask
        else:
            return filt_match, flag
    
    def midpoint_triangulate(self, match, K_inv, P1, P2):
        # match: [b, 4, num] P1: [b, 3, 4]
        # Match is in the image coordinates. P1, P2 is camera parameters. [B, 3, 4] match: [B, M, 4]
        b, n = match.shape[0], match.shape[2]
        RT1 = K_inv.bmm(P1) # [b, 3, 4]
        RT2 = K_inv.bmm(P2)
        ones = torch.ones([b,1,n]).to(match.get_device())
        pts1 = torch.cat([match[:,:2,:], ones], 1)
        pts2 = torch.cat([match[:,2:,:], ones], 1)
        
        ray1_dir = (RT1[:,:,:3].transpose(1,2)).bmm(K_inv).bmm(pts1)# [b,3,n]
        ray1_dir = ray1_dir / (torch.norm(ray1_dir, dim=1, keepdim=True, p=2) + 1e-12)
        ray1_origin = (-1) * RT1[:,:,:3].transpose(1,2).bmm(RT1[:,:,3].unsqueeze(-1)) # [b, 3, 1]
        ray2_dir = (RT2[:,:,:3].transpose(1,2)).bmm(K_inv).bmm(pts2) # [b,3,n]
        ray2_dir = ray2_dir / (torch.norm(ray2_dir, dim=1, keepdim=True, p=2) + 1e-12)
        ray2_origin = (-1) * RT2[:,:,:3].transpose(1,2).bmm(RT2[:,:,3].unsqueeze(-1)) # [b, 3, 1]
    
        dir_cross = torch.cross(ray1_dir, ray2_dir, dim=1) # [b,3,n]
        denom = 1.0 / (torch.sum(dir_cross * dir_cross, dim=1, keepdim=True)+1e-12) # [b,1,n]
        origin_vec = (ray2_origin - ray1_origin).repeat(1,1,n) # [b,3,n]
        a1 = origin_vec.cross(ray2_dir, dim=1) # [b,3,n]
        a1 = torch.sum(a1 * dir_cross, dim=1, keepdim=True) * denom # [b,1,n]
        a2 = origin_vec.cross(ray1_dir, dim=1) # [b,3,n]
        a2 = torch.sum(a2 * dir_cross, dim=1, keepdim=True) * denom # [b,1,n]
        p1 = ray1_origin + a1 * ray1_dir
        p2 = ray2_origin + a2 * ray2_dir
        point = (p1 + p2) / 2.0 # [b,3,n]
        # Convert to homo coord to get consistent with other functions.
        point_homo = torch.cat([point, ones], dim=1).transpose(1,2) # [b,n,4]
        return point_homo
    
    def rt_from_fundamental_mat_nyu(self, fmat, K, depth_match):
        # F: [b, 3, 3] K: [b, 3, 3] depth_match: [b ,4, n]
        #verify_match = self.rand_sample(depth_match, 5000) # [b,4,100]
        verify_match = depth_match.transpose(1,2).cpu().detach().numpy()
        K_inv = torch.inverse(K)
        b = fmat.shape[0]
        fmat_ = K.transpose(1,2).bmm(fmat)
        essential_mat = fmat_.bmm(K)
        iden = torch.cat([torch.eye(3), torch.zeros([3,1])], -1).unsqueeze(0).repeat(b,1,1).to(K.get_device()) # [b,3,4]
        P1 = K.bmm(iden)
        flags = []
        number_inliers = []
        P2 = []
        for i in range(b):
            cnum, R, t, _ = cv2.recoverPose(essential_mat[i].cpu().detach().numpy().astype('float64'), verify_match[i,:,:2].astype('float64'), \
                verify_match[i,:,2:].astype('float64'), cameraMatrix=K[i,:,:].cpu().detach().numpy().astype('float64'))
            p2 = torch.from_numpy(np.concatenate([R, t], axis=-1)).float().to(K.get_device())
            P2.append(p2)
            if cnum > depth_match.shape[-1] / 7.0:
                flags.append(1)
            else:
                flags.append(0)
            number_inliers.append(cnum)
            
        P2 = K.bmm(torch.stack(P2, axis=0))
        #pdb.set_trace()
        
        return P1, P2, flags
    
    def verifyRT(self, match, K_inv, P1, P2):
        # match: [b, 4, n] P1: [b,3,4] P2: [b,3,4]
        b, n = match.shape[0], match.shape[2]
        point3d = self.midpoint_triangulate(match, K_inv, P1, P2).reshape([-1,4]).unsqueeze(-1) # [b*n, 4, 1]
        P1_ = P1.repeat(n,1,1)
        P2_ = P2.repeat(n,1,1)
        depth1 = P1_.bmm(point3d)[:,-1,:] / point3d[:,-1,:] # [b*n, 1]
        depth2 = P2_.bmm(point3d)[:,-1,:] / point3d[:,-1,:]
        inlier_num = torch.sum((depth1.view([b,n]) > 0).float() * (depth2.view([b,n]) > 0).float(), 1) # [b]
        return inlier_num
    
    def rt_from_fundamental_mat(self, fmat, K, depth_match):
        # F: [b, 3, 3] K: [b, 3, 3] depth_match: [b ,4, n]
        verify_match = self.rand_sample(depth_match, 200) # [b,4,100]
        K_inv = torch.inverse(K)
        b = fmat.shape[0]
        fmat_ = K.transpose(1,2).bmm(fmat)
        essential_mat = fmat_.bmm(K)
        essential_mat_cpu = essential_mat.cpu()
        U, S, V = torch.svd(essential_mat_cpu)
        U, S, V = U.to(K.get_device()), S.to(K.get_device()), V.to(K.get_device())
        W = torch.from_numpy(np.array([[[0., -1., 0.],[1., 0., 0.],[0., 0., 1.]]])).float().repeat(b,1,1).to(K.get_device())
        # R = UWV^t or UW^tV^t t = U[:,2] the third column of U
        R1 = U.bmm(W).bmm(V.transpose(1,2)) # Do we need matrix determinant sign?
        R1 = torch.sign(torch.det(R1)).unsqueeze(-1).unsqueeze(-1) * R1
        R2 = U.bmm(W.transpose(1,2)).bmm(V.transpose(1,2))
        R2 = torch.sign(torch.det(R2)).unsqueeze(-1).unsqueeze(-1) * R2
        t1 = U[:,:,2].unsqueeze(-1) # The third column
        t2 = -U[:,:,2].unsqueeze(-1) # Inverse direction
        
        iden = torch.cat([torch.eye(3), torch.zeros([3,1])], -1).unsqueeze(0).repeat(b,1,1).to(K.get_device()) # [b,3,4]
        P1 = K.bmm(iden)
        P2_1 = K.bmm(torch.cat([R1, t1], -1))
        P2_2 = K.bmm(torch.cat([R2, t1], -1))
        P2_3 = K.bmm(torch.cat([R1, t2], -1))
        P2_4 = K.bmm(torch.cat([R2, t2], -1))
        P2_c = [P2_1, P2_2, P2_3, P2_4]
        flags = []
        for i in range(4):
            with torch.no_grad():
                inlier_num = self.verifyRT(verify_match, K_inv, P1, P2_c[i])
                flags.append(inlier_num)
        P2_c = torch.stack(P2_c, dim=1) # [B, 4, 3, 4]
        flags = torch.stack(flags, dim=1) # [B, 4]
        idx = torch.argmax(flags, dim=-1, keepdim=True) # [b,1]
        P2 = torch.gather(P2_c, index=idx.unsqueeze(-1).unsqueeze(-1).repeat(1,1,3,4), dim=1).squeeze(1) # [b,3,4]
        #pdb.set_trace()
        return P1, P2
    
    def reproject(self, P, point3d):
        # P: [b,3,4] point3d: [b,n,4]
        point2d = P.bmm(point3d.transpose(1,2)) # [b,4,n]
        point2d_coord = (point2d[:,:2,:] / (point2d[:,2,:].unsqueeze(1) + 1e-12)).transpose(1,2) # [b,n,2]
        point2d_depth = point2d[:,2,:].unsqueeze(1).transpose(1,2) # [b,n,1]
        return point2d_coord, point2d_depth

    def scale_adapt(self, depth1, depth2, eps=1e-12):
        with torch.no_grad():
            A = torch.sum((depth1 ** 2) / (depth2 ** 2 + eps), dim=1) # [b,1]
            C = torch.sum(depth1 / (depth2 + eps), dim=1) # [b,1]
            a = C / (A + eps)
        return a

    def affine_adapt(self, depth1, depth2, use_translation=True, eps=1e-12):
        a_scale = self.scale_adapt(depth1, depth2, eps=eps)
        if not use_translation: # only fit the scale parameter
            return a_scale, torch.zeros_like(a_scale)
        else:
            with torch.no_grad():
                A = torch.sum((depth1 ** 2) / (depth2 ** 2 + eps), dim=1) # [b,1]
                B = torch.sum(depth1 / (depth2 ** 2 + eps), dim=1) # [b,1]
                C = torch.sum(depth1 / (depth2 + eps), dim=1) # [b,1]
                D = torch.sum(1.0 / (depth2 ** 2 + eps), dim=1) # [b,1]
                E = torch.sum(1.0 / (depth2 + eps), dim=1) # [b,1]
                a = (B*E - D*C) / (B*B - A*D + 1e-12)
                b = (B*C - A*E) / (B*B - A*D + 1e-12)

                # check ill condition
                cond = (B*B - A*D)
                valid = (torch.abs(cond) > 1e-4).float()
                a = a * valid + a_scale * (1 - valid)
                b = b * valid
            return a, b

    def register_depth(self, depth_pred, coord_tri, depth_tri):
        # depth_pred: [b, 1, h, w] coord_tri: [b,n,2] depth_tri: [b,n,1]
        batch, _, h, w = depth_pred.shape[0], depth_pred.shape[1], depth_pred.shape[2], depth_pred.shape[3]
        n = depth_tri.shape[1]
        coord_tri_nor = torch.stack([2.0*coord_tri[:,:,0] / (w-1.0) - 1.0, 2.0*coord_tri[:,:,1] / (h-1.0) - 1.0], -1)
        depth_inter = F.grid_sample(depth_pred, coord_tri_nor.view([batch,n,1,2]), padding_mode='reflection').squeeze(-1).transpose(1,2) # [b,n,1]

        # Normalize
        scale = torch.median(depth_inter, 1)[0] / (torch.median(depth_tri, 1)[0] + 1e-12)
        scale = scale.detach() # [b,1]
        scale_depth_inter = depth_inter / (scale.unsqueeze(-1) + 1e-12)
        scale_depth_pred = depth_pred / (scale.unsqueeze(-1).unsqueeze(-1) + 1e-12)
        
        # affine adapt
        a, b = self.affine_adapt(scale_depth_inter, depth_tri, use_translation=False)
        affine_depth_inter = a.unsqueeze(1) * scale_depth_inter + b.unsqueeze(1) # [b,n,1]
        affine_depth_pred = a.unsqueeze(-1).unsqueeze(-1) * scale_depth_pred + b.unsqueeze(-1).unsqueeze(-1) # [b,1,h,w]
        return affine_depth_pred, affine_depth_inter
    
    def get_trian_loss(self, tri_depth, pred_tri_depth):
        # depth: [b,n,1]
        loss = torch.pow(1.0 - pred_tri_depth / (tri_depth + 1e-12), 2).mean((1,2))
        return loss
    
    def get_reproj_fdp_loss(self, pred1, pred2, P2, K, K_inv, valid_mask, rigid_mask, flow, visualizer=None):
        # pred: [b,1,h,w] Rt: [b,3,4] K: [b,3,3] mask: [b,1,h,w] flow: [b,2,h,w]
        b, h, w = pred1.shape[0], pred1.shape[2], pred1.shape[3]
        xy = self.meshgrid(h,w).unsqueeze(0).repeat(b,1,1,1).float().to(flow.get_device()) # [b,2,h,w]
        ones = torch.ones([b,1,h,w]).float().to(flow.get_device())
        pts1_3d = K_inv.bmm(torch.cat([xy, ones], 1).view([b,3,-1])) * pred1.view([b,1,-1]) # [b,3,h*w]
        pts2_coord, pts2_depth = self.reproject(P2, torch.cat([pts1_3d, ones.view([b,1,-1])], 1).transpose(1,2)) # [b,h*w, 2]
        # TODO Here some of the reprojection coordinates are invalid. (<0 or >max)
        reproj_valid_mask = (pts2_coord > torch.Tensor([0,0]).to(pred1.get_device())).all(-1, True).float() * \
            (pts2_coord < torch.Tensor([w-1,h-1]).to(pred1.get_device())).all(-1, True).float() # [b,h*w, 1]
        reproj_valid_mask = (valid_mask * reproj_valid_mask.view([b,h,w,1]).permute([0,3,1,2])).detach()
        rigid_mask = rigid_mask.detach()
        pts2_depth = pts2_depth.transpose(1,2).view([b,1,h,w])

        # Get the interpolated depth prediction2
        pts2_coord_nor = torch.cat([2.0 * pts2_coord[:,:,0].unsqueeze(-1) / (w - 1.0) - 1.0, 2.0 * pts2_coord[:,:,1].unsqueeze(-1) / (h - 1.0) - 1.0], -1)
        inter_depth2 = F.grid_sample(pred2, pts2_coord_nor.view([b, h, w, 2]), padding_mode='reflection') # [b,1,h,w]
        pj_loss_map = (torch.abs(1.0 - pts2_depth / (inter_depth2 + 1e-12)) * rigid_mask * reproj_valid_mask)
        pj_loss = pj_loss_map.mean((1,2,3)) / ((reproj_valid_mask * rigid_mask).mean((1,2,3))+1e-12)
        #pj_loss = (valid_mask * mask * torch.abs(pts2_depth - inter_depth2) / (torch.abs(pts2_depth + inter_depth2)+1e-12)).mean((1,2,3)) / ((valid_mask * mask).mean((1,2,3))+1e-12) # [b]
        flow_loss = (rigid_mask * torch.abs(flow + xy - pts2_coord.detach().permute(0,2,1).view([b,2,h,w]))).mean((1,2,3)) / (rigid_mask.mean((1,2,3)) + 1e-12)
        return pj_loss, flow_loss
    
    def disp2depth(self, disp, min_depth=0.1, max_depth=100.0):
        min_disp = 1 / max_depth
        max_disp = 1 / min_depth
        scaled_disp = min_disp + (max_disp - min_disp) * disp
        depth = 1 / scaled_disp
        return scaled_disp, depth
    
    def get_smooth_loss(self, img, disp):
        # img: [b,3,h,w] depth: [b,1,h,w]
        """Computes the smoothness loss for a disparity image
        The color image is used for edge-aware smoothness
        """
        grad_disp_x = torch.abs(disp[:, :, :, :-1] - disp[:, :, :, 1:])
        grad_disp_y = torch.abs(disp[:, :, :-1, :] - disp[:, :, 1:, :])

        grad_img_x = torch.mean(torch.abs(img[:, :, :, :-1] - img[:, :, :, 1:]), 1, keepdim=True)
        grad_img_y = torch.mean(torch.abs(img[:, :, :-1, :] - img[:, :, 1:, :]), 1, keepdim=True)

        grad_disp_x *= torch.exp(-grad_img_x)
        grad_disp_y *= torch.exp(-grad_img_y)

        return grad_disp_x.mean((1,2,3)) + grad_disp_y.mean((1,2,3))

    def infer_depth(self, img):
        disp_list = self.depth_net(img)
        disp, depth = self.disp2depth(disp_list[0])
        return disp_list[0]
    
    def infer_vo(self, img1, img2, K, K_inv, match_num=6000):
        b, img_h, img_w = img1.shape[0], img1.shape[2], img1.shape[3]
        F_final, img1_valid_mask, img1_rigid_mask, fwd_flow, fwd_match = self.model_pose.inference(img1, img2, K, K_inv)
        # infer depth
        disp1_list = self.depth_net(img1) # Nscales * [B, 1, H, W]
        disp2_list = self.depth_net(img2)
        disp1, depth1 = self.disp2depth(disp1_list[0])
        disp2, depth2 = self.disp2depth(disp2_list[0])

        img1_depth_mask = img1_rigid_mask * img1_valid_mask
        # [b, 4, match_num]
        top_ratio_match, top_ratio_mask = self.top_ratio_sample(fwd_match.view([b,4,-1]), img1_depth_mask.view([b,1,-1]), ratio=0.30) # [b, 4, ratio*h*w]
        depth_match, depth_match_num = self.robust_rand_sample(top_ratio_match, top_ratio_mask, num=match_num)
        return depth_match, depth1, depth2
    
    def check_rt(self, img1, img2, K, K_inv):
        # initialization
        b = img1.shape[0]
        flag1, flag2, flag3 = 0, 0, 0
        images = torch.cat([img1, img2], dim=2)
        inputs = [images, K.unsqueeze(1), K_inv.unsqueeze(1)]

        # Pose Network
        #self.profiler.reset()
        loss_pack, F_final, img1_valid_mask, img1_rigid_score, img1_inlier_mask, fwd_flow, fwd_match = self.model_pose(inputs, output_F=True)
        # Get masks
        img1_depth_mask = img1_rigid_score * img1_valid_mask

        # Select top score matches to triangulate depth.
        top_ratio_match, top_ratio_mask = self.top_ratio_sample(fwd_match.view([b,4,-1]), img1_depth_mask.view([b,1,-1]), ratio=0.20) # [b, 4, ratio*h*w]
        depth_match, depth_match_num = self.robust_rand_sample(top_ratio_match, top_ratio_mask, num=self.depth_match_num)

        P1, P2, flags = self.rt_from_fundamental_mat_nyu(F_final.detach(), K, depth_match)
        P1 = P1.detach()
        P2 = P2.detach()
        flags = torch.from_numpy(np.stack(flags, axis=0)).float().to(K.get_device())

        return flags

    def inference(self, img1, img2, K, K_inv):
        b, img_h, img_w = img1.shape[0], img1.shape[2], img1.shape[3]
        visualizer = Visualizer_debug('./vis/', np.transpose(255*img1.detach().cpu().numpy(), [0,2,3,1]), \
            np.transpose(255*img2.detach().cpu().numpy(), [0,2,3,1]))

        F_final, img1_valid_mask, img1_rigid_mask, fwd_flow, fwd_match = self.model_pose.inference(img1, img2, K, K_inv)
        # infer depth
        disp1_list = self.depth_net(img1) # Nscales * [B, 1, H, W]
        disp2_list = self.depth_net(img2)
        disp1, _ = self.disp2depth(disp1_list[0])
        disp2, _ = self.disp2depth(disp2_list[0])

        # Get Camera Matrix
        img1_depth_mask = img1_rigid_mask * img1_valid_mask
        # [b, 4, match_num]
        top_ratio_match, top_ratio_mask = self.top_ratio_sample(fwd_match.view([b,4,-1]), img1_depth_mask.view([b,1,-1]), ratio=0.20) # [b, 4, ratio*h*w]
        depth_match, depth_match_num = self.robust_rand_sample(top_ratio_match, top_ratio_mask, num=self.depth_match_num)
        if self.dataset == 'nyuv2':
            P1, P2, _ = self.rt_from_fundamental_mat_nyu(F_final, K, depth_match)
        else:
            P1, P2 = self.rt_from_fundamental_mat(F_final, K, depth_match)
        Rt = K_inv.bmm(P2)
        
        filt_depth_match, flag1 = self.ray_angle_filter(depth_match, P1, P2) # [b, 4, filt_num]
        point3d_1 = self.midpoint_triangulate(filt_depth_match, K_inv, P1, P2)
        point2d_1_coord, point2d_1_depth = self.reproject(P1, point3d_1) # [b,n,2], [b,n,1]
        point2d_2_coord, point2d_2_depth = self.reproject(P2, point3d_1)
        
        # Filter out some invalid triangulation results to stablize training.
        point2d_1_depth, point2d_2_depth, point2d_1_coord, point2d_2_coord, flag2 = self.filt_negative_depth(point2d_1_depth, \
                point2d_2_depth, point2d_1_coord, point2d_2_coord)
        point2d_1_depth, point2d_2_depth, point2d_1_coord, point2d_2_coord, flag3 = self.filt_invalid_coord(point2d_1_depth, \
                point2d_2_depth, point2d_1_coord, point2d_2_coord, max_h=img_h, max_w=img_w)
        
        return fwd_flow, disp1, disp2, Rt, point2d_1_coord, point2d_1_depth
    

    def forward(self, inputs):
        # initialization
        images, K_ms, K_inv_ms = inputs
        K, K_inv = K_ms[:,0,:,:], K_inv_ms[:,0,:,:]
        assert (images.shape[1] == 3)
        img_h, img_w = int(images.shape[2] / 2), images.shape[3] 
        img1, img2 = images[:,:,:img_h,:], images[:,:,img_h:,:]
        b = img1.shape[0]
        flag1, flag2, flag3 = 0, 0, 0
        visualizer = Visualizer_debug('./vis/', img1=255*img1.permute([0,2,3,1]).detach().cpu().numpy(), \
            img2=255*img2.permute([0,2,3,1]).detach().cpu().numpy())
        
        # Pose Network
        loss_pack, F_final, img1_valid_mask, img1_rigid_mask, fwd_flow, fwd_match = self.model_pose(inputs, output_F=True, visualizer=visualizer)
        # infer depth
        disp1_list = self.depth_net(img1) # Nscales * [B, 1, H, W]
        disp2_list = self.depth_net(img2)

        # Get masks
        img1_depth_mask = img1_rigid_mask * img1_valid_mask
        
        # Select top score matches to triangulate depth.
        top_ratio_match, top_ratio_mask = self.top_ratio_sample(fwd_match.view([b,4,-1]), img1_depth_mask.view([b,1,-1]), ratio=self.depth_sample_ratio) # [b, 4, ratio*h*w]
        depth_match, depth_match_num = self.robust_rand_sample(top_ratio_match, top_ratio_mask, num=self.depth_match_num)
        
        if self.dataset == 'nyuv2':
            P1, P2, flags = self.rt_from_fundamental_mat_nyu(F_final.detach(), K, depth_match)
            flags = torch.from_numpy(np.stack(flags, axis=0)).float().to(K.get_device())
        else:
            P1, P2 = self.rt_from_fundamental_mat(F_final.detach(), K, depth_match)
        P1 = P1.detach()
        P2 = P2.detach()

        # Get triangulated points
        filt_depth_match, flag1 = self.ray_angle_filter(depth_match, P1, P2, return_angle=False) # [b, 4, filt_num]
        
        point3d_1 = self.midpoint_triangulate(filt_depth_match, K_inv, P1, P2)
        point2d_1_coord, point2d_1_depth = self.reproject(P1, point3d_1) # [b,n,2], [b,n,1]
        point2d_2_coord, point2d_2_depth = self.reproject(P2, point3d_1)
        
        # Filter out some invalid triangulation results to stablize training.
        point2d_1_depth, point2d_2_depth, point2d_1_coord, point2d_2_coord, flag2 = self.filt_negative_depth(point2d_1_depth, \
                point2d_2_depth, point2d_1_coord, point2d_2_coord)
        point2d_1_depth, point2d_2_depth, point2d_1_coord, point2d_2_coord, flag3 = self.filt_invalid_coord(point2d_1_depth, \
                point2d_2_depth, point2d_1_coord, point2d_2_coord, max_h=img_h, max_w=img_w)

        
        if flag1 + flag2 + flag3 > 0:
            loss_pack['pt_depth_loss'] = torch.zeros([2]).to(point3d_1.get_device()).requires_grad_()
            loss_pack['pj_depth_loss'] = torch.zeros([2]).to(point3d_1.get_device()).requires_grad_()
            loss_pack['flow_error'] = torch.zeros([2]).to(point3d_1.get_device()).requires_grad_()
            loss_pack['depth_smooth_loss'] = torch.zeros([2]).to(point3d_1.get_device()).requires_grad_()
            return loss_pack

        pt_depth_loss = 0
        pj_depth_loss = 0
        flow_error = 0
        depth_smooth_loss = 0
        for s in range(self.depth_scale):
            disp_pred1 = F.interpolate(disp1_list[s], size=(img_h, img_w), mode='bilinear') # [b,1,h,w]
            disp_pred2 = F.interpolate(disp2_list[s], size=(img_h, img_w), mode='bilinear')
            scaled_disp1, depth_pred1 = self.disp2depth(disp_pred1)
            scaled_disp2, depth_pred2 = self.disp2depth(disp_pred2)
            # Rescale predicted depth according to triangulated depth
            # [b,1,h,w], [b,n,1]
            rescaled_pred1, inter_pred1 = self.register_depth(depth_pred1, point2d_1_coord, point2d_1_depth)
            rescaled_pred2, inter_pred2 = self.register_depth(depth_pred2, point2d_2_coord, point2d_2_depth)
            # Get Losses
            
            pt_depth_loss += self.get_trian_loss(point2d_1_depth, inter_pred1) + self.get_trian_loss(point2d_2_depth, inter_pred2)
            pj_depth, flow_loss = self.get_reproj_fdp_loss(rescaled_pred1, rescaled_pred2, P2, K, K_inv, img1_valid_mask, img1_rigid_mask, fwd_flow, visualizer=visualizer)
            depth_smooth_loss += self.get_smooth_loss(img1, disp_pred1 / (disp_pred1.mean((2,3), True) + 1e-12)) + \
                self.get_smooth_loss(img2, disp_pred2 / (disp_pred2.mean((2,3), True) + 1e-12))
            pj_depth_loss += pj_depth
            flow_error += flow_loss

        if self.dataset == 'nyuv2':
            loss_pack['pt_depth_loss'] = pt_depth_loss * flags
            loss_pack['pj_depth_loss'], loss_pack['flow_error'] = pj_depth_loss * flags, flow_error * flags
            loss_pack['depth_smooth_loss'] = depth_smooth_loss * flags
        else:
            loss_pack['pt_depth_loss'] = pt_depth_loss
            loss_pack['pj_depth_loss'], loss_pack['flow_error'] = pj_depth_loss, flow_error
            loss_pack['depth_smooth_loss'] = depth_smooth_loss
        return loss_pack




================================================
FILE: core/networks/model_flow.py
================================================
import os, sys
sys.path.append(os.path.dirname(os.path.abspath(__file__)))
from structures import *
from pytorch_ssim import SSIM
import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np
import pdb
import cv2

def transformerFwd(U,
                   flo,
                   out_size,
                   name='SpatialTransformerFwd'):
    """Forward Warping Layer described in 
    'Occlusion Aware Unsupervised Learning of Optical Flow by Yang Wang et al'

    Parameters
    ----------
    U : float
        The output of a convolutional net should have the
        shape [num_batch, height, width, num_channels].
    flo: float
        The optical flow used for forward warping 
        having the shape of [num_batch, height, width, 2].
    backprop: boolean
        Indicates whether to back-propagate through forward warping layer
    out_size: tuple of two ints
        The size of the output of the network (height, width)
    """

    def _repeat(x, n_repeats):
        rep = torch.ones(size=[n_repeats], dtype=torch.long).unsqueeze(1).transpose(1,0)
        x = x.view([-1,1]).mm(rep)
        return x.view([-1]).int()

    def _interpolate(im, x, y, out_size):
        # constants
        num_batch, height, width, channels = im.shape[0], im.shape[1], im.shape[2], im.shape[3]
        out_height = out_size[0]
        out_width = out_size[1]
        max_y = int(height - 1)
        max_x = int(width - 1)

        # scale indices from [-1, 1] to [0, width/height]
        x = (x + 1.0) * (width - 1.0) / 2.0
        y = (y + 1.0) * (height - 1.0) / 2.0

        # do sampling
        x0 = (torch.floor(x)).int()
        x1 = x0 + 1
        y0 = (torch.floor(y)).int()
        y1 = y0 + 1

        x0_c = torch.clamp(x0, 0, max_x)
        x1_c = torch.clamp(x1, 0, max_x)
        y0_c = torch.clamp(y0, 0, max_y)
        y1_c = torch.clamp(y1, 0, max_y)

        dim2 = width
        dim1 = width * height
        base = _repeat(torch.arange(0, num_batch) * dim1, out_height * out_width).to(im.get_device())

        base_y0 = base + y0_c * dim2
        base_y1 = base + y1_c * dim2
        idx_a = base_y0 + x0_c
        idx_b = base_y1 + x0_c
        idx_c = base_y0 + x1_c
        idx_d = base_y1 + x1_c

        # use indices to lookup pixels in the flat image and restore
        # channels dim
        im_flat = im.view([-1, channels])
        im_flat = im_flat.float()

        # and finally calculate interpolated values
        x0_f = x0.float()
        x1_f = x1.float()
        y0_f = y0.float()
        y1_f = y1.float()
        wa = ((x1_f - x) * (y1_f - y)).unsqueeze(1)
        wb = ((x1_f - x) * (y - y0_f)).unsqueeze(1)
        wc = ((x - x0_f) * (y1_f - y)).unsqueeze(1)
        wd = ((x - x0_f) * (y - y0_f)).unsqueeze(1)

        zerof = torch.zeros_like(wa)
        wa = torch.where(
            (torch.eq(x0_c, x0) & torch.eq(y0_c, y0)).unsqueeze(1), wa, zerof)
        wb = torch.where(
            (torch.eq(x0_c, x0) & torch.eq(y1_c, y1)).unsqueeze(1), wb, zerof)
        wc = torch.where(
            (torch.eq(x1_c, x1) & torch.eq(y0_c, y0)).unsqueeze(1), wc, zerof)
        wd = torch.where(
            (torch.eq(x1_c, x1) & torch.eq(y1_c, y1)).unsqueeze(1), wd, zerof)

        zeros = torch.zeros(
            size=[
                int(num_batch) * int(height) *
                int(width), int(channels)
            ],
            dtype=torch.float)
        output = zeros.to(im.get_device())
        output = output.scatter_add(dim=0, index=idx_a.long().unsqueeze(1).repeat(1,channels), src=im_flat * wa)
        output = output.scatter_add(dim=0, index=idx_b.long().unsqueeze(1).repeat(1,channels), src=im_flat * wb)
        output = output.scatter_add(dim=0, index=idx_c.long().unsqueeze(1).repeat(1,channels), src=im_flat * wc)
        output = output.scatter_add(dim=0, index=idx_d.long().unsqueeze(1).repeat(1,channels), src=im_flat * wd)

        return output

    def _meshgrid(height, width):
        # This should be equivalent to:
        x_t, y_t = np.meshgrid(np.linspace(-1, 1, width),
                                 np.linspace(-1, 1, height))
        #  ones = np.ones(np.prod(x_t.shape))
        #  grid = np.vstack([x_t.flatten(), y_t.flatten(), ones])
        return torch.from_numpy(x_t).float(), torch.from_numpy(y_t).float()

    def _transform(flo, input_dim, out_size):
        num_batch, height, width, num_channels = input_dim.shape[0:4]

        # grid of (x_t, y_t, 1), eq (1) in ref [1]
        height_f = float(height)
        width_f = float(width)
        out_height = out_size[0]
        out_width = out_size[1]
        x_s, y_s = _meshgrid(out_height, out_width)
        x_s = x_s.to(flo.get_device()).unsqueeze(0)
        x_s = x_s.repeat([num_batch, 1, 1])

        y_s = y_s.to(flo.get_device()).unsqueeze(0)
        y_s =y_s.repeat([num_batch, 1, 1])

        x_t = x_s + flo[:, :, :, 0] / ((out_width - 1.0) / 2.0)
        y_t = y_s + flo[:, :, :, 1] / ((out_height - 1.0) / 2.0)

        x_t_flat = x_t.view([-1])
        y_t_flat = y_t.view([-1])

        input_transformed = _interpolate(input_dim, x_t_flat, y_t_flat,
                                            out_size)

        output = input_transformed.view([num_batch, out_height, out_width, num_channels])
        return output

    #out_size = int(out_size)
    output = _transform(flo, U, out_size)
    return output


class Model_flow(nn.Module):
    def __init__(self, cfg):
        super(Model_flow, self).__init__()
        self.fpyramid = FeaturePyramid()
        self.pwc_model = PWC_tf()
        if cfg.mode == 'depth' or cfg.mode == 'flowposenet':
            # Stage 2 training
            for param in self.fpyramid.parameters():
                param.requires_grad = False
            for param in self.pwc_model.parameters():
                param.requires_grad = False
        
        # hyperparameters
        self.dataset = cfg.dataset
        self.num_scales = cfg.num_scales
        self.flow_consist_alpha = cfg.h_flow_consist_alpha
        self.flow_consist_beta = cfg.h_flow_consist_beta

    def get_occlusion_mask_from_flow(self, tensor_size, flow):
        mask = torch.ones(tensor_size).to(flow.get_device())
        h, w = mask.shape[2], mask.shape[3]
        occ_mask = transformerFwd(mask.permute(0,2,3,1), flow.permute(0,2,3,1), out_size=[h,w]).permute(0,3,1,2)
        with torch.no_grad():
            occ_mask = torch.clamp(occ_mask, 0.0, 1.0)
        return occ_mask

    def get_flow_norm(self, flow, p=2):
        '''
        Inputs:
        flow (bs, 2, H, W)
        '''
        flow_norm = torch.norm(flow, p=p, dim=1).unsqueeze(1) + 1e-12
        return flow_norm

    def get_visible_masks(self, optical_flows, optical_flows_rev):
        # get occlusion masks
        batch_size, _, img_h, img_w = optical_flows[0].shape
        img2_visible_masks, img1_visible_masks = [], []
        for s, (optical_flow, optical_flow_rev) in enumerate(zip(optical_flows, optical_flows_rev)):
            shape = [batch_size, 1, int(img_h / (2**s)), int(img_w / (2**s))]
            img2_visible_masks.append(self.get_occlusion_mask_from_flow(shape, optical_flow))
            img1_visible_masks.append(self.get_occlusion_mask_from_flow(shape, optical_flow_rev))
        return img2_visible_masks, img1_visible_masks

    def get_consistent_masks(self, optical_flows, optical_flows_rev):
        # get consist masks
        batch_size, _, img_h, img_w = optical_flows[0].shape
        img2_consis_masks, img1_consis_masks, fwd_flow_diff_pyramid, bwd_flow_diff_pyramid = [], [], [], []
        for s, (optical_flow, optical_flow_rev) in enumerate(zip(optical_flows, optical_flows_rev)):
            
            bwd2fwd_flow = warp_flow(optical_flow_rev, optical_flow)
            fwd2bwd_flow = warp_flow(optical_flow, optical_flow_rev)

            fwd_flow_diff = torch.abs(bwd2fwd_flow + optical_flow)
            fwd_flow_diff_pyramid.append(fwd_flow_diff)
            bwd_flow_diff = torch.abs(fwd2bwd_flow + optical_flow_rev)
            bwd_flow_diff_pyramid.append(bwd_flow_diff)

            # flow consistency condition
            bwd_consist_bound = torch.max(self.flow_consist_beta * self.get_flow_norm(optical_flow_rev), torch.from_numpy(np.array([self.flow_consist_alpha])).float().to(optical_flow_rev.get_device()))
            fwd_consist_bound = torch.max(self.flow_consist_beta * self.get_flow_norm(optical_flow), torch.from_numpy(np.array([self.flow_consist_alpha])).float().to(optical_flow.get_device()))
            with torch.no_grad():
                noc_masks_img2 = (self.get_flow_norm(bwd_flow_diff) < bwd_consist_bound).float()
                noc_masks_img1 = (self.get_flow_norm(fwd_flow_diff) < fwd_consist_bound).float()
                img2_consis_masks.append(noc_masks_img2)
                img1_consis_masks.append(noc_masks_img1)
        return img2_consis_masks, img1_consis_masks, fwd_flow_diff_pyramid, bwd_flow_diff_pyramid

    def generate_img_pyramid(self, img, num_pyramid):
        img_h, img_w = img.shape[2], img.shape[3]
        img_pyramid = []
        for s in range(num_pyramid):
            img_new = F.adaptive_avg_pool2d(img, [int(img_h / (2**s)), int(img_w / (2**s))]).data
            img_pyramid.append(img_new)
        return img_pyramid

    def warp_flow_pyramid(self, img_pyramid, flow_pyramid):
        img_warped_pyramid = []
        for img, flow in zip(img_pyramid, flow_pyramid):
            img_warped_pyramid.append(warp_flow(img, flow))
        return img_warped_pyramid

    def compute_loss_pixel(self, img_pyramid, img_warped_pyramid, occ_mask_list):
        loss_list = []
        for scale in range(self.num_scales):
            img, img_warped, occ_mask = img_pyramid[scale], img_warped_pyramid[scale], occ_mask_list[scale]
            divider = occ_mask.mean((1,2,3))
            img_diff = torch.abs((img - img_warped)) * occ_mask.repeat(1,3,1,1)
            loss_pixel = img_diff.mean((1,2,3)) / (divider + 1e-12) # (B)
            loss_list.append(loss_pixel[:,None])
        loss = torch.cat(loss_list, 1).sum(1) # (B)
        return loss

    def compute_loss_ssim(self, img_pyramid, img_warped_pyramid, occ_mask_list):
        loss_list = []
        for scale in range(self.num_scales):
            img, img_warped, occ_mask = img_pyramid[scale], img_warped_pyramid[scale], occ_mask_list[scale]
            divider = occ_mask.mean((1,2,3))
            occ_mask_pad = occ_mask.repeat(1,3,1,1)
            ssim = SSIM(img * occ_mask_pad, img_warped * occ_mask_pad)
            loss_ssim = torch.clamp((1.0 - ssim) / 2.0, 0, 1).mean((1,2,3))
            loss_ssim = loss_ssim / (divider + 1e-12)
            loss_list.append(loss_ssim[:,None])
        loss = torch.cat(loss_list, 1).sum(1)
        return loss

    def gradients(self, img):
        dy = img[:,:,1:,:] - img[:,:,:-1,:]
        dx = img[:,:,:,1:] - img[:,:,:,:-1]
        return dx, dy

    def cal_grad2_error(self, flow, img):
        img_grad_x, img_grad_y = self.gradients(img)
        w_x = torch.exp(-10.0 * torch.abs(img_grad_x).mean(1).unsqueeze(1))
        w_y = torch.exp(-10.0 * torch.abs(img_grad_y).mean(1).unsqueeze(1))

        dx, dy = self.gradients(flow)
        dx2, _ = self.gradients(dx)
        _, dy2 = self.gradients(dy)
        error = (w_x[:,:,:,1:] * torch.abs(dx2)).mean((1,2,3)) + (w_y[:,:,1:,:] * torch.abs(dy2)).mean((1,2,3))
        return error / 2.0

    def compute_loss_flow_smooth(self, optical_flows, img_pyramid):
        loss_list = []
        for scale in range(self.num_scales):
            flow, img = optical_flows[scale], img_pyramid[scale]
            error = self.cal_grad2_error(flow/20.0, img)
            loss_list.append(error[:,None])
        loss = torch.cat(loss_list, 1).sum(1)
        return loss

    def compute_loss_flow_consis(self, fwd_flow_diff_pyramid, occ_mask_list):
        loss_list = []
        for scale in range(self.num_scales):
            fwd_flow_diff, occ_mask = fwd_flow_diff_pyramid[scale], occ_mask_list[scale]
            divider = occ_mask.mean((1,2,3))
            loss_consis = (fwd_flow_diff * occ_mask).mean((1,2,3))
            loss_consis = loss_consis / (divider + 1e-12)
            loss_list.append(loss_consis[:,None])
        loss = torch.cat(loss_list, 1).sum(1)
        return loss

    def inference_flow(self, img1, img2):
        img_hw = [img1.shape[2], img1.shape[3]]
        feature_list_1, feature_list_2 = self.fpyramid(img1), self.fpyramid(img2)
        optical_flow = self.pwc_model(feature_list_1, feature_list_2, img_hw)[0]
        return optical_flow
    
    def inference_corres(self, img1, img2):
        batch_size, img_h, img_w = img1.shape[0], img1.shape[2], img1.shape[3]
        
        # get the optical flows and reverse optical flows for each pair of adjacent images
        feature_list_1, feature_list_2 = self.fpyramid(img1), self.fpyramid(img2)
        optical_flows = self.pwc_model(feature_list_1, feature_list_2, [img_h, img_w])
        optical_flows_rev = self.pwc_model(feature_list_2, feature_list_1, [img_h, img_w])

        # get occlusion masks
        img2_visible_masks, img1_visible_masks = self.get_visible_masks(optical_flows, optical_flows_rev)
        # get consistent masks
        img2_consis_masks, img1_consis_masks, fwd_flow_diff_pyramid, bwd_flow_diff_pyramid = self.get_consistent_masks(optical_flows, optical_flows_rev)
        # get final valid masks 
        img2_valid_masks, img1_valid_masks = [], []
        for i, (img2_visible_mask, img1_visible_mask, img2_consis_mask, img1_consis_mask) in enumerate(zip(img2_visible_masks, img1_visible_masks, img2_consis_masks, img1_consis_masks)):
            img2_valid_masks.append(img2_visible_mask * img2_consis_mask)
            img1_valid_masks.append(img1_visible_mask * img1_consis_mask)
        
        return optical_flows[0], optical_flows_rev[0], img1_valid_masks[0], img2_valid_masks[0], fwd_flow_diff_pyramid[0], bwd_flow_diff_pyramid[0]

    def forward(self, inputs, output_flow=False, use_flow_loss=True):
        images, K_ms, K_inv_ms = inputs
        assert (images.shape[1] == 3)
        img_h, img_w = int(images.shape[2] / 2), images.shape[3] 
        img1, img2 = images[:,:,:img_h,:], images[:,:,img_h:,:]
        batch_size = img1.shape[0]

        #cv2.imwrite('./test1.png', np.transpose(255*img1[0].cpu().detach().numpy(), [1,2,0]).astype(np.uint8))
        #cv2.imwrite('./test2.png', np.transpose(255*img2[0].cpu().detach().numpy(), [1,2,0]).astype(np.uint8))
        #pdb.set_trace()
        # get the optical flows and reverse optical flows for each pair of adjacent images
        feature_list_1, feature_list_2 = self.fpyramid(img1), self.fpyramid(img2)
        optical_flows = self.pwc_model(feature_list_1, feature_list_2, [img_h, img_w])
        optical_flows_rev = self.pwc_model(feature_list_2, feature_list_1, [img_h, img_w])
        
        # get occlusion masks
        img2_visible_masks, img1_visible_masks = self.get_visible_masks(optical_flows, optical_flows_rev)
        # get consistent masks
        img2_consis_masks, img1_consis_masks, fwd_flow_diff_pyramid, bwd_flow_diff_pyramid = self.get_consistent_masks(optical_flows, optical_flows_rev)
        # get final valid masks 
        img2_valid_masks, img1_valid_masks = [], []
        
        for i, (img2_visible_mask, img1_visible_mask, img2_consis_mask, img1_consis_mask) in enumerate(zip(img2_visible_masks, img1_visible_masks, img2_consis_masks, img1_consis_masks)):
            if self.dataset == 'nyuv2':
                img2_valid_masks.append(img2_visible_mask)
                img1_valid_masks.append(img1_visible_mask)
            else:
                img2_valid_masks.append(img2_visible_mask * img2_consis_mask)
                img1_valid_masks.append(img1_visible_mask * img1_consis_mask)
        
        loss_pack = {}
        if not use_flow_loss:
            loss_pack['loss_pixel'] = torch.zeros([2]).to(img1.get_device()).requires_grad_()
            loss_pack['loss_ssim'] = torch.zeros([2]).to(img1.get_device()).requires_grad_()
            loss_pack['loss_flow_smooth'] = torch.zeros([2]).to(img1.get_device()).requires_grad_()
            loss_pack['loss_flow_consis'] = torch.zeros([2]).to(img1.get_device()).requires_grad_()
            return loss_pack, optical_flows[0], optical_flows_rev[0], img1_valid_masks[0], img2_valid_masks[0], fwd_flow_diff_pyramid[0], bwd_flow_diff_pyramid[0]

        # warp images
        img1_pyramid = self.generate_img_pyramid(img1, len(optical_flows_rev))
        img2_pyramid = self.generate_img_pyramid(img2, len(optical_flows))

        img1_warped_pyramid = self.warp_flow_pyramid(img2_pyramid, optical_flows)
        img2_warped_pyramid = self.warp_flow_pyramid(img1_pyramid, optical_flows_rev)
        
        # compute loss
        loss_pack['loss_pixel'] = self.compute_loss_pixel(img1_pyramid, img1_warped_pyramid, img1_valid_masks) + \
                     self.compute_loss_pixel(img2_pyramid, img2_warped_pyramid, img2_valid_masks)
        loss_pack['loss_ssim'] = self.compute_loss_ssim(img1_pyramid, img1_warped_pyramid, img1_valid_masks) + \
                    self.compute_loss_ssim(img2_pyramid, img2_warped_pyramid, img2_valid_masks)
        loss_pack['loss_flow_smooth'] = self.compute_loss_flow_smooth(optical_flows, img1_pyramid) + \
                           self.compute_loss_flow_smooth(optical_flows_rev, img2_pyramid)
        
        #loss_pack['loss_flow_consis'] = self.compute_loss_flow_consis(fwd_flow_diff_pyramid, img1_valid_masks) + \
        #                   self.compute_loss_flow_consis(bwd_flow_diff_pyramid, img2_valid_masks)
        loss_pack['loss_flow_consis'] = torch.zeros([2]).to(img1.get_device()).requires_grad_()
        if output_flow:
            return loss_pack, optical_flows[0], optical_flows_rev[0], img1_valid_masks[0], img2_valid_masks[0], fwd_flow_diff_pyramid[0], bwd_flow_diff_pyramid[0]
        else:
            return loss_pack



================================================
FILE: core/networks/model_flowposenet.py
================================================
import os, sys
sys.path.append(os.path.dirname(os.path.abspath(__file__)))
from structures import *
from pytorch_ssim import SSIM
from model_flow import Model_flow
sys.path.append(os.path.join(os.path.dirname(os.path.abspath(__file__)), '..', 'visualize'))
from visualizer import *
from profiler import Profiler
import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np
import pdb
import cv2

def mean_on_mask(diff, valid_mask):
    mask = valid_mask.expand_as(diff)
    mean_value = (diff * mask).sum() / mask.sum()
    return mean_value


def edge_aware_smoothness_loss(pred_disp, img, max_scales):
    def gradient_x(img):
        gx = img[:, :, :-1, :] - img[:, :, 1:, :]
        return gx

    def gradient_y(img):
        gy = img[:, :, :, :-1] - img[:, :, :, 1:]
        return gy

    def get_edge_smoothness(img, pred):
        pred_gradients_x = gradient_x(pred)
        pred_gradients_y = gradient_y(pred)

        image_gradients_x = gradient_x(img)
        image_gradients_y = gradient_y(img)

        weights_x = torch.exp(-torch.mean(torch.abs(image_gradients_x),
                                          1, keepdim=True))
        weights_y = torch.exp(-torch.mean(torch.abs(image_gradients_y),
                                          1, keepdim=True))

        smoothness_x = torch.abs(pred_gradients_x) * weights_x
        smoothness_y = torch.abs(pred_gradients_y) * weights_y
        return torch.mean(smoothness_x) + torch.mean(smoothness_y)

    loss = 0
    weight = 1.

    s = 0
    for scaled_disp in pred_disp:
        s += 1
        if s > max_scales:
            break

        b, _, h, w = scaled_disp.size()
        scaled_img = nn.functional.adaptive_avg_pool2d(img, (h, w))
        loss += get_edge_smoothness(scaled_img, scaled_disp) * weight
        weight /= 4.0

    return loss


def compute_smooth_loss(tgt_depth, tgt_img, ref_depth, ref_img, max_scales=1):
    loss = edge_aware_smoothness_loss(tgt_depth, tgt_img, max_scales)
    loss = edge_aware_smoothness_loss(ref_depth, ref_img, max_scales)

    return loss


class Model_flowposenet(nn.Module):
    def __init__(self, cfg):
        super(Model_flowposenet, self).__init__()
        assert cfg.depth_scale == 1
        self.pose_net = FlowPoseNet()
        self.model_flow = Model_flow(cfg)
        self.depth_net = Depth_Model(cfg.depth_scale)
    
    def compute_pairwise_loss(self, tgt_img, ref_img, tgt_depth, ref_depth, pose, intrinsic):
        ref_img_warped, valid_mask, projected_depth, computed_depth = inverse_warp2(ref_img, tgt_depth, ref_depth,
                                                                                    pose, intrinsic, 'zeros')

        diff_img = (tgt_img - ref_img_warped).abs()

        diff_depth = ((computed_depth - projected_depth).abs() /
                    (computed_depth + projected_depth).abs()).clamp(0, 1)

        
        ssim_map = (0.5*(1 - SSIM(tgt_img, ref_img_warped))).clamp(0, 1)
        diff_img = (0.15 * diff_img + 0.85 * ssim_map)

        # Modified in 01.19.2020
        #weight_mask = (1 - diff_depth)
        #diff_img = diff_img * weight_mask
        

        # compute loss
        reconstruction_loss = diff_img.mean()
        geometry_consistency_loss = diff_depth.mean()
        #reconstruction_loss = mean_on_mask(diff_img, valid_mask)
        #geometry_consistency_loss = mean_on_mask(diff_depth, valid_mask)

        return reconstruction_loss, geometry_consistency_loss

    
    def disp2depth(self, disp, min_depth=0.01, max_depth=80.0):
        min_disp = 1 / max_depth
        max_disp = 1 / min_depth
        scaled_disp = min_disp + (max_disp - min_disp) * disp
        depth = 1 / scaled_disp
        return scaled_disp, depth

    def infer_depth(self, img):
        b, img_h, img_w = img.shape[0], img.shape[2], img.shape[3]
        disp_list = self.depth_net(img)
        disp, depth = self.disp2depth(disp_list[0])
        return disp_list[0]
    
    def inference(self, img1, img2, K, K_inv):
        flow = self.model_flow.inference_flow(img1, img2)
        return flow, None, None, None, None, None
    
    def inference_flow(self, img1, img2):
        flow = self.model_flow.inference_flow(img1, img2)
        return flow
    
    def infer_pose(self, img1, img2, K, K_inv):
        img_h, img_w = img1.shape[2], img1.shape[3]
        flow = self.model_flow.inference_flow(img1, img2)
        flow[:,0,:,:] /= img_w
        flow[:,1,:,:] /= img_h
        pose = self.pose_net(flow)
        return pose

    def forward(self, inputs):
        # initialization
        images, K_ms, K_inv_ms = inputs
        K, K_inv = K_ms[:,0,:,:], K_inv_ms[:,0,:,:]
        assert (images.shape[1] == 3)
        img_h, img_w = int(images.shape[2] / 2), images.shape[3] 
        img1, img2 = images[:,:,:img_h,:], images[:,:,img_h:,:]
        b = img1.shape[0]
        visualizer = Visualizer_debug('./vis/', img1=255*img1.permute([0,2,3,1]).detach().cpu().numpy(), \
            img2=255*img2.permute([0,2,3,1]).detach().cpu().numpy())
        
        # Flow Network
        loss_pack, fwd_flow, bwd_flow, img1_valid_mask, img2_valid_mask, img1_flow_diff_mask, img2_flow_diff_mask = self.model_flow(inputs, output_flow=True, use_flow_loss=False)
        fwd_flow[:,0,:,:] /= img_w
        fwd_flow[:,1,:,:] /= img_h
        bwd_flow[:,0,:,:] /= img_w
        bwd_flow[:,1,:,:] /= img_h

        # Pose Network
        pose = self.pose_net(fwd_flow)
        pose_inv = self.pose_net(bwd_flow)
        disp1_list = self.depth_net(img1) # Nscales * [B, 1, H, W]
        disp2_list = self.depth_net(img2)
        disp1, depth1 = self.disp2depth(disp1_list[0])
        disp2, depth2 = self.disp2depth(disp2_list[0])
        #pdb.set_trace()

        loss_1, loss_3 = self.compute_pairwise_loss(img1, img2, depth1, depth2, pose, K)
        loss_1_2, loss_3_2 = self.compute_pairwise_loss(img2, img1, depth2, depth1, pose_inv, K)
        loss_ph = loss_1 + loss_1_2
        loss_pj = loss_3 + loss_3_2

        loss_2 = compute_smooth_loss([depth1], img1, [depth2], img2)

        loss_pack['pt_depth_loss'] = torch.zeros([2]).to(loss_2.get_device()).requires_grad_()
        loss_pack['pj_depth_loss'], loss_pack['flow_error'] = loss_pj, loss_ph
        loss_pack['depth_smooth_loss'] = loss_2
        #loss_pack['depth_smooth_loss'] = torch.zeros([2]).to(loss_2.get_device()).requires_grad_()
        loss_pack['geo_loss'] = torch.zeros([2]).to(loss_2.get_device()).requires_grad_()
        return loss_pack




================================================
FILE: core/networks/model_triangulate_pose.py
================================================
import os, sys
sys.path.append(os.path.dirname(os.path.abspath(__file__)))
import torch
import torch.nn as nn
import numpy as np
from structures import *
from model_flow import Model_flow
import pdb
import cv2

class Model_triangulate_pose(nn.Module):
    def __init__(self, cfg):
        super(Model_triangulate_pose, self).__init__()
        self.model_flow = Model_flow(cfg)
        self.mode = cfg.mode
        if cfg.dataset == 'nyuv2':
            self.inlier_thres = 0.1
            self.rigid_thres = 1.0
        else:
            self.inlier_thres = 0.1
            self.rigid_thres = 0.5
        self.filter = reduced_ransac(check_num=cfg.ransac_points, thres=self.inlier_thres, dataset=cfg.dataset)
    
    def meshgrid(self, h, w):
        xx, yy = np.meshgrid(np.arange(0,w), np.arange(0,h))
        meshgrid = np.transpose(np.stack([xx,yy], axis=-1), [2,0,1]) # [2,h,w]
        meshgrid = torch.from_numpy(meshgrid)
        return meshgrid
    
    def compute_epipolar_loss(self, fmat, match, mask):
        # fmat: [b, 3, 3] match: [b, 4, h*w] mask: [b,1,h*w]
        num_batch = match.shape[0]
        match_num = match.shape[-1]

        points1 = match[:,:2,:]
        points2 = match[:,2:,:]
        ones = torch.ones(num_batch, 1, match_num).to(points1.get_device())
        points1 = torch.cat([points1, ones], 1) # [b,3,n]
        points2 = torch.cat([points2, ones], 1).transpose(1,2) # [b,n,3]

        # compute fundamental matrix loss
        fmat = fmat.unsqueeze(1)
        fmat_tiles = fmat.view([-1,3,3])
        epi_lines = fmat_tiles.bmm(points1) #[b,3,n]  [b*n, 3, 1]
        dist_p2l = torch.abs((epi_lines.permute([0, 2, 1]) * points2).sum(-1, keepdim=True)) # [b,n,1]
        a = epi_lines[:,0,:].unsqueeze(1).transpose(1,2) # [b,n,1]
        b = epi_lines[:,1,:].unsqueeze(1).transpose(1,2) # [b,n,1]
        dist_div = torch.sqrt(a*a + b*b) + 1e-6
        dist_map = dist_p2l / dist_div # [B, n, 1]
        loss = (dist_map * mask.transpose(1,2)).mean([1,2]) / mask.mean([1,2])
        return loss, dist_map

    def get_rigid_mask(self, dist_map):
        rigid_mask = (dist_map < self.rigid_thres).float()
        inlier_mask = (dist_map < self.inlier_thres).float()
        rigid_score = rigid_mask * 1.0 / (1.0 + dist_map)
        return rigid_mask, inlier_mask, rigid_score
    
    def inference(self, img1, img2, K, K_inv):
        batch_size, img_h, img_w = img1.shape[0], img1.shape[2], img1.shape[3]
        
        fwd_flow, bwd_flow, img1_valid_mask, img2_valid_mask, img1_flow_diff_mask, img2_flow_diff_mask = self.model_flow.inference_corres(img1, img2)
        
        grid = self.meshgrid(img_h, img_w).float().to(img1.get_device()).unsqueeze(0).repeat(batch_size,1,1,1) #[b,2,h,w]
        corres = torch.cat([(grid[:,0,:,:] + fwd_flow[:,0,:,:]).clamp(0,img_w-1.0).unsqueeze(1), \
            (grid[:,1,:,:] + fwd_flow[:,1,:,:]).clamp(0,img_h-1.0).unsqueeze(1)], 1)
        match = torch.cat([grid, corres], 1) # [b,4,h,w]

        img1_score_mask = img1_valid_mask * 1.0 / (0.1 + img1_flow_diff_mask.mean(1).unsqueeze(1))
        F_final = self.filter(match, img1_score_mask)
        geo_loss, rigid_mask = self.compute_epipolar_loss(F_final, match.view([batch_size,4,-1]), img1_valid_mask.view([batch_size,1,-1]))
        img1_rigid_mask = (rigid_mask.view([batch_size,img_h,img_w,1]) < self.inlier_thres).float()
        
        return F_final, img1_valid_mask, img1_rigid_mask.permute(0,3,1,2), fwd_flow, match

    def forward(self, inputs, output_F=False, visualizer=None):
        images, K_ms, K_inv_ms = inputs
        K, K_inv = K_ms[:,0,:,:], K_inv_ms[:,0,:,:]
        assert (images.shape[1] == 3)
        img_h, img_w = int(images.shape[2] / 2), images.shape[3] 
        img1, img2 = images[:,:,:img_h,:], images[:,:,img_h:,:]
        batch_size = img1.shape[0]

        if self.mode == 'depth':
            loss_pack, fwd_flow, bwd_flow, img1_valid_mask, img2_valid_mask, img1_flow_diff_mask, img2_flow_diff_mask = self.model_flow(inputs, output_flow=True, use_flow_loss=False)
        else:
            loss_pack, fwd_flow, bwd_flow, img1_valid_mask, img2_valid_mask, img1_flow_diff_mask, img2_flow_diff_mask = self.model_flow(inputs, output_flow=True)
        
        grid = self.meshgrid(img_h, img_w).float().to(img1.get_device()).unsqueeze(0).repeat(batch_size,1,1,1) #[b,2,h,w]
        
        fwd_corres = torch.cat([(grid[:,0,:,:] + fwd_flow[:,0,:,:]).unsqueeze(1), (grid[:,1,:,:] + fwd_flow[:,1,:,:]).unsqueeze(1)], 1)
        fwd_match = torch.cat([grid, fwd_corres], 1) # [b,4,h,w]

        bwd_corres = torch.cat([(grid[:,0,:,:] + bwd_flow[:,0,:,:]).unsqueeze(1), (grid[:,1,:,:] + bwd_flow[:,1,:,:]).unsqueeze(1)], 1)
        bwd_match = torch.cat([grid, bwd_corres], 1) # [b,4,h,w]

        # Use fwd-bwd consistency map for filter
        img1_score_mask = img1_valid_mask * 1.0 / (0.1+img1_flow_diff_mask.mean(1).unsqueeze(1))
        img2_score_mask = img2_valid_mask * 1.0 / (0.1+img2_flow_diff_mask.mean(1).unsqueeze(1))
        # img1_score_mask = img1_valid_mask
        
        F_final_1 = self.filter(fwd_match, img1_score_mask, visualizer=visualizer)
        _, dist_map_1 = self.compute_epipolar_loss(F_final_1, fwd_match.view([batch_size,4,-1]), img1_valid_mask.view([batch_size,1,-1]))
        dist_map_1 = dist_map_1.view([batch_size, img_h, img_w, 1])

        # Compute geo loss for regularize correspondence.
        rigid_mask_1, inlier_mask_1, rigid_score_1 = self.get_rigid_mask(dist_map_1)
        
        # We only use rigid mask to filter out the moving objects for computing geo loss.
        geo_loss = (dist_map_1 * (rigid_mask_1 - inlier_mask_1)).mean((1,2,3)) / \
             (rigid_mask_1 - inlier_mask_1).mean((1,2,3))

        loss_pack['geo_loss'] = geo_loss
        
        if output_F:
            return loss_pack, F_final_1, img1_score_mask, rigid_score_1.permute(0,3,1,2), fwd_flow, fwd_match
        else:
            return loss_pack






================================================
FILE: core/networks/pytorch_ssim/__init__.py
================================================
import os, sys
sys.path.append(os.path.dirname(os.path.abspath(__file__)))
from ssim import SSIM



================================================
FILE: core/networks/pytorch_ssim/ssim.py
================================================
import torch
import torch.nn as nn

def SSIM(x, y):
    C1 = 0.01 ** 2
    C2 = 0.03 ** 2

    mu_x = nn.AvgPool2d(3, 1, padding=1)(x)
    mu_y = nn.AvgPool2d(3, 1, padding=1)(y)

    sigma_x = nn.AvgPool2d(3, 1, padding=1)(x**2) - mu_x**2
    sigma_y = nn.AvgPool2d(3, 1, padding=1)(y**2) - mu_y**2
    sigma_xy = nn.AvgPool2d(3, 1, padding=1)(x * y) - mu_x * mu_y

    SSIM_n = (2 * mu_x * mu_y + C1) * (2 * sigma_xy + C2)
    SSIM_d = (mu_x**2 + mu_y**2 + C1) * (sigma_x + sigma_y + C2)

    SSIM = SSIM_n / SSIM_d
    return SSIM



================================================
FILE: core/networks/structures/__init__.py
================================================
import os, sys
sys.path.append(os.path.dirname(os.path.abspath(__file__)))
from feature_pyramid import FeaturePyramid
from pwc_tf import PWC_tf
from ransac import reduced_ransac
from depth_model import Depth_Model
from net_utils import conv, deconv, warp_flow
from flowposenet import FlowPoseNet
from inverse_warp import inverse_warp2


================================================
FILE: core/networks/structures/depth_model.py
================================================
'''
This code was ported from existing repos
[LINK] https://github.com/nianticlabs/monodepth2
'''
from __future__ import absolute_import, division, print_function
import os, sys
sys.path.append(os.path.dirname(os.path.abspath(__file__)))
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as models
import torch.utils.model_zoo as model_zoo
from collections import OrderedDict
import pdb

class ResNetMultiImageInput(models.ResNet):
    """Constructs a resnet model with varying number of input images.
    Adapted from https://github.com/pytorch/vision/blob/master/torchvision/models/resnet.py
    """
    def __init__(self, block, layers, num_classes=1000, num_input_images=1):
        super(ResNetMultiImageInput, self).__init__(block, layers)
        self.inplanes = 64
        self.conv1 = nn.Conv2d(
            num_input_images * 3, 64, kernel_size=7, stride=2, padding=3, bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.layer1 = self._make_layer(block, 64, layers[0])
        self.layer2 = self._make_layer(block, 128, layers[1], stride=2)
        self.layer3 = self._make_layer(block, 256, layers[2], stride=2)
        self.layer4 = self._make_layer(block, 512, layers[3], stride=2)
        
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
            elif isinstance(m, nn.BatchNorm2d):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)

def resnet_multiimage_input(num_layers, pretrained=False, num_input_images=1):
    """Constructs a ResNet model.
    Args:
        num_layers (int): Number of resnet layers. Must be 18 or 50
        pretrained (bool): If True, returns a model pre-trained on ImageNet
        num_input_images (int): Number of frames stacked as input
    """
    assert num_layers in [18, 50], "Can only run with 18 or 50 layer resnet"
    blocks = {18: [2, 2, 2, 2], 50: [3, 4, 6, 3]}[num_layers]
    block_type = {18: models.resnet.BasicBlock, 50: models.resnet.Bottleneck}[num_layers]
    model = ResNetMultiImageInput(block_type, blocks, num_input_images=num_input_images)

    if pretrained:
        loaded = model_zoo.load_url(models.resnet.model_urls['resnet{}'.format(num_layers)])
        loaded['conv1.weight'] = torch.cat(
            [loaded['conv1.weight']] * num_input_images, 1) / num_input_images
        model.load_state_dict(loaded)
    return model

class ResnetEncoder(nn.Module):
    """Pytorch module for a resnet encoder
    """
    def __init__(self, num_layers, pretrained, num_input_images=1):
        super(ResnetEncoder, self).__init__()

        self.num_ch_enc = np.array([64, 64, 128, 256, 512])

        resnets = {18: models.resnet18,
                   34: models.resnet34,
                   50: models.resnet50,
                   101: models.resnet101,
                   152: models.resnet152}

        if num_layers not in resnets:
            raise ValueError("{} is not a valid number of resnet layers".format(num_layers))

        if num_input_images > 1:
            self.encoder = resnet_multiimage_input(num_layers, pretrained, num_input_images)
        else:
            self.encoder = resnets[num_layers](pretrained)

        if num_layers > 34:
            self.num_ch_enc[1:] *= 4

    def forward(self, input_image):
        self.features = []
        x = (input_image - 0.45) / 0.225
        x = self.encoder.conv1(x)
        x = self.encoder.bn1(x)
        self.features.append(self.encoder.relu(x))
        self.features.append(self.encoder.layer1(self.encoder.maxpool(self.features[-1])))
        self.features.append(self.encoder.layer2(self.features[-1]))
        self.features.append(self.encoder.layer3(self.features[-1]))
        self.features.append(self.encoder.layer4(self.features[-1]))
        return self.features

class ConvBlock(nn.Module):
    """Layer to perform a convolution followed by ELU
    """
    def __init__(self, in_channels, out_channels):
        super(ConvBlock, self).__init__()

        self.conv = Conv3x3(in_channels, out_channels)
        self.nonlin = nn.ELU(inplace=True)

    def forward(self, x):
        out = self.conv(x)
        out = self.nonlin(out)
        return out

class Conv3x3(nn.Module):
    """Layer to pad and convolve input
    """
    def __init__(self, in_channels, out_channels, use_refl=True):
        super(Conv3x3, self).__init__()

        if use_refl:
            self.pad = nn.ReflectionPad2d(1)
        else:
            self.pad = nn.ZeroPad2d(1)
        self.conv = nn.Conv2d(int(in_channels), int(out_channels), 3)

    def forward(self, x):
        out = self.pad(x)
        out = self.conv(out)
        return out

def upsample(x):
    """Upsample input tensor by a factor of 2
    """
    #return F.interpolate(x, scale_factor=2, mode="nearest")
    # TODO
    return F.interpolate(x, scale_factor=2, mode='bilinear', align_corners=False)

class DepthDecoder(nn.Module):
    def __init__(self, num_ch_enc, scales=range(4), num_output_channels=1, use_skips=True):
        super(DepthDecoder, self).__init__()

        self.num_output_channels = num_output_channels
        self.use_skips = use_skips
        self.scales = scales

        self.num_ch_enc = num_ch_enc
        self.num_ch_dec = np.array([16, 32, 64, 128, 256])

        # decoder
        self.init_decoder()

    def init_decoder(self):
        self.upconvs = nn.ModuleList()
        for i in range(4, -1, -1):
            upconvs_now = nn.ModuleList()
            # upconv_0
            num_ch_in = self.num_ch_enc[-1] if i == 4 else self.num_ch_dec[i + 1]
            num_ch_out = self.num_ch_dec[i]
            upconvs_now.append(ConvBlock(num_ch_in, num_ch_out))

            # upconv_1
            num_ch_in = self.num_ch_dec[i]
            if self.use_skips and i > 0:
                num_ch_in += self.num_ch_enc[i - 1]
            num_ch_out = self.num_ch_dec[i]
            upconvs_now.append(ConvBlock(num_ch_in, num_ch_out))
            self.upconvs.append(upconvs_now)

        self.dispconvs = nn.ModuleList()
        for s in self.scales:
            self.dispconvs.append(Conv3x3(self.num_ch_dec[s], self.num_output_channels))

        # self.decoder = nn.ModuleList(list(self.convs.values()))
        self.sigmoid = nn.Sigmoid()

    def forward(self, input_features):
        self.outputs = {}

        # decoder
        x = input_features[-1]
        for scale in range(4, -1, -1): # [4, 3, 2, 1, 0]
            idx = 4 - scale
            x = self.upconvs[idx][0](x)
            x = [upsample(x)]
            if self.use_skips and scale > 0:
                x += [input_features[scale - 1]]
            x = torch.cat(x, 1)
            x = self.upconvs[idx][1](x)

            # get disp
            if scale in self.scales:
                scale_idx = self.scales.index(scale)
                self.outputs[("disp", scale)] = self.sigmoid(self.dispconvs[scale_idx](x))
        return self.outputs

class Depth_Model(nn.Module):
    def __init__(self, depth_scale, num_layers=18):
        super(Depth_Model, self).__init__()
        self.depth_scale = depth_scale
        self.encoder = ResnetEncoder(num_layers=num_layers, pretrained=False)
        self.decoder = DepthDecoder(self.encoder.num_ch_enc, scales=range(depth_scale))

    def forward(self, img):
        features = self.encoder(img)
        outputs = self.decoder(features)
        depth_list = []
        disp_list = []
        for i in range(self.depth_scale):
            disp = outputs['disp', i]
            #s_disp, depth = self.disp2depth(disp, self.min_depth, self.max_depth)
            #depth_list.append(depth)
            #disp_list.append(s_disp)
            disp_list.append(disp)
        return disp_list



================================================
FILE: core/networks/structures/feature_pyramid.py
================================================
import os, sys
sys.path.append(os.path.dirname(os.path.abspath(__file__)))
from net_utils import conv
import torch
import torch.nn as nn

class FeaturePyramid(nn.Module):
    def __init__(self):
        super(FeaturePyramid, self).__init__()
        self.conv1 = conv(3,   16, kernel_size=3, stride=2)
        self.conv2 = conv(16,  16, kernel_size=3, stride=1)
        self.conv3 = conv(16,  32, kernel_size=3, stride=2)
        self.conv4 = conv(32,  32, kernel_size=3, stride=1)
        self.conv5 = conv(32,  64, kernel_size=3, stride=2)
        self.conv6 = conv(64,  64, kernel_size=3, stride=1)
        self.conv7 = conv(64,  96, kernel_size=3, stride=2)
        self.conv8 = conv(96,  96, kernel_size=3, stride=1)
        self.conv9 = conv(96, 128, kernel_size=3, stride=2)
        self.conv10 = conv(128, 128, kernel_size=3, stride=1)
        self.conv11 = conv(128, 196, kernel_size=3, stride=2)
        self.conv12 = conv(196, 196, kernel_size=3, stride=1)
        '''
        for m in self.modules():
            if isinstance(m, nn.Conv2d) or isinstance(m, nn.ConvTranspose2d):
                nn.init.constant_(m.weight.data, 0.0)
                if m.bias is not None:
                    m.bias.data.zero_()
        '''
    def forward(self, img):
        cnv2 = self.conv2(self.conv1(img))
        cnv4 = self.conv4(self.conv3(cnv2))
        cnv6 = self.conv6(self.conv5(cnv4))
        cnv8 = self.conv8(self.conv7(cnv6))
        cnv10 = self.conv10(self.conv9(cnv8))
        cnv12 = self.conv12(self.conv11(cnv10))
        return cnv2, cnv4, cnv6, cnv8, cnv10, cnv12



================================================
FILE: core/networks/structures/flowposenet.py
================================================
import torch
import torch.nn as nn
from torch import sigmoid
from torch.nn.init import xavier_uniform_, zeros_


def conv(in_planes, out_planes, kernel_size=3):
    return nn.Sequential(
        nn.Conv2d(in_planes, out_planes, kernel_size=kernel_size, padding=(kernel_size-1)//2, stride=2),
        nn.ReLU(inplace=True)
    )

def upconv(in_planes, out_planes):
    return nn.Sequential(
        nn.ConvTranspose2d(in_planes, out_planes, kernel_size=4, stride=2, padding=1),
        nn.ReLU(inplace=True)
    )

class FlowPoseNet(nn.Module):

    def __init__(self):
        super(FlowPoseNet, self).__init__()

        conv_planes = [16, 32, 64, 128, 256, 256, 256]
        self.conv1 = conv(2, conv_planes[0], kernel_size=7)
        self.conv2 = conv(conv_planes[0], conv_planes[1], kernel_size=5)
        self.conv3 = conv(conv_planes[1], conv_planes[2])
        self.conv4 = conv(conv_planes[2], conv_planes[3])
        self.conv5 = conv(conv_planes[3], conv_planes[4])
        self.conv6 = conv(conv_planes[4], conv_planes[5])
        self.conv7 = conv(conv_planes[5], conv_planes[6])

        self.pose_pred = nn.Conv2d(conv_planes[6], 6, kernel_size=1, padding=0)

    def init_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d) or isinstance(m, nn.ConvTranspose2d):
                xavier_uniform_(m.weight.data)
                if m.bias is not None:
                    zeros_(m.bias)

    def forward(self, flow):
        # [B, 2, H, W]
        input = flow
        out_conv1 = self.conv1(input)
        out_conv2 = self.conv2(out_conv1)
        out_conv3 = self.conv3(out_conv2)
        out_conv4 = self.conv4(out_conv3)
        out_conv5 = self.conv5(out_conv4)
        out_conv6 = self.conv6(out_conv5)
        out_conv7 = self.conv7(out_conv6)

        pose = self.pose_pred(out_conv7)
        pose = pose.mean(3).mean(2)
        pose = 0.01 * pose.view(pose.size(0), 6)
 
        return pose






================================================
FILE: core/networks/structures/inverse_warp.py
================================================
from __future__ import division
import torch
import torch.nn.functional as F

pixel_coords = None


def set_id_grid(depth):
    global pixel_coords
    b, h, w = depth.size()
    i_range = torch.arange(0, h).view(1, h, 1).expand(
        1, h, w).type_as(depth)  # [1, H, W]
    j_range = torch.arange(0, w).view(1, 1, w).expand(
        1, h, w).type_as(depth)  # [1, H, W]
    ones = torch.ones(1, h, w).type_as(depth)

    pixel_coords = torch.stack((j_range, i_range, ones), dim=1)  # [1, 3, H, W]


def check_sizes(input, input_name, expected):
    condition = [input.ndimension() == len(expected)]
    for i, size in enumerate(expected):
        if size.isdigit():
            condition.append(input.size(i) == int(size))
    assert(all(condition)), "wrong size for {}, expected {}, got  {}".format(
        input_name, 'x'.join(expected), list(input.size()))


def pixel2cam(depth, intrinsics_inv):
    global pixel_coords
    """Transform coordinates in the pixel frame to the camera frame.
    Args:
        depth: depth maps -- [B, H, W]
        intrinsics_inv: intrinsics_inv matrix for each element of batch -- [B, 3, 3]
    Returns:
        array of (u,v,1) cam coordinates -- [B, 3, H, W]
    """
    b, h, w = depth.size()
    if (pixel_coords is None) or pixel_coords.size(2) < h:
        set_id_grid(depth)
    current_pixel_coords = pixel_coords[:, :, :h, :w].expand(
        b, 3, h, w).reshape(b, 3, -1)  # [B, 3, H*W]
    cam_coords = (intrinsics_inv @ current_pixel_coords).reshape(b, 3, h, w)
    return cam_coords * depth.unsqueeze(1)


def cam2pixel(cam_coords, proj_c2p_rot, proj_c2p_tr, padding_mode):
    """Transform coordinates in the camera frame to the pixel frame.
    Args:
        cam_coords: pixel coordinates defined in the first camera coordinates system -- [B, 4, H, W]
        proj_c2p_rot: rotation matrix of cameras -- [B, 3, 4]
        proj_c2p_tr: translation vectors of cameras -- [B, 3, 1]
    Returns:
        array of [-1,1] coordinates -- [B, 2, H, W]
    """
    b, _, h, w = cam_coords.size()
    cam_coords_flat = cam_coords.reshape(b, 3, -1)  # [B, 3, H*W]
    if proj_c2p_rot is not None:
        pcoords = proj_c2p_rot @ cam_coords_flat
    else:
        pcoords = cam_coords_flat

    if proj_c2p_tr is not None:
        pcoords = pcoords + proj_c2p_tr  # [B, 3, H*W]
    X = pcoords[:, 0]
    Y = pcoords[:, 1]
    Z = pcoords[:, 2].clamp(min=1e-3)

    # Normalized, -1 if on extreme left, 1 if on extreme right (x = w-1) [B, H*W]
    X_norm = 2*(X / Z)/(w-1) - 1
    Y_norm = 2*(Y / Z)/(h-1) - 1  # Idem [B, H*W]

    pixel_coords = torch.stack([X_norm, Y_norm], dim=2)  # [B, H*W, 2]
    return pixel_coords.reshape(b, h, w, 2)


def euler2mat(angle):
    """Convert euler angles to rotation matrix.
     Reference: https://github.com/pulkitag/pycaffe-utils/blob/master/rot_utils.py#L174
    Args:
        angle: rotation angle along 3 axis (in radians) -- size = [B, 3]
    Returns:
        Rotation matrix corresponding to the euler angles -- size = [B, 3, 3]
    """
    B = angle.size(0)
    x, y, z = angle[:, 0], angle[:, 1], angle[:, 2]

    cosz = torch.cos(z)
    sinz = torch.sin(z)

    zeros = z.detach()*0
    ones = zeros.detach()+1
    zmat = torch.stack([cosz, -sinz, zeros,
                        sinz,  cosz, zeros,
                        zeros, zeros,  ones], dim=1).reshape(B, 3, 3)

    cosy = torch.cos(y)
    siny = torch.sin(y)

    ymat = torch.stack([cosy, zeros,  siny,
                        zeros,  ones, zeros,
                        -siny, zeros,  cosy], dim=1).reshape(B, 3, 3)

    cosx = torch.cos(x)
    sinx = torch.sin(x)

    xmat = torch.stack([ones, zeros, zeros,
                        zeros,  cosx, -sinx,
                        zeros,  sinx,  cosx], dim=1).reshape(B, 3, 3)

    rotMat = xmat @ ymat @ zmat
    return rotMat


def quat2mat(quat):
    """Convert quaternion coefficients to rotation matrix.
    Args:
        quat: first three coeff of quaternion of rotation. fourht is then computed to have a norm of 1 -- size = [B, 3]
    Returns:
        Rotation matrix corresponding to the quaternion -- size = [B, 3, 3]
    """
    norm_quat = torch.cat([quat[:, :1].detach()*0 + 1, quat], dim=1)
    norm_quat = norm_quat/norm_quat.norm(p=2, dim=1, keepdim=True)
    w, x, y, z = norm_quat[:, 0], norm_quat[:,
                                            1], norm_quat[:, 2], norm_quat[:, 3]

    B = quat.size(0)

    w2, x2, y2, z2 = w.pow(2), x.pow(2), y.pow(2), z.pow(2)
    wx, wy, wz = w*x, w*y, w*z
    xy, xz, yz = x*y, x*z, y*z

    rotMat = torch.stack([w2 + x2 - y2 - z2, 2*xy - 2*wz, 2*wy + 2*xz,
                          2*wz + 2*xy, w2 - x2 + y2 - z2, 2*yz - 2*wx,
                          2*xz - 2*wy, 2*wx + 2*yz, w2 - x2 - y2 + z2], dim=1).reshape(B, 3, 3)
    return rotMat


def pose_vec2mat(vec, rotation_mode='euler'):
    """
    Convert 6DoF parameters to transformation matrix.
    Args:s
        vec: 6DoF parameters in the order of tx, ty, tz, rx, ry, rz -- [B, 6]
    Returns:
        A transformation matrix -- [B, 3, 4]
    """
    translation = vec[:, :3].unsqueeze(-1)  # [B, 3, 1]
    rot = vec[:, 3:]
    if rotation_mode == 'euler':
        rot_mat = euler2mat(rot)  # [B, 3, 3]
    elif rotation_mode == 'quat':
        rot_mat = quat2mat(rot)  # [B, 3, 3]
    transform_mat = torch.cat([rot_mat, translation], dim=2)  # [B, 3, 4]
    return transform_mat


def inverse_warp(img, depth, pose, intrinsics, rotation_mode='euler', padding_mode='zeros'):
    """
    Inverse warp a source image to the target image plane.
    Args:
        img: the source image (where to sample pixels) -- [B, 3, H, W]
        depth: depth map of the target image -- [B, H, W]
        pose: 6DoF pose parameters from target to source -- [B, 6]
        intrinsics: camera intrinsic matrix -- [B, 3, 3]
    Returns:
        projected_img: Source image warped to the target image plane
        valid_points: Boolean array indicating point validity
    """
    check_sizes(img, 'img', 'B3HW')
    check_sizes(depth, 'depth', 'BHW')
    check_sizes(pose, 'pose', 'B6')
    check_sizes(intrinsics, 'intrinsics', 'B33')

    batch_size, _, img_height, img_width = img.size()

    cam_coords = pixel2cam(depth, intrinsics.inverse())  # [B,3,H,W]

    pose_mat = pose_vec2mat(pose, rotation_mode)  # [B,3,4]

    # Get projection matrix for tgt camera frame to source pixel frame
    proj_cam_to_src_pixel = intrinsics @ pose_mat  # [B, 3, 4]

    rot, tr = proj_cam_to_src_pixel[:, :, :3], proj_cam_to_src_pixel[:, :, -1:]
    src_pixel_coords = cam2pixel(
        cam_coords, rot, tr, padding_mode)  # [B,H,W,2]
    projected_img = F.grid_sample(
        img, src_pixel_coords, padding_mode=padding_mode)

    valid_points = src_pixel_coords.abs().max(dim=-1)[0] <= 1

    return projected_img, valid_points


def cam2pixel2(cam_coords, proj_c2p_rot, proj_c2p_tr, padding_mode):
    """Transform coordinates in the camera frame to the pixel frame.
    Args:
        cam_coords: pixel coordinates defined in the first camera coordinates system -- [B, 4, H, W]
        proj_c2p_rot: rotation matrix of cameras -- [B, 3, 4]
        proj_c2p_tr: translation vectors of cameras -- [B, 3, 1]
    Returns:
        array of [-1,1] coordinates -- [B, 2, H, W]
    """
    b, _, h, w = cam_coords.size()
    cam_coords_flat = cam_coords.reshape(b, 3, -1)  # [B, 3, H*W]
    if proj_c2p_rot is not None:
        pcoords = proj_c2p_rot @ cam_coords_flat
    else:
        pcoords = cam_coords_flat

    if proj_c2p_tr is not None:
        pcoords = pcoords + proj_c2p_tr  # [B, 3, H*W]
    X = pcoords[:, 0]
    Y = pcoords[:, 1]
    Z = pcoords[:, 2].clamp(min=1e-3)

    # Normalized, -1 if on extreme left, 1 if on extreme right (x = w-1) [B, H*W]
    X_norm = 2*(X / Z)/(w-1) - 1
    Y_norm = 2*(Y / Z)/(h-1) - 1  # Idem [B, H*W]
    if padding_mode == 'zeros':
        X_mask = ((X_norm > 1)+(X_norm < -1)).detach()
        # make sure that no point in warped image is a combinaison of im and gray
        X_norm[X_mask] = 2
        Y_mask = ((Y_norm > 1)+(Y_norm < -1)).detach()
        Y_norm[Y_mask] = 2

    pixel_coords = torch.stack([X_norm, Y_norm], dim=2)  # [B, H*W, 2]
    return pixel_coords.reshape(b, h, w, 2), Z.reshape(b, 1, h, w)


def inverse_warp2(img, depth, ref_depth, pose, intrinsics, padding_mode='zeros'):
    """
    Inverse warp a source image to the target image plane.
    Args:
        img: the source image (where to sample pixels) -- [B, 3, H, W]
        depth: depth map of the target image -- [B, 1, H, W]
        ref_depth: the source depth map (where to sample depth) -- [B, 1, H, W] 
        pose: 6DoF pose parameters from target to source -- [B, 6]
        intrinsics: camera intrinsic matrix -- [B, 3, 3]
    Returns:
        projected_img: Source image warped to the target image plane
        valid_mask: Float array indicating point validity
    """
    check_sizes(img, 'img', 'B3HW')
    check_sizes(depth, 'depth', 'B1HW')
    check_sizes(ref_depth, 'ref_depth', 'B1HW')
    check_sizes(pose, 'pose', 'B6')
    check_sizes(intrinsics, 'intrinsics', 'B33')

    batch_size, _, img_height, img_width = img.size()

    cam_coords = pixel2cam(depth.squeeze(1), intrinsics.inverse())  # [B,3,H,W]

    pose_mat = pose_vec2mat(pose)  # [B,3,4]

    # Get projection matrix for tgt camera frame to source pixel frame
    proj_cam_to_src_pixel = intrinsics @ pose_mat  # [B, 3, 4]

    rot, tr = proj_cam_to_src_pixel[:, :, :3], proj_cam_to_src_pixel[:, :, -1:]
    src_pixel_coords, computed_depth = cam2pixel2(
        cam_coords, rot, tr, padding_mode)  # [B,H,W,2]
    projected_img = F.grid_sample(
        img, src_pixel_coords, padding_mode=padding_mode)

    valid_points = src_pixel_coords.abs().max(dim=-1)[0] <= 1
    valid_mask = valid_points.unsqueeze(1).float()

    projected_depth = F.grid_sample(
        ref_depth, src_pixel_coords, padding_mode=padding_mode).clamp(min=1e-3)

    return projected_img, valid_mask, projected_depth, computed_depth

================================================
FILE: core/networks/structures/net_utils.py
================================================
import torch
import torch.nn as nn
from torch.autograd import Variable
import pdb
import numpy as np

def conv(in_planes, out_planes, kernel_size=3, stride=1, padding=1, dilation=1):
    return nn.Sequential(
            nn.Conv2d(in_planes, out_planes, kernel_size=kernel_size, stride=stride,
                        padding=padding, dilation=dilation, bias=True),
            nn.LeakyReLU(0.1))

def deconv(in_planes, out_planes, kernel_size=4, stride=2, padding=1):
    return nn.ConvTranspose2d(in_planes, out_planes, kernel_size, stride, padding, bias=True)

def warp_flow(x, flow, use_mask=False):
    """
    warp an image/tensor (im2) back to im1, according to the optical flow

    Inputs:
    x: [B, C, H, W] (im2)
    flow: [B, 2, H, W] flow

    Returns:
    ouptut: [B, C, H, W]
    """
    B, C, H, W = x.size()
    # mesh grid 
    xx = torch.arange(0, W).view(1,-1).repeat(H,1)
    yy = torch.arange(0, H).view(-1,1).repeat(1,W)
    xx = xx.view(1,1,H,W).repeat(B,1,1,1)
    yy = yy.view(1,1,H,W).repeat(B,1,1,1)
    grid = torch.cat((xx,yy),1).float()

    if grid.shape != flow.shape:
        raise ValueError('the shape of grid {0} is not equal to the shape of flow {1}.'.format(grid.shape, flow.shape))
    if x.is_cuda:
        grid = grid.to(x.get_device())
    vgrid = grid + flow

    # scale grid to [-1,1] 
    vgrid[:,0,:,:] = 2.0*vgrid[:,0,:,:].clone() / max(W-1,1)-1.0
    vgrid[:,1,:,:] = 2.0*vgrid[:,1,:,:].clone() / max(H-1,1)-1.0

    vgrid = vgrid.permute(0,2,3,1)        
    output = nn.functional.grid_sample(x, vgrid)
    if use_mask:
        mask = torch.autograd.Variable(torch.ones(x.size())).to(x.get_device())
        mask = nn.functional.grid_sample(mask, vgrid)
        mask[mask < 0.9999] = 0
        mask[mask > 0] = 1
        return output * mask
    else:
        return output

if __name__ == '__main__':
    x = np.ones([1,1,10,10])
    flow = np.stack([np.ones([1,10,10])*3.0, np.zeros([1,10,10])], axis=1)
    y = warp_flow(torch.from_numpy(x).cuda().float(),torch.from_numpy(flow).cuda().float()).cpu().detach().numpy()
    print(y)



================================================
FILE: core/networks/structures/pwc_tf.py
================================================
import os, sys
sys.path.append(os.path.dirname(os.path.abspath(__file__)))
from net_utils import conv, deconv, warp_flow
sys.path.append(os.path.join(os.path.dirname(os.path.abspath(__file__)), '..', '..', 'external'))
# from correlation_package.correlation import Correlation
# from spatial_correlation_sampler import SpatialCorrelationSampler as Correlation

import torch
import torch.nn as nn
from torch.autograd import Variable
import numpy as np
import pdb
import torch.nn.functional as F
#from spatial_correlation_sampler import spatial_correlation_sample

class PWC_tf(nn.Module):
    def __init__(self, md=4):
        super(PWC_tf, self).__init__()
        self.corr = self.corr_naive
        # self.corr = self.correlate
        self.leakyRELU = nn.LeakyReLU(0.1)
        
        nd = (2*md+1)**2
        #dd = np.cumsum([128,128,96,64,32])
        dd = np.array([128,128,96,64,32])

        od = nd
        self.conv6_0 = conv(od,      128, kernel_size=3, stride=1)
        self.conv6_1 = conv(dd[0],   128, kernel_size=3, stride=1)
        self.conv6_2 = conv(dd[0]+dd[1],96,  kernel_size=3, stride=1)
        self.conv6_3 = conv(dd[1]+dd[2],64,  kernel_size=3, stride=1)
        self.conv6_4 = conv(dd[2]+dd[3],32,  kernel_size=3, stride=1)        
        self.predict_flow6 = self.predict_flow(dd[3]+dd[4])
        #self.deconv6 = deconv(2, 2, kernel_size=4, stride=2, padding=1) 
        #self.upfeat6 = deconv(od+dd[4], 2, kernel_size=4, stride=2, padding=1) 
        
        od = nd+128+2
        self.conv5_0 = conv(od,      128, kernel_size=3, stride=1)
        self.conv5_1 = conv(dd[0],   128, kernel_size=3, stride=1)
        self.conv5_2 = conv(dd[0]+dd[1],96,  kernel_size=3, stride=1)
        self.conv5_3 = conv(dd[1]+dd[2],64,  kernel_size=3, stride=1)
        self.conv5_4 = conv(dd[2]+dd[3],32,  kernel_size=3, stride=1)
        self.predict_flow5 = self.predict_flow(dd[3]+dd[4]) 
        #self.deconv5 = deconv(2, 2, kernel_size=4, stride=2, padding=1) 
        #self.upfeat5 = deconv(od+dd[4], 2, kernel_size=4, stride=2, padding=1) 
        
        od = nd+96+2
        self.conv4_0 = conv(od,      128, kernel_size=3, stride=1)
        self.conv4_1 = conv(dd[0],   128, kernel_size=3, stride=1)
        self.conv4_2 = conv(dd[0]+dd[1],96,  kernel_size=3, stride=1)
        self.conv4_3 = conv(dd[1]+dd[2],64,  kernel_size=3, stride=1)
        self.conv4_4 = conv(dd[2]+dd[3],32,  kernel_size=3, stride=1)
        self.predict_flow4 = self.predict_flow(dd[3]+dd[4]) 
        #self.deconv4 = deconv(2, 2, kernel_size=4, stride=2, padding=1) 
        #self.upfeat4 = deconv(od+dd[4], 2, kernel_size=4, stride=2, padding=1) 
        
        od = nd+64+2
        self.conv3_0 = conv(od,      128, kernel_size=3, stride=1)
        self.conv3_1 = conv(dd[0],   128, kernel_size=3, stride=1)
        self.conv3_2 = conv(dd[0]+dd[1],96,  kernel_size=3, stride=1)
        self.conv3_3 = conv(dd[1]+dd[2],64,  kernel_size=3, stride=1)
        self.conv3_4 = conv(dd[2]+dd[3],32,  kernel_size=3, stride=1)
        self.predict_flow3 = self.predict_flow(dd[3]+dd[4])
        #self.deconv3 = deconv(2, 2, kernel_size=4, stride=2, padding=1) 
        #self.upfeat3 = deconv(od+dd[4], 2, kernel_size=4, stride=2, padding=1) 
        
        od = nd+32+2
        self.conv2_0 = conv(od,      128, kernel_size=3, stride=1)
        self.conv2_1 = conv(dd[0],   128, kernel_size=3, stride=1)
        self.conv2_2 = conv(dd[0]+dd[1],96,  kernel_size=3, stride=1)
        self.conv2_3 = conv(dd[1]+dd[2],64,  kernel_size=3, stride=1)
        self.conv2_4 = conv(dd[2]+dd[3],32,  kernel_size=3, stride=1)
        self.predict_flow2 = self.predict_flow(dd[3]+dd[4]) 
        #self.deconv2 = deconv(2, 2, kernel_size=4, stride=2, padding=1) 
        
        self.dc_conv1 = conv(dd[4]+2,  128, kernel_size=3, stride=1, padding=1,  dilation=1)
        self.dc_conv2 = conv(128,      128, kernel_size=3, stride=1, padding=2,  dilation=2)
        self.dc_conv3 = conv(128,      128, kernel_size=3, stride=1, padding=4,  dilation=4)
        self.dc_conv4 = conv(128,      96,  kernel_size=3, stride=1, padding=8,  dilation=8)
        self.dc_conv5 = conv(96,       64,  kernel_size=3, stride=1, padding=16, dilation=16)
        self.dc_conv6 = conv(64,       32,  kernel_size=3, stride=1, padding=1,  dilation=1)
        self.dc_conv7 = self.predict_flow(32)

        '''
        for m in self.modules():
            if isinstance(m, nn.Conv2d) or isinstance(m, nn.ConvTranspose2d):
                nn.init.kaiming_zeros_(m.weight.data)
                if m.bias is not None:
                    m.bias.data.zero_()
        '''
    def predict_flow(self, in_planes):
        return nn.Conv2d(in_planes,2,kernel_size=3,stride=1,padding=1,bias=True)

    def warp(self, x, flow):
        return warp_flow(x, flow, use_mask=False)

    def corr_naive(self, input1, input2, d=4):
        # naive pytorch implementation of the correlation layer.
        assert (input1.shape == input2.shape)
        batch_size, feature_num, H, W = input1.shape[0:4]
        input2 = F.pad(input2, (d,d,d,d), value=0)
        cv = []
        for i in range(2 * d + 1):
            for j in range(2 * d + 1):
                cv.append((input1 * input2[:, :, i:(i + H), j:(j + W)]).mean(1).unsqueeze(1))
        return torch.cat(cv, 1)
    
    def forward(self, feature_list_1, feature_list_2, img_hw):
        c11, c12, c13, c14, c15, c16 = feature_list_1
        c21, c22, c23, c24, c25, c26 = feature_list_2

        corr6 = self.corr(c16, c26) 
        x0 = self.conv6_0(corr6)
        x1 = self.conv6_1(x0)
        x2 = self.conv6_2(torch.cat((x0,x1),1))
        x3 = self.conv6_3(torch.cat((x1,x2),1))
        x4 = self.conv6_4(torch.cat((x2,x3),1))
        flow6 = self.predict_flow6(torch.cat((x3,x4),1))
        up_flow6 = F.interpolate(flow6, scale_factor=2.0, mode='bilinear')*2.0

        warp5 = self.warp(c25, up_flow6)
        corr5 = self.corr(c15, warp5) 
        x = torch.cat((corr5, c15, up_flow6), 1)
        x0 = self.conv5_0(x)
        x1 = self.conv5_1(x0)
        x2 = self.conv5_2(torch.cat((x0,x1),1))
        x3 = self.conv5_3(torch.cat((x1,x2),1))
        x4 = self.conv5_4(torch.cat((x2,x3),1))
        flow5 = self.predict_flow5(torch.cat((x3,x4),1))
        flow5 = flow5 + up_flow6
        up_flow5 = F.interpolate(flow5, scale_factor=2.0, mode='bilinear')*2.0

       
        warp4 = self.warp(c24, up_flow5)
        corr4 = self.corr(c14, warp4)  
        x = torch.cat((corr4, c14, up_flow5), 1)
        x0 = self.conv4_0(x)
        x1 = self.conv4_1(x0)
        x2 = self.conv4_2(torch.cat((x0,x1),1))
        x3 = self.conv4_3(torch.cat((x1,x2),1))
        x4 = self.conv4_4(torch.cat((x2,x3),1))
        flow4 = self.predict_flow4(torch.cat((x3,x4),1))
        flow4 = flow4 + up_flow5
        up_flow4 = F.interpolate(flow4, scale_factor=2.0, mode='bilinear')*2.0

        warp3 = self.warp(c23, up_flow4)
        corr3 = self.corr(c13, warp3) 
        x = torch.cat((corr3, c13, up_flow4), 1)
        x0 = self.conv3_0(x)
        x1 = self.conv3_1(x0)
        x2 = self.conv3_2(torch.cat((x0,x1),1))
        x3 = self.conv3_3(torch.cat((x1,x2),1))
        x4 = self.conv3_4(torch.cat((x2,x3),1))
        flow3 = self.predict_flow3(torch.cat((x3,x4),1))
        flow3 = flow3 + up_flow4
        up_flow3 = F.interpolate(flow3, scale_factor=2.0, mode='bilinear')*2.0


        warp2 = self.warp(c22, up_flow3) 
        corr2 = self.corr(c12, warp2)
        x = torch.cat((corr2, c12, up_flow3), 1)
        x0 = self.conv2_0(x)
        x1 = self.conv2_1(x0)
        x2 = self.conv2_2(torch.cat((x0,x1),1))
        x3 = self.conv2_3(torch.cat((x1,x2),1))
        x4 = self.conv2_4(torch.cat((x2,x3),1))
        flow2 = self.predict_flow2(torch.cat((x3,x4),1))
        flow2 = flow2 + up_flow3
 
        x = self.dc_conv4(self.dc_conv3(self.dc_conv2(self.dc_conv1(torch.cat([flow2, x4], 1)))))
        flow2 = flow2 + self.dc_conv7(self.dc_conv6(self.dc_conv5(x)))

        img_h, img_w = img_hw[0], img_hw[1]
        flow2 = F.interpolate(flow2 * 4.0, [img_h, img_w], mode='bilinear')
        flow3 = F.interpolate(flow3 * 4.0, [img_h // 2, img_w // 2], mode='bilinear')
        flow4 = F.interpolate(flow4 * 4.0, [img_h // 4, img_w // 4], mode='bilinear')
        flow5 = F.interpolate(flow5 * 4.0, [img_h // 8, img_w // 8], mode='bilinear')
        
        return [flow2, flow3, flow4, flow5]



================================================
FILE: core/networks/structures/ransac.py
================================================
import torch
import numpy as np
import os, sys
import torch.nn as nn
import pdb
import cv2

class reduced_ransac(nn.Module):
    def __init__(self, check_num, thres, dataset):
        super(reduced_ransac, self).__init__()
        self.check_num = check_num
        self.thres = thres
        self.dataset = dataset

    def robust_rand_sample(self, match, mask, num, robust=True):
        # match: [b, 4, -1] mask: [b, 1, -1]
        b, n = match.shape[0], match.shape[2]
        nonzeros_num = torch.min(torch.sum(mask > 0, dim=-1)) # []
        if nonzeros_num.detach().cpu().numpy() == n:
            rand_int = torch.randint(0, n, [num])
            select_match = match[:,:,rand_int]
        else:
            # If there is zero score in match, sample the non-zero matches.
            select_idxs = []
            if robust:
                num = np.minimum(nonzeros_num.detach().cpu().numpy(), num)
            for i in range(b):
                nonzero_idx = torch.nonzero(mask[i,0,:]) # [nonzero_num,1]
                rand_int = torch.randint(0, nonzero_idx.shape[0], [int(num)])
                select_idx = nonzero_idx[rand_int, :] # [num, 1]
                select_idxs.append(select_idx)
            select_idxs = torch.stack(select_idxs, 0) # [b,num,1]
            select_match = torch.gather(match.transpose(1,2), index=select_idxs.repeat(1,1,4), dim=1).transpose(1,2) # [b, 4, num]
        return select_match, num

    def top_ratio_sample(self, match, mask, ratio):
        # match: [b, 4, -1] mask: [b, 1, -1]
        b, total_num = match.shape[0], match.shape[-1]
        scores, indices = torch.topk(mask, int(ratio*total_num), dim=-1) # [B, 1, ratio*tnum]
        select_match = torch.gather(match.transpose(1,2), index=indices.squeeze(1).unsqueeze(-1).repeat(1,1,4), dim=1).transpose(1,2) # [b, 4, ratio*tnum]
        return select_match, scores


    def forward(self, match, mask, visualizer=None):
        # match: [B, 4, H, W] mask: [B, 1, H, W]
        b, h, w = match.shape[0], match.shape[2], match.shape[3]
        match = match.view([b, 4, -1]).contiguous()
        mask = mask.view([b, 1, -1]).contiguous()
        
        # Sample matches for RANSAC 8-point and best F selection
        top_ratio_match, top_ratio_mask = self.top_ratio_sample(match, mask, ratio=0.20) # [b, 4, ratio*H*W] 
        check_match, check_num = self.robust_rand_sample(top_ratio_match, top_ratio_mask, num=self.check_num) # [b, 4, check_num]
        check_match = check_match.contiguous()

        cv_f = []
        for i in range(b):
            if self.dataset == 'nyuv2':
                f, m = cv2.findFundamentalMat(check_match[i,:2,:].transpose(0,1).detach().cpu().numpy(), check_match[i,2:,:].transpose(0,1).detach().cpu().numpy(), cv2.FM_LMEDS, 0.99)
            else:
                f, m = cv2.findFundamentalMat(check_match[i,:2,:].transpose(0,1).detach().cpu().numpy(), check_match[i,2:,:].transpose(0,1).detach().cpu().numpy(), cv2.FM_RANSAC, 0.1, 0.99)
            cv_f.append(f)
        cv_f = np.stack(cv_f, axis=0)
        cv_f = torch.from_numpy(cv_f).float().to(match.get_device())
        
        return cv_f



================================================
FILE: core/visualize/__init__.py
================================================
import os, sys
sys.path.append(os.path.dirname(os.path.abspath(__file__)))
from visualizer import Visualizer
from visualizer import Visualizer_debug
from profiler import Profiler


================================================
FILE: core/visualize/profiler.py
================================================
import os
import time
import torch
import pdb

class Profiler(object):
    def __init__(self, silent=False):
        self.silent = silent
        torch.cuda.synchronize()
        self.start = time.time()
        self.cache_time = self.start

    def reset(self, silent=None):
        if silent is None:
            silent = self.silent
        self.__init__(silent=silent)

    def report_process(self, process_name):
        if self.silent:
            return None
        torch.cuda.synchronize()
        now = time.time()
        print('{0}\t: {1:.4f}'.format(process_name, now - self.cache_time))
        self.cache_time = now

    def report_all(self, whole_process_name):
        if self.silent:
            return None
        torch.cuda.synchronize()
        now = time.time()
        print('{0}\t: {1:.4f}'.format(whole_process_name, now - self.start))
        pdb.set_trace()



================================================
FILE: core/visualize/visualizer.py
================================================
import os, sys
import numpy as np
import cv2
import pdb
import pickle
from mpl_toolkits import mplot3d
import matplotlib.pyplot as plt
import PIL.Image as pil
import matplotlib as mpl
import matplotlib.cm as cm


colorlib = [(0,0,255),(255,0,0),(0,255,0),(255,255,0),(0,255,255),(255,0,255),(0,0,0),(255,255,255)]

class Visualizer(object):
    def __init__(self, loss_weights_dict, dump_dir=None):
        self.loss_weights_dict = loss_weights_dict
        self.use_flow_error = (self.loss_weights_dict['flow_error'] > 0)
        self.dump_dir = dump_dir

        self.log_list = []

    def add_log_pack(self, log_pack):
        self.log_list.append(log_pack)

    def dump_log(self, fname=None):
        if fname is None:
            fname = self.dump_dir
        with open(fname, 'wb') as f:
            pickle.dump(self.log_list, f)

    def print_loss(self, loss_pack, iter_=None):
        loss_pixel = loss_pack['loss_pixel'].mean().detach().cpu().numpy()
        loss_ssim = loss_pack['loss_ssim'].mean().detach().cpu().numpy()
        loss_flow_smooth = loss_pack['loss_flow_smooth'].mean().detach().cpu().numpy()
        loss_flow_consis = loss_pack['loss_flow_consis'].mean().detach().cpu().numpy()
        if 'pt_depth_loss' in loss_pack.keys():
            loss_pt_depth = loss_pack['pt_depth_loss'].mean().detach().cpu().numpy()
            loss_pj_depth = loss_pack['pj_depth_loss'].mean().detach().cpu().numpy()
            loss_depth_smooth = loss_pack['depth_smooth_loss'].mean().detach().cpu().numpy()
            str_= ('iter: {0}, loss_pixel: {1:.6f}, loss_ssim: {2:.6f}, loss_pt_depth: {3:.6f}, loss_pj_depth: {4:.6f}, los
Download .txt
gitextract_5bwpkhvi/

├── LICENSE
├── README.md
├── config/
│   ├── kitti.yaml
│   ├── kitti_3stage.yaml
│   ├── nyu.yaml
│   ├── nyu_192.yaml
│   ├── nyu_192_3stage.yaml
│   ├── nyu_3stage.yaml
│   ├── nyu_posenet_192.yaml
│   ├── odo.yaml
│   ├── odo_3stage.yaml
│   └── odo_posenet.yaml
├── core/
│   ├── config/
│   │   ├── __init__.py
│   │   └── config_utils.py
│   ├── dataset/
│   │   ├── __init__.py
│   │   ├── kitti_2012.py
│   │   ├── kitti_2015.py
│   │   ├── kitti_odo.py
│   │   ├── kitti_prepared.py
│   │   ├── kitti_raw.py
│   │   └── nyu_v2.py
│   ├── evaluation/
│   │   ├── __init__.py
│   │   ├── eval_odom.py
│   │   ├── evaluate_depth.py
│   │   ├── evaluate_flow.py
│   │   ├── evaluate_mask.py
│   │   ├── evaluation_utils.py
│   │   └── flowlib.py
│   ├── networks/
│   │   ├── __init__.py
│   │   ├── model_depth_pose.py
│   │   ├── model_flow.py
│   │   ├── model_flowposenet.py
│   │   ├── model_triangulate_pose.py
│   │   ├── pytorch_ssim/
│   │   │   ├── __init__.py
│   │   │   └── ssim.py
│   │   └── structures/
│   │       ├── __init__.py
│   │       ├── depth_model.py
│   │       ├── feature_pyramid.py
│   │       ├── flowposenet.py
│   │       ├── inverse_warp.py
│   │       ├── net_utils.py
│   │       ├── pwc_tf.py
│   │       └── ransac.py
│   └── visualize/
│       ├── __init__.py
│       ├── profiler.py
│       └── visualizer.py
├── data/
│   └── eigen/
│       ├── export_gt_depth.py
│       ├── static_frames.txt
│       ├── test_files.txt
│       └── test_scenes.txt
├── infer_vo.py
├── requirements.txt
├── test.py
└── train.py
Download .txt
SYMBOL INDEX (306 symbols across 32 files)

FILE: core/config/config_utils.py
  function generate_loss_weights_dict (line 3) | def generate_loss_weights_dict(cfg):

FILE: core/dataset/kitti_2012.py
  class KITTI_2012 (line 13) | class KITTI_2012(KITTI_Prepared):
    method __init__ (line 14) | def __init__(self, data_dir, img_hw=(256, 832), init=True):
    method get_data_list (line 21) | def get_data_list(self):
    method __len__ (line 31) | def __len__(self):
    method read_cam_intrinsic (line 34) | def read_cam_intrinsic(self, calib_file):
    method __getitem__ (line 38) | def __getitem__(self, idx):

FILE: core/dataset/kitti_2015.py
  class KITTI_2015 (line 5) | class KITTI_2015(KITTI_2012):
    method __init__ (line 6) | def __init__(self, data_dir, img_hw=(256, 832)):

FILE: core/dataset/kitti_odo.py
  function process_folder (line 7) | def process_folder(q, data_dir, output_dir, stride=1):
  class KITTI_Odo (line 33) | class KITTI_Odo(object):
    method __init__ (line 34) | def __init__(self, data_dir):
    method __len__ (line 38) | def __len__(self):
    method prepare_data_mp (line 41) | def prepare_data_mp(self, output_dir, stride=1):
    method __getitem__ (line 76) | def __getitem__(self, idx):

FILE: core/dataset/kitti_prepared.py
  class KITTI_Prepared (line 10) | class KITTI_Prepared(torch.utils.data.Dataset):
    method __init__ (line 11) | def __init__(self, data_dir, num_scales=3, img_hw=(256, 832), num_iter...
    method get_data_list (line 22) | def get_data_list(self, info_file):
    method count (line 35) | def count(self):
    method rand_num (line 38) | def rand_num(self, idx):
    method __len__ (line 44) | def __len__(self):
    method resize_img (line 50) | def resize_img(self, img, img_hw):
    method random_flip_img (line 63) | def random_flip_img(self, img):
    method preprocess_img (line 69) | def preprocess_img(self, img, img_hw=None, is_test=False):
    method read_cam_intrinsic (line 78) | def read_cam_intrinsic(self, fname):
    method rescale_intrinsics (line 87) | def rescale_intrinsics(self, K, img_hw_orig, img_hw_new):
    method get_intrinsics_per_scale (line 92) | def get_intrinsics_per_scale(self, K, scale):
    method get_multiscale_intrinsics (line 99) | def get_multiscale_intrinsics(self, K, num_scales):
    method __getitem__ (line 109) | def __getitem__(self, idx):

FILE: core/dataset/kitti_raw.py
  function process_folder (line 8) | def process_folder(q, static_frames, test_scenes, data_dir, output_dir, ...
  class KITTI_RAW (line 45) | class KITTI_RAW(object):
    method __init__ (line 46) | def __init__(self, data_dir, static_frames_txt, test_scenes_txt):
    method __len__ (line 51) | def __len__(self):
    method collect_static_frame (line 54) | def collect_static_frame(self):
    method collect_test_scenes (line 66) | def collect_test_scenes(self):
    method prepare_data_mp (line 74) | def prepare_data_mp(self, output_dir, stride=1):
    method prepare_data (line 122) | def prepare_data(self, output_dir):
    method __getitem__ (line 177) | def __getitem__(self, idx):

FILE: core/dataset/nyu_v2.py
  function collect_image_list (line 14) | def collect_image_list(path):
  function process_folder (line 24) | def process_folder(q, data_dir, output_dir, stride, train_scenes):
  class NYU_Prepare (line 68) | class NYU_Prepare(object):
    method __init__ (line 69) | def __init__(self, data_dir, test_dir):
    method __len__ (line 78) | def __len__(self):
    method get_all_scenes (line 81) | def get_all_scenes(self):
    method get_test_scenes (line 90) | def get_test_scenes(self):
    method get_train_scenes (line 104) | def get_train_scenes(self):
    method prepare_data_mp (line 118) | def prepare_data_mp(self, output_dir, stride=1):
    method __getitem__ (line 163) | def __getitem__(self, idx):
  class NYU_v2 (line 168) | class NYU_v2(torch.utils.data.Dataset):
    method __init__ (line 169) | def __init__(self, data_dir, num_scales=3, img_hw=(448, 576), num_iter...
    method get_data_list (line 182) | def get_data_list(self, info_file):
    method count (line 195) | def count(self):
    method rand_num (line 198) | def rand_num(self, idx):
    method __len__ (line 204) | def __len__(self):
    method resize_img (line 210) | def resize_img(self, img, img_hw):
    method random_flip_img (line 223) | def random_flip_img(self, img):
    method undistort_img (line 229) | def undistort_img(self, img, K):
    method preprocess_img (line 250) | def preprocess_img(self, img, K, img_hw=None, is_test=False):
    method read_cam_intrinsic (line 262) | def read_cam_intrinsic(self, fname):
    method rescale_intrinsics (line 271) | def rescale_intrinsics(self, K, img_hw_orig, img_hw_new):
    method get_intrinsics_per_scale (line 277) | def get_intrinsics_per_scale(self, K, scale):
    method get_multiscale_intrinsics (line 284) | def get_multiscale_intrinsics(self, K, num_scales):
    method __getitem__ (line 294) | def __getitem__(self, idx):

FILE: core/evaluation/eval_odom.py
  function scale_lse_solver (line 9) | def scale_lse_solver(X, Y):
  function umeyama_alignment (line 22) | def umeyama_alignment(x, y, with_scale=False):
  class KittiEvalOdom (line 72) | class KittiEvalOdom():
    method __init__ (line 77) | def __init__(self):
    method loadPoses (line 81) | def loadPoses(self, file_name):
    method trajectory_distances (line 106) | def trajectory_distances(self, poses):
    method rotation_error (line 123) | def rotation_error(self, pose_error):
    method translation_error (line 131) | def translation_error(self, pose_error):
    method last_frame_from_segment_length (line 137) | def last_frame_from_segment_length(self, dist, first_frame, len_):
    method calc_sequence_errors (line 143) | def calc_sequence_errors(self, poses_gt, poses_result):
    method save_sequence_errors (line 178) | def save_sequence_errors(self, err, file_name):
    method compute_overall_err (line 185) | def compute_overall_err(self, seq_err):
    method plotPath (line 198) | def plotPath(self, seq, poses_gt, poses_result):
    method compute_segment_error (line 230) | def compute_segment_error(self, seq_errs):
    method scale_optimization (line 259) | def scale_optimization(self, gt, pred):
    method eval (line 282) | def eval(self, gt_txt, result_txt, seq=None):

FILE: core/evaluation/evaluate_depth.py
  function process_depth (line 3) | def process_depth(gt_depth, pred_depth, min_depth, max_depth):
  function eval_depth (line 13) | def eval_depth(gt_depths,

FILE: core/evaluation/evaluate_flow.py
  function get_scaled_intrinsic_matrix (line 9) | def get_scaled_intrinsic_matrix(calib_file, zoom_x, zoom_y):
  function load_intrinsics_raw (line 19) | def load_intrinsics_raw(calib_file):
  function read_raw_calib_file (line 29) | def read_raw_calib_file(filepath):
  function scale_intrinsics (line 45) | def scale_intrinsics(mat, sx, sy):
  function read_flow_gt_worker (line 53) | def read_flow_gt_worker(dir_gt, i):
  function load_gt_flow_kitti (line 60) | def load_gt_flow_kitti(gt_dataset_dir, mode):
  function calculate_error_rate (line 85) | def calculate_error_rate(epe_map, gt_flow, mask):
  function eval_flow_avg (line 93) | def eval_flow_avg(gt_flows,

FILE: core/evaluation/evaluate_mask.py
  class EvalSegErr (line 12) | class EvalSegErr(Exception):
    method __init__ (line 13) | def __init__(self, value):
    method __str__ (line 16) | def __str__(self):
  function pixel_accuracy (line 20) | def pixel_accuracy(eval_segm, gt_segm):
  function mean_accuracy (line 48) | def mean_accuracy(eval_segm, gt_segm):
  function mean_IU (line 74) | def mean_IU(eval_segm, gt_segm):
  function frequency_weighted_IU (line 104) | def frequency_weighted_IU(eval_segm, gt_segm):
  function get_pixel_area (line 140) | def get_pixel_area(segm):
  function extract_both_masks (line 144) | def extract_both_masks(eval_segm, gt_segm, cl, n_cl):
  function extract_classes (line 151) | def extract_classes(segm):
  function union_classes (line 158) | def union_classes(eval_segm, gt_segm):
  function extract_masks (line 168) | def extract_masks(segm, cl, n_cl):
  function segm_size (line 178) | def segm_size(segm):
  function check_size (line 188) | def check_size(eval_segm, gt_segm):
  function read_mask_gt_worker (line 195) | def read_mask_gt_worker(gt_dataset_dir, idx):
  function load_gt_mask (line 199) | def load_gt_mask(gt_dataset_dir):
  function eval_mask (line 216) | def eval_mask(pred_masks, gt_masks, opt):

FILE: core/evaluation/evaluation_utils.py
  function compute_errors (line 11) | def compute_errors(gt, pred, nyu=False):

FILE: core/evaluation/flowlib.py
  function show_flow (line 29) | def show_flow(filename):
  function visualize_flow (line 41) | def visualize_flow(flow, mode='Y'):
  function read_flow (line 84) | def read_flow(filename):
  function read_flow_png (line 107) | def read_flow_png(flow_file):
  function write_flow_png (line 130) | def write_flow_png(flo, flow_file):
  function write_flow (line 147) | def write_flow(flow, filename):
  function segment_flow (line 166) | def segment_flow(flow):
  function flow_error (line 203) | def flow_error(tu, tv, u, v):
  function flow_to_image (line 258) | def flow_to_image(flow):
  function evaluate_flow_file (line 299) | def evaluate_flow_file(gt, pred):
  function evaluate_flow (line 315) | def evaluate_flow(gt_flow, pred_flow):
  function read_disp_png (line 332) | def read_disp_png(file_name):
  function disp_to_flowfile (line 350) | def disp_to_flowfile(disp, filename):
  function read_image (line 378) | def read_image(filename):
  function warp_image (line 389) | def warp_image(im, flow):
  function scale_image (line 428) | def scale_image(image, new_range):
  function compute_color (line 444) | def compute_color(u, v):
  function make_color_wheel (line 488) | def make_color_wheel():

FILE: core/networks/__init__.py
  function get_model (line 8) | def get_model(mode):

FILE: core/networks/model_depth_pose.py
  class Model_depth_pose (line 14) | class Model_depth_pose(nn.Module):
    method __init__ (line 15) | def __init__(self, cfg):
    method meshgrid (line 26) | def meshgrid(self, h, w):
    method robust_rand_sample (line 32) | def robust_rand_sample(self, match, mask, num):
    method top_ratio_sample (line 52) | def top_ratio_sample(self, match, mask, ratio):
    method rand_sample (line 59) | def rand_sample(self, match, num):
    method filt_negative_depth (line 65) | def filt_negative_depth(self, point2d_1_depth, point2d_2_depth, point2...
    method filt_invalid_coord (line 91) | def filt_invalid_coord(self, point2d_1_depth, point2d_2_depth, point2d...
    method ray_angle_filter (line 123) | def ray_angle_filter(self, match, P1, P2, return_angle=False):
    method midpoint_triangulate (line 168) | def midpoint_triangulate(self, match, K_inv, P1, P2):
    method rt_from_fundamental_mat_nyu (line 199) | def rt_from_fundamental_mat_nyu(self, fmat, K, depth_match):
    method verifyRT (line 228) | def verifyRT(self, match, K_inv, P1, P2):
    method rt_from_fundamental_mat (line 239) | def rt_from_fundamental_mat(self, fmat, K, depth_match):
    method reproject (line 277) | def reproject(self, P, point3d):
    method scale_adapt (line 284) | def scale_adapt(self, depth1, depth2, eps=1e-12):
    method affine_adapt (line 291) | def affine_adapt(self, depth1, depth2, use_translation=True, eps=1e-12):
    method register_depth (line 312) | def register_depth(self, depth_pred, coord_tri, depth_tri):
    method get_trian_loss (line 331) | def get_trian_loss(self, tri_depth, pred_tri_depth):
    method get_reproj_fdp_loss (line 336) | def get_reproj_fdp_loss(self, pred1, pred2, P2, K, K_inv, valid_mask, ...
    method disp2depth (line 359) | def disp2depth(self, disp, min_depth=0.1, max_depth=100.0):
    method get_smooth_loss (line 366) | def get_smooth_loss(self, img, disp):
    method infer_depth (line 382) | def infer_depth(self, img):
    method infer_vo (line 387) | def infer_vo(self, img1, img2, K, K_inv, match_num=6000):
    method check_rt (line 402) | def check_rt(self, img1, img2, K, K_inv):
    method inference (line 426) | def inference(self, img1, img2, K, K_inv):
    method forward (line 463) | def forward(self, inputs):

FILE: core/networks/model_flow.py
  function transformerFwd (line 12) | def transformerFwd(U,
  class Model_flow (line 151) | class Model_flow(nn.Module):
    method __init__ (line 152) | def __init__(self, cfg):
    method get_occlusion_mask_from_flow (line 169) | def get_occlusion_mask_from_flow(self, tensor_size, flow):
    method get_flow_norm (line 177) | def get_flow_norm(self, flow, p=2):
    method get_visible_masks (line 185) | def get_visible_masks(self, optical_flows, optical_flows_rev):
    method get_consistent_masks (line 195) | def get_consistent_masks(self, optical_flows, optical_flows_rev):
    method generate_img_pyramid (line 219) | def generate_img_pyramid(self, img, num_pyramid):
    method warp_flow_pyramid (line 227) | def warp_flow_pyramid(self, img_pyramid, flow_pyramid):
    method compute_loss_pixel (line 233) | def compute_loss_pixel(self, img_pyramid, img_warped_pyramid, occ_mask...
    method compute_loss_ssim (line 244) | def compute_loss_ssim(self, img_pyramid, img_warped_pyramid, occ_mask_...
    method gradients (line 257) | def gradients(self, img):
    method cal_grad2_error (line 262) | def cal_grad2_error(self, flow, img):
    method compute_loss_flow_smooth (line 273) | def compute_loss_flow_smooth(self, optical_flows, img_pyramid):
    method compute_loss_flow_consis (line 282) | def compute_loss_flow_consis(self, fwd_flow_diff_pyramid, occ_mask_list):
    method inference_flow (line 293) | def inference_flow(self, img1, img2):
    method inference_corres (line 299) | def inference_corres(self, img1, img2):
    method forward (line 319) | def forward(self, inputs, output_flow=False, use_flow_loss=True):

FILE: core/networks/model_flowposenet.py
  function mean_on_mask (line 16) | def mean_on_mask(diff, valid_mask):
  function edge_aware_smoothness_loss (line 22) | def edge_aware_smoothness_loss(pred_disp, img, max_scales):
  function compute_smooth_loss (line 64) | def compute_smooth_loss(tgt_depth, tgt_img, ref_depth, ref_img, max_scal...
  class Model_flowposenet (line 71) | class Model_flowposenet(nn.Module):
    method __init__ (line 72) | def __init__(self, cfg):
    method compute_pairwise_loss (line 79) | def compute_pairwise_loss(self, tgt_img, ref_img, tgt_depth, ref_depth...
    method disp2depth (line 106) | def disp2depth(self, disp, min_depth=0.01, max_depth=80.0):
    method infer_depth (line 113) | def infer_depth(self, img):
    method inference (line 119) | def inference(self, img1, img2, K, K_inv):
    method inference_flow (line 123) | def inference_flow(self, img1, img2):
    method infer_pose (line 127) | def infer_pose(self, img1, img2, K, K_inv):
    method forward (line 135) | def forward(self, inputs):

FILE: core/networks/model_triangulate_pose.py
  class Model_triangulate_pose (line 11) | class Model_triangulate_pose(nn.Module):
    method __init__ (line 12) | def __init__(self, cfg):
    method meshgrid (line 24) | def meshgrid(self, h, w):
    method compute_epipolar_loss (line 30) | def compute_epipolar_loss(self, fmat, match, mask):
    method get_rigid_mask (line 53) | def get_rigid_mask(self, dist_map):
    method inference (line 59) | def inference(self, img1, img2, K, K_inv):
    method forward (line 76) | def forward(self, inputs, output_F=False, visualizer=None):

FILE: core/networks/pytorch_ssim/ssim.py
  function SSIM (line 4) | def SSIM(x, y):

FILE: core/networks/structures/depth_model.py
  class ResNetMultiImageInput (line 17) | class ResNetMultiImageInput(models.ResNet):
    method __init__ (line 21) | def __init__(self, block, layers, num_classes=1000, num_input_images=1):
  function resnet_multiimage_input (line 41) | def resnet_multiimage_input(num_layers, pretrained=False, num_input_imag...
  class ResnetEncoder (line 60) | class ResnetEncoder(nn.Module):
    method __init__ (line 63) | def __init__(self, num_layers, pretrained, num_input_images=1):
    method forward (line 85) | def forward(self, input_image):
  class ConvBlock (line 97) | class ConvBlock(nn.Module):
    method __init__ (line 100) | def __init__(self, in_channels, out_channels):
    method forward (line 106) | def forward(self, x):
  class Conv3x3 (line 111) | class Conv3x3(nn.Module):
    method __init__ (line 114) | def __init__(self, in_channels, out_channels, use_refl=True):
    method forward (line 123) | def forward(self, x):
  function upsample (line 128) | def upsample(x):
  class DepthDecoder (line 135) | class DepthDecoder(nn.Module):
    method __init__ (line 136) | def __init__(self, num_ch_enc, scales=range(4), num_output_channels=1,...
    method init_decoder (line 149) | def init_decoder(self):
    method forward (line 173) | def forward(self, input_features):
  class Depth_Model (line 193) | class Depth_Model(nn.Module):
    method __init__ (line 194) | def __init__(self, depth_scale, num_layers=18):
    method forward (line 200) | def forward(self, img):

FILE: core/networks/structures/feature_pyramid.py
  class FeaturePyramid (line 7) | class FeaturePyramid(nn.Module):
    method __init__ (line 8) | def __init__(self):
    method forward (line 29) | def forward(self, img):

FILE: core/networks/structures/flowposenet.py
  function conv (line 7) | def conv(in_planes, out_planes, kernel_size=3):
  function upconv (line 13) | def upconv(in_planes, out_planes):
  class FlowPoseNet (line 19) | class FlowPoseNet(nn.Module):
    method __init__ (line 21) | def __init__(self):
    method init_weights (line 35) | def init_weights(self):
    method forward (line 42) | def forward(self, flow):

FILE: core/networks/structures/inverse_warp.py
  function set_id_grid (line 8) | def set_id_grid(depth):
  function check_sizes (line 20) | def check_sizes(input, input_name, expected):
  function pixel2cam (line 29) | def pixel2cam(depth, intrinsics_inv):
  function cam2pixel (line 47) | def cam2pixel(cam_coords, proj_c2p_rot, proj_c2p_tr, padding_mode):
  function euler2mat (line 77) | def euler2mat(angle):
  function quat2mat (line 115) | def quat2mat(quat):
  function pose_vec2mat (line 139) | def pose_vec2mat(vec, rotation_mode='euler'):
  function inverse_warp (line 157) | def inverse_warp(img, depth, pose, intrinsics, rotation_mode='euler', pa...
  function cam2pixel2 (line 194) | def cam2pixel2(cam_coords, proj_c2p_rot, proj_c2p_tr, padding_mode):
  function inverse_warp2 (line 230) | def inverse_warp2(img, depth, ref_depth, pose, intrinsics, padding_mode=...

FILE: core/networks/structures/net_utils.py
  function conv (line 7) | def conv(in_planes, out_planes, kernel_size=3, stride=1, padding=1, dila...
  function deconv (line 13) | def deconv(in_planes, out_planes, kernel_size=4, stride=2, padding=1):
  function warp_flow (line 16) | def warp_flow(x, flow, use_mask=False):

FILE: core/networks/structures/pwc_tf.py
  class PWC_tf (line 16) | class PWC_tf(nn.Module):
    method __init__ (line 17) | def __init__(self, md=4):
    method predict_flow (line 91) | def predict_flow(self, in_planes):
    method warp (line 94) | def warp(self, x, flow):
    method corr_naive (line 97) | def corr_naive(self, input1, input2, d=4):
    method forward (line 108) | def forward(self, feature_list_1, feature_list_2, img_hw):

FILE: core/networks/structures/ransac.py
  class reduced_ransac (line 8) | class reduced_ransac(nn.Module):
    method __init__ (line 9) | def __init__(self, check_num, thres, dataset):
    method robust_rand_sample (line 15) | def robust_rand_sample(self, match, mask, num, robust=True):
    method top_ratio_sample (line 36) | def top_ratio_sample(self, match, mask, ratio):
    method forward (line 44) | def forward(self, match, mask, visualizer=None):

FILE: core/visualize/profiler.py
  class Profiler (line 6) | class Profiler(object):
    method __init__ (line 7) | def __init__(self, silent=False):
    method reset (line 13) | def reset(self, silent=None):
    method report_process (line 18) | def report_process(self, process_name):
    method report_all (line 26) | def report_all(self, whole_process_name):

FILE: core/visualize/visualizer.py
  class Visualizer (line 15) | class Visualizer(object):
    method __init__ (line 16) | def __init__(self, loss_weights_dict, dump_dir=None):
    method add_log_pack (line 23) | def add_log_pack(self, log_pack):
    method dump_log (line 26) | def dump_log(self, fname=None):
    method print_loss (line 32) | def print_loss(self, loss_pack, iter_=None):
  class Visualizer_debug (line 50) | class Visualizer_debug():
    method __init__ (line 51) | def __init__(self, dump_dir=None, img1=None, img2=None):
    method draw_point_corres (line 56) | def draw_point_corres(self, batch_idx, match, name):
    method draw_invalid_corres_ray (line 62) | def draw_invalid_corres_ray(self, img1, img2, depth_match, point2d_1_c...
    method draw_epipolar_line (line 69) | def draw_epipolar_line(self, batch_idx, match, F, name):
    method show_corres (line 76) | def show_corres(self, img1, img2, match, name):
    method show_mask (line 93) | def show_mask(self, mask, name):
    method save_img (line 98) | def save_img(self, img, name):
    method save_depth_img (line 101) | def save_depth_img(self, depth, name):
    method save_disp_color_img (line 109) | def save_disp_color_img(self, disp, name):
    method drawlines (line 120) | def drawlines(self, img1, img2, lines, pts1, pts2):
    method show_epipolar_line (line 133) | def show_epipolar_line(self, img1, img2, match, F, name):
    method show_ray (line 153) | def show_ray(self, ax, K, RT, point2d, cmap='Greens'):
    method visualize_points (line 166) | def visualize_points(self, ax, points, cmap=None):
    method scatter_3d (line 171) | def scatter_3d(self, ax, point, scatter_color='r'):
    method visualize_two_rays (line 174) | def visualize_two_rays(self, ax, match, P1, P2):

FILE: data/eigen/export_gt_depth.py
  function load_velodyne_points (line 19) | def load_velodyne_points(filename):
  function read_calib_file (line 28) | def read_calib_file(path):
  function sub2ind (line 50) | def sub2ind(matrixSize, rowSub, colSub):
  function generate_depth_map (line 57) | def generate_depth_map(calib_dir, velo_filename, cam=2, vel_depth=False):
  function export_gt_depths_kitti (line 112) | def export_gt_depths_kitti():

FILE: infer_vo.py
  function save_traj (line 19) | def save_traj(path, poses):
  function projection (line 31) | def projection(xy, points, h_max, w_max):
  function unprojection (line 47) | def unprojection(xy, depth, K):
  function cv_triangulation (line 62) | def cv_triangulation(matches, pose):
  class infer_vo (line 76) | class infer_vo():
    method __init__ (line 77) | def __init__(self, seq_id, sequences_root_dir):
    method read_rescale_camera_intrinsics (line 99) | def read_rescale_camera_intrinsics(self, path):
    method load_images (line 114) | def load_images(self):
    method get_prediction (line 129) | def get_prediction(self, img1, img2, model, K, K_inv, match_num):
    method process_video (line 141) | def process_video(self, images, model):
    method normalize_coord (line 175) | def normalize_coord(self, xy, K):
    method align_to_depth (line 182) | def align_to_depth(self, xy1, xy2, pose, depth2):
    method solve_pose_pnp (line 213) | def solve_pose_pnp(self, xy1, xy2, depth1):
    method solve_pose_flow (line 255) | def solve_pose_flow(self, xy1, xy2):
  class pObject (line 311) | class pObject(object):
    method __init__ (line 312) | def __init__(self):

FILE: test.py
  function test_kitti_2012 (line 16) | def test_kitti_2012(cfg, model, gt_flows, noc_masks):
  function test_kitti_2015 (line 43) | def test_kitti_2015(cfg, model, gt_flows, noc_masks, gt_masks, depth_sav...
  function disp2depth (line 78) | def disp2depth(disp, min_depth=0.001, max_depth=80.0):
  function resize_depths (line 85) | def resize_depths(gt_depth_list, pred_disp_list):
  function test_eigen_depth (line 99) | def test_eigen_depth(cfg, model):
  function resize_disp (line 130) | def resize_disp(pred_disp_list, gt_depths):
  function load_nyu_test_data (line 143) | def load_nyu_test_data(data_dir):
  function test_nyu (line 153) | def test_nyu(cfg, model, test_images, test_gt_depths):
  function test_single_image (line 185) | def test_single_image(img_path, model, training_hw, save_dir='./'):
  class pObject (line 227) | class pObject(object):
    method __init__ (line 228) | def __init__(self):

FILE: train.py
  function save_model (line 19) | def save_model(iter_, model_dir, filename, model, optimizer):
  function load_model (line 22) | def load_model(model_dir, filename, model, optimizer):
  function train (line 29) | def train(cfg):
  class pObject (line 204) | class pObject(object):
    method __init__ (line 205) | def __init__(self):
Condensed preview — 54 files, each showing path, character count, and a content snippet. Download the .json file or copy for the full structured content (690K chars).
[
  {
    "path": "LICENSE",
    "chars": 1068,
    "preview": "MIT License\n\nCopyright (c) 2021 Shaohui Liu\n\nPermission is hereby granted, free of charge, to any person obtaining a cop"
  },
  {
    "path": "README.md",
    "chars": 7085,
    "preview": "## Towards Better Generalization: Joint Depth-Pose Learning without PoseNet\nCreated by <a href=\"https://github.com/thuzh"
  },
  {
    "path": "config/kitti.yaml",
    "chars": 874,
    "preview": "cfg_name: 'default'\n\n# dataset\nraw_base_dir: '/home4/zhaow/data/kitti'\nprepared_base_dir: '/home5/zhaow/data/kitti_relea"
  },
  {
    "path": "config/kitti_3stage.yaml",
    "chars": 978,
    "preview": "cfg_name: 'default'\n\n# dataset\nraw_base_dir: '/home/zhaow/data/kitti'\nprepared_base_dir: '/home/zhaow/data/kitti_seq'\ngt"
  },
  {
    "path": "config/nyu.yaml",
    "chars": 670,
    "preview": "cfg_name: 'default'\n\n# dataset\nraw_base_dir: '/home5/zhaow/data/nyuv2'\nprepared_base_dir: '/home5/zhaow/data/nyu_seq_rel"
  },
  {
    "path": "config/nyu_192.yaml",
    "chars": 670,
    "preview": "cfg_name: 'default'\n\n# dataset\nraw_base_dir: '/home5/zhaow/data/nyuv2'\nprepared_base_dir: '/home5/zhaow/data/nyu_seq_rel"
  },
  {
    "path": "config/nyu_192_3stage.yaml",
    "chars": 698,
    "preview": "cfg_name: 'default'\n\n# dataset\nraw_base_dir: '/home5/zhaow/data/nyuv2'\nprepared_base_dir: '/home5/zhaow/data/nyu_seq_rel"
  },
  {
    "path": "config/nyu_3stage.yaml",
    "chars": 698,
    "preview": "cfg_name: 'default'\n\n# dataset\nraw_base_dir: '/home5/zhaow/data/nyuv2'\nprepared_base_dir: '/home5/zhaow/data/nyu_seq_rel"
  },
  {
    "path": "config/nyu_posenet_192.yaml",
    "chars": 671,
    "preview": "cfg_name: 'default'\n\n# dataset\nraw_base_dir: '/home5/zhaow/data/nyuv2_sub2'\nprepared_base_dir: '/home5/zhaow/data/nyu_se"
  },
  {
    "path": "config/odo.yaml",
    "chars": 740,
    "preview": "cfg_name: 'default'\n\n# dataset\nraw_base_dir: '/home4/zhaow/data/kitti_odometry/sequences'\nprepared_base_dir: '/home5/zha"
  },
  {
    "path": "config/odo_3stage.yaml",
    "chars": 751,
    "preview": "cfg_name: 'default'\n\n# dataset\nraw_base_dir: '/home4/zhaow/data/kitti_odometry/sequences'\nprepared_base_dir: '/home5/zha"
  },
  {
    "path": "config/odo_posenet.yaml",
    "chars": 748,
    "preview": "cfg_name: 'default'\n\n# dataset\nraw_base_dir: '/home4/zhaow/data/kitti_odometry/sequences'\nprepared_base_dir: '/home5/zha"
  },
  {
    "path": "core/config/__init__.py",
    "chars": 128,
    "preview": "import os, sys\nsys.path.append(os.path.dirname(os.path.abspath(__file__)))\nfrom config_utils import generate_loss_weight"
  },
  {
    "path": "core/config/config_utils.py",
    "chars": 546,
    "preview": "import os, sys\n\ndef generate_loss_weights_dict(cfg):\n    weight_dict = {}\n    weight_dict['loss_pixel'] = 1 - cfg.w_ssim"
  },
  {
    "path": "core/dataset/__init__.py",
    "chars": 287,
    "preview": "import os, sys\nsys.path.append(os.path.dirname(os.path.abspath(__file__)))\nfrom kitti_raw import KITTI_RAW\nfrom kitti_pr"
  },
  {
    "path": "core/dataset/kitti_2012.py",
    "chars": 2288,
    "preview": "import os, sys\nsys.path.append(os.path.dirname(os.path.abspath(__file__)))\nfrom kitti_prepared import KITTI_Prepared\nsys"
  },
  {
    "path": "core/dataset/kitti_2015.py",
    "chars": 378,
    "preview": "import os, sys\nsys.path.append(os.path.dirname(os.path.abspath(__file__)))\nfrom kitti_2012 import KITTI_2012\n\nclass KITT"
  },
  {
    "path": "core/dataset/kitti_odo.py",
    "chars": 3745,
    "preview": "import os, sys\nimport numpy as np\nimport imageio\nfrom tqdm import tqdm\nimport torch.multiprocessing as mp\n\ndef process_f"
  },
  {
    "path": "core/dataset/kitti_prepared.py",
    "chars": 4630,
    "preview": "import os, sys\nimport numpy as np\nimport cv2\nimport copy\n\nimport torch\nimport torch.utils.data\nimport pdb\n\nclass KITTI_P"
  },
  {
    "path": "core/dataset/kitti_raw.py",
    "chars": 8568,
    "preview": "import os, sys\nimport numpy as np\nimport imageio\nfrom tqdm import tqdm\nimport torch.multiprocessing as mp\nimport pdb\n\nde"
  },
  {
    "path": "core/dataset/nyu_v2.py",
    "chars": 13298,
    "preview": "import os, sys\nimport numpy as np\nimport imageio\nimport cv2\nimport copy\nimport h5py\nimport scipy.io as sio\nimport torch\n"
  },
  {
    "path": "core/evaluation/__init__.py",
    "chars": 212,
    "preview": "import os, sys\nsys.path.append(os.path.dirname(os.path.abspath(__file__)))\nfrom evaluate_flow import eval_flow_avg, load"
  },
  {
    "path": "core/evaluation/eval_odom.py",
    "chars": 13822,
    "preview": "import copy\nfrom matplotlib import pyplot as plt\nimport numpy as np\nimport os\nfrom glob import glob\nimport pdb\n\n\ndef sca"
  },
  {
    "path": "core/evaluation/evaluate_depth.py",
    "chars": 1977,
    "preview": "from evaluation_utils import *\n\ndef process_depth(gt_depth, pred_depth, min_depth, max_depth):\n    mask = gt_depth > 0\n "
  },
  {
    "path": "core/evaluation/evaluate_flow.py",
    "chars": 6496,
    "preview": "import os, sys\nsys.path.append(os.path.dirname(os.path.abspath(__file__)))\nimport numpy as np\nfrom flowlib import read_f"
  },
  {
    "path": "core/evaluation/evaluate_mask.py",
    "chars": 6269,
    "preview": "import os\nimport numpy as np\nimport cv2\nimport functools\nimport matplotlib.pyplot as plt\nimport multiprocessing\n\n\"\"\"\nAdo"
  },
  {
    "path": "core/evaluation/evaluation_utils.py",
    "chars": 870,
    "preview": "import numpy as np\nimport os, sys\nsys.path.append(os.path.dirname(os.path.abspath(__file__)))\nimport cv2, skimage\nimport"
  },
  {
    "path": "core/evaluation/flowlib.py",
    "chars": 14420,
    "preview": "#!/usr/bin/python\n\"\"\"\nAdopted from https://github.com/liruoteng/OpticalFlowToolkit\n# ==============================\n# fl"
  },
  {
    "path": "core/networks/__init__.py",
    "chars": 635,
    "preview": "import os, sys\nsys.path.append(os.path.dirname(os.path.abspath(__file__)))\nfrom model_flow import Model_flow\nfrom model_"
  },
  {
    "path": "core/networks/model_depth_pose.py",
    "chars": 30237,
    "preview": "import os, sys\nsys.path.append(os.path.dirname(os.path.abspath(__file__)))\nfrom structures import *\nfrom model_triangula"
  },
  {
    "path": "core/networks/model_flow.py",
    "chars": 18005,
    "preview": "import os, sys\nsys.path.append(os.path.dirname(os.path.abspath(__file__)))\nfrom structures import *\nfrom pytorch_ssim im"
  },
  {
    "path": "core/networks/model_flowposenet.py",
    "chars": 6517,
    "preview": "import os, sys\nsys.path.append(os.path.dirname(os.path.abspath(__file__)))\nfrom structures import *\nfrom pytorch_ssim im"
  },
  {
    "path": "core/networks/model_triangulate_pose.py",
    "chars": 5969,
    "preview": "import os, sys\nsys.path.append(os.path.dirname(os.path.abspath(__file__)))\nimport torch\nimport torch.nn as nn\nimport num"
  },
  {
    "path": "core/networks/pytorch_ssim/__init__.py",
    "chars": 98,
    "preview": "import os, sys\nsys.path.append(os.path.dirname(os.path.abspath(__file__)))\nfrom ssim import SSIM\n\n"
  },
  {
    "path": "core/networks/pytorch_ssim/ssim.py",
    "chars": 535,
    "preview": "import torch\nimport torch.nn as nn\n\ndef SSIM(x, y):\n    C1 = 0.01 ** 2\n    C2 = 0.03 ** 2\n\n    mu_x = nn.AvgPool2d(3, 1,"
  },
  {
    "path": "core/networks/structures/__init__.py",
    "chars": 335,
    "preview": "import os, sys\nsys.path.append(os.path.dirname(os.path.abspath(__file__)))\nfrom feature_pyramid import FeaturePyramid\nfr"
  },
  {
    "path": "core/networks/structures/depth_model.py",
    "chars": 7964,
    "preview": "'''\nThis code was ported from existing repos\n[LINK] https://github.com/nianticlabs/monodepth2\n'''\nfrom __future__ import"
  },
  {
    "path": "core/networks/structures/feature_pyramid.py",
    "chars": 1586,
    "preview": "import os, sys\nsys.path.append(os.path.dirname(os.path.abspath(__file__)))\nfrom net_utils import conv\nimport torch\nimpor"
  },
  {
    "path": "core/networks/structures/flowposenet.py",
    "chars": 1951,
    "preview": "import torch\nimport torch.nn as nn\nfrom torch import sigmoid\nfrom torch.nn.init import xavier_uniform_, zeros_\n\n\ndef con"
  },
  {
    "path": "core/networks/structures/inverse_warp.py",
    "chars": 10018,
    "preview": "from __future__ import division\nimport torch\nimport torch.nn.functional as F\n\npixel_coords = None\n\n\ndef set_id_grid(dept"
  },
  {
    "path": "core/networks/structures/net_utils.py",
    "chars": 2088,
    "preview": "import torch\nimport torch.nn as nn\nfrom torch.autograd import Variable\nimport pdb\nimport numpy as np\n\ndef conv(in_planes"
  },
  {
    "path": "core/networks/structures/pwc_tf.py",
    "chars": 8423,
    "preview": "import os, sys\nsys.path.append(os.path.dirname(os.path.abspath(__file__)))\nfrom net_utils import conv, deconv, warp_flow"
  },
  {
    "path": "core/networks/structures/ransac.py",
    "chars": 3145,
    "preview": "import torch\nimport numpy as np\nimport os, sys\nimport torch.nn as nn\nimport pdb\nimport cv2\n\nclass reduced_ransac(nn.Modu"
  },
  {
    "path": "core/visualize/__init__.py",
    "chars": 179,
    "preview": "import os, sys\nsys.path.append(os.path.dirname(os.path.abspath(__file__)))\nfrom visualizer import Visualizer\nfrom visual"
  },
  {
    "path": "core/visualize/profiler.py",
    "chars": 887,
    "preview": "import os\nimport time\nimport torch\nimport pdb\n\nclass Profiler(object):\n    def __init__(self, silent=False):\n        sel"
  },
  {
    "path": "core/visualize/visualizer.py",
    "chars": 8652,
    "preview": "import os, sys\nimport numpy as np\nimport cv2\nimport pdb\nimport pickle\nfrom mpl_toolkits import mplot3d\nimport matplotlib"
  },
  {
    "path": "data/eigen/export_gt_depth.py",
    "chars": 4963,
    "preview": "# Copyright Niantic 2019. Patent Pending. All rights reserved.\n#\n# This software is licensed under the terms of the Mono"
  },
  {
    "path": "data/eigen/static_frames.txt",
    "chars": 394155,
    "preview": "2011_09_26 2011_09_26_drive_0009_sync 0000000386\n2011_09_26 2011_09_26_drive_0009_sync 0000000387\n2011_09_26 2011_09_26_"
  },
  {
    "path": "data/eigen/test_files.txt",
    "chars": 35547,
    "preview": "2011_09_26/2011_09_26_drive_0002_sync 0000000069 l\n2011_09_26/2011_09_26_drive_0002_sync 0000000054 l\n2011_09_26/2011_09"
  },
  {
    "path": "data/eigen/test_scenes.txt",
    "chars": 615,
    "preview": "2011_09_26_drive_0117\n2011_09_28_drive_0002\n2011_09_26_drive_0052\n2011_09_30_drive_0016\n2011_09_26_drive_0059\n2011_09_26"
  },
  {
    "path": "infer_vo.py",
    "chars": 13467,
    "preview": "import os, sys\nsys.path.append(os.path.dirname(os.path.abspath(__file__)))\nfrom core.networks.model_depth_pose import Mo"
  },
  {
    "path": "requirements.txt",
    "chars": 475,
    "preview": "cffi==1.12.3\ncycler==0.10.0\nCython==0.29.13\ndecorator==4.4.0\neasydict==1.9\nh5py==2.10.0\nimageio==2.5.0\njoblib==0.14.0\nki"
  },
  {
    "path": "test.py",
    "chars": 10749,
    "preview": "import os, sys\nsys.path.append(os.path.dirname(os.path.abspath(__file__)))\nfrom core.dataset import KITTI_2012, KITTI_20"
  },
  {
    "path": "train.py",
    "chars": 10881,
    "preview": "import os, sys\nimport yaml\nsys.path.append(os.path.dirname(os.path.abspath(__file__)))\nfrom core.dataset import KITTI_RA"
  }
]

About this extraction

This page contains the full source code of the B1ueber2y/TrianFlow GitHub repository, extracted and formatted as plain text for AI agents and large language models (LLMs). The extraction includes 54 files (655.9 KB), approximately 282.2k tokens, and a symbol index with 306 extracted functions, classes, methods, constants, and types. Use this with OpenClaw, Claude, ChatGPT, Cursor, Windsurf, or any other AI tool that accepts text input. You can copy the full output to your clipboard or download it as a .txt file.

Extracted by GitExtract — free GitHub repo to text converter for AI. Built by Nikandr Surkov.

Copied to clipboard!