Repository: NVlabs/Bi3D
Branch: master
Commit: 4b5fdb48d820
Files: 20
Total size: 69.8 KB

Directory structure:
gitextract_ulzmvk8p/
├── .gitignore
├── LICENSE.md
├── README.md
├── envs/
│   ├── bi3d_conda_env.yml
│   └── bi3d_pytorch_19_01.DockerFile
└── src/
    ├── models/
    │   ├── Bi3DNet.py
    │   ├── DispRefine2D.py
    │   ├── FeatExtractNet.py
    │   ├── GCNet.py
    │   ├── PSMNet.py
    │   ├── RefineNet2D.py
    │   ├── RefineNet3D.py
    │   ├── SegNet2D.py
    │   └── __init__.py
    ├── project.toml
    ├── run_binary_depth_estimation.py
    ├── run_continuous_depth_estimation.py
    ├── run_demo_kitti15.sh
    ├── run_demo_sf.sh
    └── util.py

================================================
FILE CONTENTS
================================================

================================================
FILE: .gitignore
================================================
# Add any directories, files, or patterns you don't want to be tracked by version control
*.png
*.pfm
*.pth.tar
*.npy
*.ppm
*.pyc
*.tar
*.zip
*.gif

================================================
FILE: LICENSE.md
================================================
# NVIDIA Source Code License for Bi3D

## 1. Definitions

“Licensor” means any person or entity that distributes its Work.

“Software” means the original work of authorship made available under this License.

“Work” means the Software and any additions to or derivative works of the Software that are made available under this License.

“NVIDIA Processors” means any central processing unit (CPU), graphics processing unit (GPU), field-programmable gate array (FPGA), application-specific integrated circuit (ASIC) or any combination thereof designed, made, sold, or provided by NVIDIA or its affiliates.

The terms “reproduce,” “reproduction,” “derivative works,” and “distribution” have the meaning as provided under U.S.
copyright law; provided, however, that for the purposes of this License, derivative works shall not include works that remain separable from, or merely link (or bind by name) to the interfaces of, the Work. Works, including the Software, are “made available” under this License by including in or with the Work either (a) a copyright notice referencing the applicability of this License to the Work, or (b) a copy of this License. ## 2. License Grant ### 2.1 Copyright Grant. Subject to the terms and conditions of this License, each Licensor grants to you a perpetual, worldwide, non-exclusive, royalty-free, copyright license to reproduce, prepare derivative works of, publicly display, publicly perform, sublicense and distribute its Work and any resulting derivative works in any form. ## 3. Limitations ### 3.1 Redistribution. You may reproduce or distribute the Work only if (a) you do so under this License, (b) you include a complete copy of this License with your distribution, and (c) you retain without modification any copyright, patent, trademark, or attribution notices that are present in the Work. ### 3.2 Derivative Works. You may specify that additional or different terms apply to the use, reproduction, and distribution of your derivative works of the Work (“Your Terms”) only if (a) Your Terms provide that the use limitation in Section 3.3 applies to your derivative works, and (b) you identify the specific derivative works that are subject to Your Terms. Notwithstanding Your Terms, this License (including the redistribution requirements in Section 3.1) will continue to apply to the Work itself. ### 3.3 Use Limitation. The Work and any derivative works thereof only may be used or intended for use non-commercially and with NVIDIA Processors. Notwithstanding the foregoing, NVIDIA and its affiliates may use the Work and any derivative works commercially. As used herein, “non-commercially” means for research or evaluation purposes only. ### 3.4 Patent Claims. 
If you bring or threaten to bring a patent claim against any Licensor (including any claim, cross-claim or counterclaim in a lawsuit) to enforce any patents that you allege are infringed by any Work, then your rights under this License from such Licensor (including the grant in Section 2.1) will terminate immediately. ### 3.5 Trademarks. This License does not grant any rights to use any Licensor’s or its affiliates’ names, logos, or trademarks, except as necessary to reproduce the notices described in this License. ### 3.6 Termination. If you violate any term of this License, then your rights under this License (including the grant in Section 2.1) will terminate immediately. ## 4. Disclaimer of Warranty. THE WORK IS PROVIDED “AS IS” WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WARRANTIES OR CONDITIONS OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, TITLE OR NON-INFRINGEMENT. YOU BEAR THE RISK OF UNDERTAKING ANY ACTIVITIES UNDER THIS LICENSE. ## 5. Limitation of Liability. EXCEPT AS PROHIBITED BY APPLICABLE LAW, IN NO EVENT AND UNDER NO LEGAL THEORY, WHETHER IN TORT (INCLUDING NEGLIGENCE), CONTRACT, OR OTHERWISE SHALL ANY LICENSOR BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING OUT OF OR RELATED TO THIS LICENSE, THE USE OR INABILITY TO USE THE WORK (INCLUDING BUT NOT LIMITED TO LOSS OF GOODWILL, BUSINESS INTERRUPTION, LOST PROFITS OR DATA, COMPUTER FAILURE OR MALFUNCTION, OR ANY OTHER COMMERCIAL DAMAGES OR LOSSES), EVEN IF THE LICENSOR HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. ================================================ FILE: README.md ================================================ ## Bi3D — Official PyTorch Implementation ![Teaser image](data/teaser.png) **Bi3D: Stereo Depth Estimation via Binary Classifications**
Abhishek Badki, Alejandro Troccoli, Kihwan Kim, Jan Kautz, Pradeep Sen, and Orazio Gallo
IEEE CVPR 2020
## Abstract: *Stereo-based depth estimation is a cornerstone of computer vision, with state-of-the-art methods delivering accurate results in real time. For several applications such as autonomous navigation, however, it may be useful to trade accuracy for lower latency. We present Bi3D, a method that estimates depth via a series of binary classifications. Rather than testing if objects are* at *a particular depth D, as existing stereo methods do, it classifies them as being* closer *or* farther *than D. This property offers a powerful mechanism to balance accuracy and latency. Given a strict time budget, Bi3D can detect objects closer than a given distance in as little as a few milliseconds, or estimate depth with arbitrarily coarse quantization, with complexity linear with the number of quantization levels. Bi3D can also use the allotted quantization levels to get continuous depth, but in a specific depth range. For standard stereo (i.e., continuous depth on the whole range), our method is close to or on par with state-of-the-art, finely tuned stereo methods.* ## Paper: https://arxiv.org/pdf/2005.07274.pdf
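
The closer/farther idea from the abstract can be illustrated with a toy sketch. This is not the network in this repository: the function names `binary_votes` and `quantized_disparity` and the plane values are hypothetical, and the per-plane decision is a plain numeric comparison standing in for what Bi3D predicts per pixel with a neural network. It assumes evenly spaced disparity planes and that closer objects have larger disparity.

```python
from typing import List


def binary_votes(true_disparity: float, planes: List[float]) -> List[int]:
    """One binary decision per plane: 1 if the point is closer than the plane.

    Closer objects have larger disparity, so "closer than plane d" is
    modeled here as true_disparity > d. This oracle only compares numbers;
    it is an assumption standing in for a learned per-pixel classifier.
    """
    return [1 if true_disparity > d else 0 for d in planes]


def quantized_disparity(votes: List[int], planes: List[float]) -> float:
    """Combine the votes into a coarse disparity estimate.

    With evenly spaced planes, the number of planes the point lies in front
    of selects a quantization bin; the centre of that bin is returned.
    """
    step = planes[1] - planes[0]
    lower_edge = planes[0] - step + sum(votes) * step
    return lower_edge + step / 2.0


planes = [12.0, 24.0, 36.0, 48.0]   # hypothetical plane disparities, in pixels
votes = binary_votes(30.0, planes)  # a point whose true disparity is 30 px
estimate = quantized_disparity(votes, planes)
print(votes, estimate)
```

Each plane costs one binary classification, so the work grows linearly with the number of quantization levels, and a single plane is enough to answer "is anything closer than D?".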
## Videos:
## Citing Bi3D: @InProceedings{badki2020Bi3D, author = {Badki, Abhishek and Troccoli, Alejandro and Kim, Kihwan and Kautz, Jan and Sen, Pradeep and Gallo, Orazio}, title = {{Bi3D}: {S}tereo Depth Estimation via Binary Classifications}, booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)}, year = {2020} } or the arXiv paper @InProceedings{badki2020Bi3D, author = {Badki, Abhishek and Troccoli, Alejandro and Kim, Kihwan and Kautz, Jan and Sen, Pradeep and Gallo, Orazio}, title = {{Bi3D}: {S}tereo Depth Estimation via Binary Classifications}, booktitle = {arXiv preprint arXiv:2005.07274}, year = {2020} } ## Code:
### License

Copyright (C) 2020 NVIDIA Corporation. All rights reserved.

Licensed under the [NVIDIA Source Code License](LICENSE.md)

### Description

### Setup

We offer two ways of setting up your environment: Docker or Conda.

#### Docker

For convenience, we provide a Dockerfile to build a container image to run the code. The image will contain the Python dependencies.

System requirements:

1. Docker (Tested on version 19.03.11)
2. [NVIDIA Docker](https://github.com/NVIDIA/nvidia-docker/wiki)
3. NVIDIA GPU driver.

Build the container image:

```
docker build -t bi3d . -f envs/bi3d_pytorch_19_01.DockerFile
```

To launch the container, run the following:

```
docker run --rm -it --gpus=all -v $(pwd):/bi3d -w /bi3d --net=host --ipc=host bi3d:latest /bin/bash
```

#### Conda

All dependencies will be installed automatically using the following:

```
conda env create -f envs/bi3d_conda_env.yml
```

You can activate the environment by running:

```
conda activate bi3d
```

### Pre-trained models

Download the pre-trained models [here](https://drive.google.com/file/d/1X4Ing9WumtIxonNXXCzKJulJtPgzk61n).

### Run the demo ``` cd src # RUN DEMO FOR SCENEFLOW DATASET sh run_demo_sf.sh # RUN DEMO FOR KITTI15 DATASET sh run_demo_kitti15.sh ``` ================================================ FILE: envs/bi3d_conda_env.yml ================================================ name: bi3d channels: - pytorch - soumith - defaults dependencies: - _libgcc_mutex=0.1=main - blas=1.0=mkl - ca-certificates=2020.6.24=0 - certifi=2020.6.20=py37_0 - cudatoolkit=10.0.130=0 - freetype=2.10.2=h5ab3b9f_0 - intel-openmp=2020.1=217 - jpeg=9b=h024ee3a_2 - lcms2=2.11=h396b838_0 - ld_impl_linux-64=2.33.1=h53a641e_7 - libedit=3.1.20191231=h14c3975_1 - libffi=3.3=he6710b0_2 - libgcc-ng=9.1.0=hdf63c60_0 - libgfortran-ng=7.3.0=hdf63c60_0 - libpng=1.6.37=hbc83047_0 - libstdcxx-ng=9.1.0=hdf63c60_0 - libtiff=4.1.0=h2733197_1 - lz4-c=1.9.2=he6710b0_0 - mkl=2020.1=217 - mkl-service=2.3.0=py37he904b0f_0 - mkl_fft=1.1.0=py37h23d657b_0 - mkl_random=1.1.1=py37h0573a6f_0 - ncurses=6.2=he6710b0_1 - ninja=1.9.0=py37hfd86e86_0 - numpy=1.18.5=py37ha1c710e_0 - numpy-base=1.18.5=py37hde5b4d6_0 - olefile=0.46=py_0 - openssl=1.1.1g=h7b6447c_0 - pillow=7.2.0=py37hb39fc2d_0 - pip=20.1.1=py37_1 - python=3.7.7=hcff3b4d_5 - pytorch=1.4.0=py3.7_cuda10.0.130_cudnn7.6.3_0 - readline=8.0=h7b6447c_0 - setuptools=49.2.0=py37_0 - six=1.15.0=py_0 - sqlite=3.32.3=h62c20be_0 - tk=8.6.10=hbc83047_0 - torchvision=0.5.0=py37_cu100 - wheel=0.34.2=py37_0 - xz=5.2.5=h7b6447c_0 - zlib=1.2.11=h7b6447c_3 - zstd=1.4.5=h0b5b093_0 - pip: - imageio==2.9.0 - opencv-python==4.3.0.36 - protobuf==3.12.2 - tensorboardx==2.1 ================================================ FILE: envs/bi3d_pytorch_19_01.DockerFile ================================================ FROM nvcr.io/nvidia/pytorch:19.01-py3 RUN pip install Pillow RUN pip install imageio RUN pip install tensorboardX RUN pip install opencv-python ================================================ FILE: src/models/Bi3DNet.py ================================================ # Copyright (c) 2020, 
NVIDIA CORPORATION. All rights reserved. # # NVIDIA CORPORATION and its licensors retain all intellectual property # and proprietary rights in and to this software, related documentation # and any modifications thereto. Any use, reproduction, disclosure or # distribution of this software and related documentation without an express # license agreement from NVIDIA CORPORATION is strictly prohibited. import numpy as np import torch import torch.nn as nn from torch.autograd import Variable import torch.nn.functional as F import models.FeatExtractNet as FeatNet import models.SegNet2D as SegNet import models.RefineNet2D as RefineNet import models.RefineNet3D as RefineNet3D __all__ = ["bi3dnet_binary_depth", "bi3dnet_continuous_depth_2D", "bi3dnet_continuous_depth_3D"] def compute_cost_volume(features_left, features_right, disp_ids, max_disp, is_disps_per_example): batch_size = features_left.shape[0] feature_size = features_left.shape[1] H = features_left.shape[2] W = features_left.shape[3] psv_size = disp_ids.shape[1] psv = Variable(features_left.new_zeros(batch_size, psv_size, feature_size * 2, H, W + max_disp)).cuda() if is_disps_per_example: for i in range(batch_size): psv[i, 0, :feature_size, :, 0:W] = features_left[i] psv[i, 0, feature_size:, :, disp_ids[i, 0] : W + disp_ids[i, 0]] = features_right[i] psv = psv.contiguous() else: for i in range(psv_size): psv[:, i, :feature_size, :, 0:W] = features_left psv[:, i, feature_size:, :, disp_ids[0, i] : W + disp_ids[0, i]] = features_right psv = psv.contiguous() return psv """ Bi3DNet for continuous depthmap generation. Doesn't use 3D regularization. 
""" class Bi3DNetContinuousDepth2D(nn.Module): def __init__(self, options, featnet_arch, segnet_arch, refinenet_arch=None, max_disparity=192): super(Bi3DNetContinuousDepth2D, self).__init__() self.max_disparity = max_disparity self.max_disparity_seg = int(self.max_disparity / 3) self.is_disps_per_example = False self.is_save_memory = False self.is_refine = True if refinenet_arch == None: self.is_refine = False self.featnet = FeatNet.__dict__[featnet_arch](options, data=None) self.segnet = SegNet.__dict__[segnet_arch](options, data=None) if self.is_refine: self.refinenet = RefineNet.__dict__[refinenet_arch](options, data=None) return def forward(self, img_left, img_right, disp_ids): batch_size = img_left.shape[0] psv_size = disp_ids.shape[1] if psv_size == 1: self.is_disps_per_example = True else: self.is_disps_per_example = False # Feature Extraction features_left = self.featnet(img_left) features_right = self.featnet(img_right) feature_size = features_left.shape[1] H = features_left.shape[2] W = features_left.shape[3] # Cost Volume Generation psv = compute_cost_volume( features_left, features_right, disp_ids, self.max_disparity_seg, self.is_disps_per_example ) psv = psv.view(batch_size * psv_size, feature_size * 2, H, W + self.max_disparity_seg) # Segmentation Network seg_raw_low_res = self.segnet(psv)[:, :, :, :W] seg_raw_low_res = seg_raw_low_res.view(batch_size, 1, psv_size, H, W) # Upsampling seg_prob_low_res_up = torch.sigmoid( F.interpolate( seg_raw_low_res, size=[psv_size * 3, img_left.size()[-2], img_left.size()[-1]], mode="trilinear", align_corners=False, ) ) seg_prob_low_res_up = seg_prob_low_res_up[:, 0, 1:-1, :, :] # Projection disparity_normalized = torch.mean((seg_prob_low_res_up), dim=1, keepdim=True) # Refinement if self.is_refine: refine_net_input = torch.cat((disparity_normalized, img_left), dim=1) disparity_normalized = self.refinenet(refine_net_input) return seg_prob_low_res_up, disparity_normalized def bi3dnet_continuous_depth_2D(options, 
data=None): print("==> USING Bi3DNetContinuousDepth2D") for key in options: if "bi3dnet" in key: print("{} : {}".format(key, options[key])) model = Bi3DNetContinuousDepth2D( options, featnet_arch=options["bi3dnet_featnet_arch"], segnet_arch=options["bi3dnet_segnet_arch"], refinenet_arch=options["bi3dnet_refinenet_arch"], max_disparity=options["bi3dnet_max_disparity"], ) if data is not None: model.load_state_dict(data["state_dict"]) return model """ Bi3DNet for continuous depthmap generation. Uses 3D regularization. """ class Bi3DNetContinuousDepth3D(nn.Module): def __init__( self, options, featnet_arch, segnet_arch, refinenet_arch=None, refinenet3d_arch=None, max_disparity=192, ): super(Bi3DNetContinuousDepth3D, self).__init__() self.max_disparity = max_disparity self.max_disparity_seg = int(self.max_disparity / 3) self.is_disps_per_example = False self.is_save_memory = False self.is_refine = True if refinenet_arch == None: self.is_refine = False self.featnet = FeatNet.__dict__[featnet_arch](options, data=None) self.segnet = SegNet.__dict__[segnet_arch](options, data=None) if self.is_refine: self.refinenet = RefineNet.__dict__[refinenet_arch](options, data=None) self.refinenet3d = RefineNet3D.__dict__[refinenet3d_arch](options, data=None) return def forward(self, img_left, img_right, disp_ids): batch_size = img_left.shape[0] psv_size = disp_ids.shape[1] if psv_size == 1: self.is_disps_per_example = True else: self.is_disps_per_example = False # Feature Extraction features_left = self.featnet(img_left) features_right = self.featnet(img_right) feature_size = features_left.shape[1] H = features_left.shape[2] W = features_left.shape[3] # Cost Volume Generation psv = compute_cost_volume( features_left, features_right, disp_ids, self.max_disparity_seg, self.is_disps_per_example ) psv = psv.view(batch_size * psv_size, feature_size * 2, H, W + self.max_disparity_seg) # Segmentation Network seg_raw_low_res = self.segnet(psv)[:, :, :, :W] # cropped to remove excess boundary 
        seg_raw_low_res = seg_raw_low_res.view(batch_size, 1, psv_size, H, W)

        # Upsampling
        seg_prob_low_res_up = torch.sigmoid(
            F.interpolate(
                seg_raw_low_res,
                size=[psv_size * 3, img_left.size()[-2], img_left.size()[-1]],
                mode="trilinear",
                align_corners=False,
            )
        )
        seg_prob_low_res_up = seg_prob_low_res_up[:, 0, 1:-1, :, :]

        # Upsampling after 3D Regularization
        seg_raw_low_res_refined = seg_raw_low_res
        seg_raw_low_res_refined[:, :, 1:, :, :] = self.refinenet3d(
            features_left, seg_raw_low_res_refined[:, :, 1:, :, :]
        )
        seg_prob_low_res_refined_up = torch.sigmoid(
            F.interpolate(
                seg_raw_low_res_refined,
                size=[psv_size * 3, img_left.size()[-2], img_left.size()[-1]],
                mode="trilinear",
                align_corners=False,
            )
        )
        seg_prob_low_res_refined_up = seg_prob_low_res_refined_up[:, 0, 1:-1, :, :]

        # Projection
        disparity_normalized_noisy = torch.mean((seg_prob_low_res_refined_up), dim=1, keepdim=True)

        # Refinement
        if self.is_refine:
            refine_net_input = torch.cat((disparity_normalized_noisy, img_left), dim=1)
            disparity_normalized = self.refinenet(refine_net_input)
        else:
            # Without a 2D refinement network, fall back to the unrefined
            # projection so that disparity_normalized is always defined.
            disparity_normalized = disparity_normalized_noisy

        return (
            seg_prob_low_res_up,
            seg_prob_low_res_refined_up,
            disparity_normalized_noisy,
            disparity_normalized,
        )


def bi3dnet_continuous_depth_3D(options, data=None):

    print("==> USING Bi3DNetContinuousDepth3D")
    for key in options:
        if "bi3dnet" in key:
            print("{} : {}".format(key, options[key]))

    model = Bi3DNetContinuousDepth3D(
        options,
        featnet_arch=options["bi3dnet_featnet_arch"],
        segnet_arch=options["bi3dnet_segnet_arch"],
        refinenet_arch=options["bi3dnet_refinenet_arch"],
        refinenet3d_arch=options["bi3dnet_regnet_arch"],
        max_disparity=options["bi3dnet_max_disparity"],
    )

    if data is not None:
        model.load_state_dict(data["state_dict"])

    return model


"""
Bi3DNet for binary depthmap generation.
""" class Bi3DNetBinaryDepth(nn.Module): def __init__( self, options, featnet_arch, segnet_arch, refinenet_arch=None, featnethr_arch=None, max_disparity=192, is_disps_per_example=False, ): super(Bi3DNetBinaryDepth, self).__init__() self.max_disparity = max_disparity self.max_disparity_seg = int(max_disparity / 3) self.is_disps_per_example = is_disps_per_example self.is_refine = True if refinenet_arch == None: self.is_refine = False self.featnet = FeatNet.__dict__[featnet_arch](options, data=None) self.featnethr = FeatNet.__dict__[featnethr_arch](options, data=None) self.segnet = SegNet.__dict__[segnet_arch](options, data=None) if self.is_refine: self.refinenet = RefineNet.__dict__[refinenet_arch](options, data=None) return def forward(self, img_left, img_right, disp_ids): batch_size = img_left.shape[0] psv_size = disp_ids.shape[1] if psv_size == 1: self.is_disps_per_example = True else: self.is_disps_per_example = False # Feature Extraction features = self.featnet(torch.cat((img_left, img_right), dim=0)) features_left = features[:batch_size, :, :, :] features_right = features[batch_size:, :, :, :] if self.is_refine: features_lefthr = self.featnethr(img_left) feature_size = features_left.shape[1] H = features_left.shape[2] W = features_left.shape[3] # Cost Volume Generation psv = compute_cost_volume( features_left, features_right, disp_ids, self.max_disparity_seg, self.is_disps_per_example ) psv = psv.view(batch_size * psv_size, feature_size * 2, H, W + self.max_disparity_seg) # Segmentation Network seg_raw_low_res = self.segnet(psv)[:, :, :, :W] # cropped to remove excess boundary seg_prob_low_res = torch.sigmoid(seg_raw_low_res) seg_prob_low_res = seg_prob_low_res.view(batch_size, psv_size, H, W) seg_prob_low_res_up = F.interpolate( seg_prob_low_res, size=img_left.size()[-2:], mode="bilinear", align_corners=False ) out = [] out.append(seg_prob_low_res_up) # Refinement if self.is_refine: seg_raw_high_res = F.interpolate( seg_raw_low_res, size=img_left.size()[-2:], 
mode="bilinear", align_corners=False ) # Refine Net features_left_expand = ( features_lefthr[:, None, :, :, :].expand(-1, psv_size, -1, -1, -1).contiguous() ) features_left_expand = features_left_expand.view( -1, features_lefthr.size()[1], features_lefthr.size()[2], features_lefthr.size()[3] ) refine_net_input = torch.cat((seg_raw_high_res, features_left_expand), dim=1) seg_raw_high_res = self.refinenet(refine_net_input) seg_prob_high_res = torch.sigmoid(seg_raw_high_res) seg_prob_high_res = seg_prob_high_res.view( batch_size, psv_size, img_left.size()[-2], img_left.size()[-1] ) out.append(seg_prob_high_res) else: out.append(seg_prob_low_res_up) return out def bi3dnet_binary_depth(options, data=None): print("==> USING Bi3DNetBinaryDepth") for key in options: if "bi3dnet" in key: print("{} : {}".format(key, options[key])) model = Bi3DNetBinaryDepth( options, featnet_arch=options["bi3dnet_featnet_arch"], segnet_arch=options["bi3dnet_segnet_arch"], refinenet_arch=options["bi3dnet_refinenet_arch"], featnethr_arch=options["bi3dnet_featnethr_arch"], max_disparity=options["bi3dnet_max_disparity"], is_disps_per_example=options["bi3dnet_disps_per_example_true"], ) if data is not None: model.load_state_dict(data["state_dict"]) return model ================================================ FILE: src/models/DispRefine2D.py ================================================ # MIT License # # Copyright (c) 2019 Xuanyi Li (xuanyili.edu@gmail.com) # Copyright (c) 2020 NVIDIA # # Permission is hereby granted, free of charge, to any person obtaining a copy # of this software and associated documentation files (the "Software"), to deal # in the Software without restriction, including without limitation the rights # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell # copies of the Software, and to permit persons to whom the Software is # furnished to do so, subject to the following conditions: # # The above copyright notice and this permission notice shall be 
included in all # copies or substantial portions of the Software. # # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE # SOFTWARE. import torch import torch.nn as nn import torch.nn.functional as F import math from models.PSMNet import conv2d from models.PSMNet import conv2d_lrelu """ The code in this file is adapted from https://github.com/meteorshowers/StereoNet-ActiveStereoNet """ class BasicBlock(nn.Module): expansion = 1 def __init__(self, inplanes, planes, stride, downsample, pad, dilation): super(BasicBlock, self).__init__() self.conv1 = conv2d_lrelu(inplanes, planes, 3, stride, pad, dilation) self.conv2 = conv2d(planes, planes, 3, 1, pad, dilation) self.downsample = downsample self.stride = stride def forward(self, x): out = self.conv1(x) out = self.conv2(out) if self.downsample is not None: x = self.downsample(x) out += x return out class DispRefineNet(nn.Module): def __init__(self, out_planes=32): super(DispRefineNet, self).__init__() self.out_planes = out_planes self.conv2d_feature = conv2d_lrelu( in_planes=4, out_planes=self.out_planes, kernel_size=3, stride=1, pad=1, dilation=1 ) self.residual_astrous_blocks = nn.ModuleList() astrous_list = [1, 2, 4, 8, 1, 1] for di in astrous_list: self.residual_astrous_blocks.append( BasicBlock(self.out_planes, self.out_planes, stride=1, downsample=None, pad=1, dilation=di) ) self.conv2d_out = nn.Conv2d(self.out_planes, 1, kernel_size=3, stride=1, padding=1) for m in self.modules(): if isinstance(m, nn.Conv2d): n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels m.weight.data.normal_(0, math.sqrt(2.0 / n)) 
elif isinstance(m, nn.Conv3d): n = m.kernel_size[0] * m.kernel_size[1] * m.kernel_size[2] * m.out_channels m.weight.data.normal_(0, math.sqrt(2.0 / n)) elif isinstance(m, nn.BatchNorm2d): m.weight.data.fill_(1) m.bias.data.zero_() elif isinstance(m, nn.BatchNorm3d): m.weight.data.fill_(1) m.bias.data.zero_() elif isinstance(m, nn.Linear): m.bias.data.zero_() return def forward(self, x): disp = x[:, 0, :, :][:, None, :, :] output = self.conv2d_feature(x) for astrous_block in self.residual_astrous_blocks: output = astrous_block(output) output = self.conv2d_out(output) # residual disparity output = output + disp # final disparity return output ================================================ FILE: src/models/FeatExtractNet.py ================================================ # Copyright (c) 2020, NVIDIA CORPORATION. All rights reserved. # # NVIDIA CORPORATION and its licensors retain all intellectual property # and proprietary rights in and to this software, related documentation # and any modifications thereto. Any use, reproduction, disclosure or # distribution of this software and related documentation without an express # license agreement from NVIDIA CORPORATION is strictly prohibited. from __future__ import print_function import torch import torch.nn as nn import math from models.PSMNet import conv2d from models.PSMNet import conv2d_relu from models.PSMNet import FeatExtractNetSPP __all__ = ["featextractnetspp", "featextractnethr"] """ Feature extraction network. Generates 16D features at the image resolution. Used for final refinement. 
""" class FeatExtractNetHR(nn.Module): def __init__(self, out_planes=16): super(FeatExtractNetHR, self).__init__() self.conv1 = nn.Sequential( conv2d_relu(3, out_planes, kernel_size=3, stride=1, pad=1, dilation=1), conv2d_relu(out_planes, out_planes, kernel_size=3, stride=1, pad=1, dilation=1), nn.Conv2d(out_planes, out_planes, kernel_size=1, padding=0, stride=1, bias=False), ) for m in self.modules(): if isinstance(m, nn.Conv2d): n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels m.weight.data.normal_(0, math.sqrt(2.0 / n)) elif isinstance(m, nn.Conv3d): n = m.kernel_size[0] * m.kernel_size[1] * m.kernel_size[2] * m.out_channels m.weight.data.normal_(0, math.sqrt(2.0 / n)) elif isinstance(m, nn.BatchNorm2d): m.weight.data.fill_(1) m.bias.data.zero_() elif isinstance(m, nn.BatchNorm3d): m.weight.data.fill_(1) m.bias.data.zero_() elif isinstance(m, nn.Linear): m.bias.data.zero_() return def forward(self, input): output = self.conv1(input) return output def featextractnethr(options, data=None): print("==> USING FeatExtractNetHR") for key in options: if "featextractnethr" in key: print("{} : {}".format(key, options[key])) model = FeatExtractNetHR(out_planes=options["featextractnethr_out_planes"]) if data is not None: model.load_state_dict(data["state_dict"]) return model """ Feature extraction network. Generates 32D features at 3x less resolution. Uses Spatial Pyramid Pooling inspired by PSMNet. 
""" def featextractnetspp(options, data=None): print("==> USING FeatExtractNetSPP") for key in options: if "feat" in key: print("{} : {}".format(key, options[key])) model = FeatExtractNetSPP() if data is not None: model.load_state_dict(data["state_dict"]) return model ================================================ FILE: src/models/GCNet.py ================================================ # Copyright (c) 2018 Wang Yufeng # Copyright (c) 2020 NVIDIA # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. import torch import torch.nn as nn """ The code in this file is adapted from https://github.com/wyf2017/DSMnet """ def conv3d_relu(in_planes, out_planes, kernel_size=3, stride=1, activefun=nn.ReLU(inplace=True)): return nn.Sequential( nn.Conv3d(in_planes, out_planes, kernel_size, stride, padding=(kernel_size - 1) // 2, bias=True), activefun, ) def deconv3d_relu(in_planes, out_planes, kernel_size=4, stride=2, activefun=nn.ReLU(inplace=True)): assert stride > 1 p = (kernel_size - 1) // 2 op = stride - (kernel_size - 2 * p) return nn.Sequential( nn.ConvTranspose3d( in_planes, out_planes, kernel_size, stride, padding=p, output_padding=op, bias=True ), activefun, ) """ GCNet style 3D regularization network """ class feature3d(nn.Module): def __init__(self, num_F): super(feature3d, self).__init__() self.F = num_F self.l19 = conv3d_relu(self.F + 32, self.F, kernel_size=3, stride=1) self.l20 = conv3d_relu(self.F, self.F, kernel_size=3, stride=1) self.l21 = conv3d_relu(self.F + 32, self.F * 2, 
kernel_size=3, stride=2) self.l22 = conv3d_relu(self.F * 2, self.F * 2, kernel_size=3, stride=1) self.l23 = conv3d_relu(self.F * 2, self.F * 2, kernel_size=3, stride=1) self.l24 = conv3d_relu(self.F * 2, self.F * 2, kernel_size=3, stride=2) self.l25 = conv3d_relu(self.F * 2, self.F * 2, kernel_size=3, stride=1) self.l26 = conv3d_relu(self.F * 2, self.F * 2, kernel_size=3, stride=1) self.l27 = conv3d_relu(self.F * 2, self.F * 2, kernel_size=3, stride=2) self.l28 = conv3d_relu(self.F * 2, self.F * 2, kernel_size=3, stride=1) self.l29 = conv3d_relu(self.F * 2, self.F * 2, kernel_size=3, stride=1) self.l30 = conv3d_relu(self.F * 2, self.F * 4, kernel_size=3, stride=2) self.l31 = conv3d_relu(self.F * 4, self.F * 4, kernel_size=3, stride=1) self.l32 = conv3d_relu(self.F * 4, self.F * 4, kernel_size=3, stride=1) self.l33 = deconv3d_relu(self.F * 4, self.F * 2, kernel_size=3, stride=2) self.l34 = deconv3d_relu(self.F * 2, self.F * 2, kernel_size=3, stride=2) self.l35 = deconv3d_relu(self.F * 2, self.F * 2, kernel_size=3, stride=2) self.l36 = deconv3d_relu(self.F * 2, self.F, kernel_size=3, stride=2) self.l37 = nn.Conv3d(self.F, 1, kernel_size=3, stride=1, padding=1, bias=True) def forward(self, x): x18 = x x21 = self.l21(x18) x24 = self.l24(x21) x27 = self.l27(x24) x30 = self.l30(x27) x31 = self.l31(x30) x32 = self.l32(x31) x29 = self.l29(self.l28(x27)) x33 = self.l33(x32) + x29 x26 = self.l26(self.l25(x24)) x34 = self.l34(x33) + x26 x23 = self.l23(self.l22(x21)) x35 = self.l35(x34) + x23 x20 = self.l20(self.l19(x18)) x36 = self.l36(x35) + x20 x37 = self.l37(x36) conf_volume_wo_sig = x37 return conf_volume_wo_sig ================================================ FILE: src/models/PSMNet.py ================================================ # MIT License # # Copyright (c) 2018 Jia-Ren Chang # Copyright (c) 2020 NVIDIA # # Permission is hereby granted, free of charge, to any person obtaining a copy # of this software and associated documentation files (the "Software"), to deal # 
in the Software without restriction, including without limitation the rights # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell # copies of the Software, and to permit persons to whom the Software is # furnished to do so, subject to the following conditions: # # The above copyright notice and this permission notice shall be included in all # copies or substantial portions of the Software. # # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE # SOFTWARE. import torch import torch.nn as nn import torch.nn.functional as F import math """ The code in this file is adapted from https://github.com/JiaRenChang/PSMNet """ def conv2d(in_planes, out_planes, kernel_size, stride, pad, dilation): return nn.Sequential( nn.Conv2d( in_planes, out_planes, kernel_size=kernel_size, stride=stride, padding=dilation if dilation > 1 else pad, dilation=dilation, bias=True, ) ) def conv2d_relu(in_planes, out_planes, kernel_size, stride, pad, dilation): return nn.Sequential( nn.Conv2d( in_planes, out_planes, kernel_size=kernel_size, stride=stride, padding=dilation if dilation > 1 else pad, dilation=dilation, bias=True, ), nn.ReLU(inplace=True), ) def conv2d_lrelu(in_planes, out_planes, kernel_size, stride, pad, dilation=1): return nn.Sequential( nn.Conv2d( in_planes, out_planes, kernel_size=kernel_size, stride=stride, padding=dilation if dilation > 1 else pad, dilation=dilation, bias=True, ), nn.LeakyReLU(0.1, inplace=True), ) class BasicBlock(nn.Module): expansion = 1 def __init__(self, inplanes, planes, stride, downsample, pad, dilation): super(BasicBlock, 
self).__init__() self.conv1 = conv2d_relu(inplanes, planes, 3, stride, pad, dilation) self.conv2 = conv2d(planes, planes, 3, 1, pad, dilation) self.downsample = downsample self.stride = stride def forward(self, x): out = self.conv1(x) out = self.conv2(out) if self.downsample is not None: x = self.downsample(x) out += x return out class FeatExtractNetSPP(nn.Module): def __init__(self): super(FeatExtractNetSPP, self).__init__() self.align_corners = False self.inplanes = 32 self.firstconv = nn.Sequential( conv2d_relu(3, 32, 3, 3, 1, 1), conv2d_relu(32, 32, 3, 1, 1, 1), conv2d_relu(32, 32, 3, 1, 1, 1) ) self.layer1 = self._make_layer(BasicBlock, 32, 2, 1, 1, 2) self.branch1 = nn.Sequential(nn.AvgPool2d((64, 64), stride=(64, 64)), conv2d_relu(32, 32, 1, 1, 0, 1)) self.branch2 = nn.Sequential(nn.AvgPool2d((32, 32), stride=(32, 32)), conv2d_relu(32, 32, 1, 1, 0, 1)) self.branch3 = nn.Sequential(nn.AvgPool2d((16, 16), stride=(16, 16)), conv2d_relu(32, 32, 1, 1, 0, 1)) self.branch4 = nn.Sequential(nn.AvgPool2d((8, 8), stride=(8, 8)), conv2d_relu(32, 32, 1, 1, 0, 1)) self.lastconv = nn.Sequential( conv2d_relu(160, 64, 3, 1, 1, 1), nn.Conv2d(64, 32, kernel_size=1, padding=0, stride=1, bias=False), ) for m in self.modules(): if isinstance(m, nn.Conv2d): n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels m.weight.data.normal_(0, math.sqrt(2.0 / n)) elif isinstance(m, nn.Conv3d): n = m.kernel_size[0] * m.kernel_size[1] * m.kernel_size[2] * m.out_channels m.weight.data.normal_(0, math.sqrt(2.0 / n)) elif isinstance(m, nn.BatchNorm2d): m.weight.data.fill_(1) m.bias.data.zero_() elif isinstance(m, nn.BatchNorm3d): m.weight.data.fill_(1) m.bias.data.zero_() elif isinstance(m, nn.Linear): m.bias.data.zero_() def _make_layer(self, block, planes, blocks, stride, pad, dilation): downsample = None if stride != 1 or self.inplanes != planes * block.expansion: downsample = nn.Sequential( nn.Conv2d(self.inplanes, planes * block.expansion, kernel_size=1, stride=stride, bias=False), 
nn.BatchNorm2d(planes * block.expansion), ) layers = [] layers.append(block(self.inplanes, planes, stride, downsample, pad, dilation)) self.inplanes = planes * block.expansion for i in range(1, blocks): layers.append(block(self.inplanes, planes, 1, None, pad, dilation)) return nn.Sequential(*layers) def forward(self, input): output0 = self.firstconv(input) output1 = self.layer1(output0) output_branch1 = self.branch1(output1) output_branch1 = F.interpolate( output_branch1, (output1.size()[2], output1.size()[3]), mode="bilinear", align_corners=self.align_corners, ) output_branch2 = self.branch2(output1) output_branch2 = F.interpolate( output_branch2, (output1.size()[2], output1.size()[3]), mode="bilinear", align_corners=self.align_corners, ) output_branch3 = self.branch3(output1) output_branch3 = F.interpolate( output_branch3, (output1.size()[2], output1.size()[3]), mode="bilinear", align_corners=self.align_corners, ) output_branch4 = self.branch4(output1) output_branch4 = F.interpolate( output_branch4, (output1.size()[2], output1.size()[3]), mode="bilinear", align_corners=self.align_corners, ) output_feature = torch.cat( (output1, output_branch4, output_branch3, output_branch2, output_branch1), 1 ) output_feature = self.lastconv(output_feature) return output_feature ================================================ FILE: src/models/RefineNet2D.py ================================================ # Copyright (c) 2020, NVIDIA CORPORATION. All rights reserved. # # NVIDIA CORPORATION and its licensors retain all intellectual property # and proprietary rights in and to this software, related documentation # and any modifications thereto. Any use, reproduction, disclosure or # distribution of this software and related documentation without an express # license agreement from NVIDIA CORPORATION is strictly prohibited. 
from __future__ import print_function import torch import torch.nn as nn import torch.nn.functional as F import math import argparse import time import torch.backends.cudnn as cudnn from models.PSMNet import conv2d from models.PSMNet import conv2d_lrelu from models.DispRefine2D import DispRefineNet __all__ = ["disprefinenet", "segrefinenet"] """ Disparity refinement network. Takes concatenated input image and the disparity map to generate refined disparity map. Generates refined output using input image as guide. """ def disprefinenet(options, data=None): print("==> USING DispRefineNet") for key in options: if "disprefinenet" in key: print("{} : {}".format(key, options[key])) model = DispRefineNet(out_planes=options["disprefinenet_out_planes"]) if data is not None: model.load_state_dict(data["state_dict"]) return model """ Binary segmentation refinement network. Takes as input high resolution features of input image and the disparity map. Generates refined output using input image as guide. 
""" class SegRefineNet(nn.Module): def __init__(self, in_planes=17, out_planes=8): super(SegRefineNet, self).__init__() self.conv1 = nn.Sequential(conv2d_lrelu(in_planes, out_planes, kernel_size=3, stride=1, pad=1)) self.classif1 = nn.Conv2d(out_planes, 1, kernel_size=3, padding=1, stride=1, bias=False) for m in self.modules(): if isinstance(m, nn.Conv2d): n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels m.weight.data.normal_(0, math.sqrt(2.0 / n)) elif isinstance(m, nn.Conv3d): n = m.kernel_size[0] * m.kernel_size[1] * m.kernel_size[2] * m.out_channels m.weight.data.normal_(0, math.sqrt(2.0 / n)) elif isinstance(m, nn.BatchNorm2d): m.weight.data.fill_(1) m.bias.data.zero_() elif isinstance(m, nn.BatchNorm3d): m.weight.data.fill_(1) m.bias.data.zero_() elif isinstance(m, nn.Linear): m.bias.data.zero_() def forward(self, input): output0 = self.conv1(input) output = self.classif1(output0) return output def segrefinenet(options, data=None): print("==> USING SegRefineNet") for key in options: if "segrefinenet" in key: print("{} : {}".format(key, options[key])) model = SegRefineNet( in_planes=options["segrefinenet_in_planes"], out_planes=options["segrefinenet_out_planes"] ) if data is not None: model.load_state_dict(data["state_dict"]) return model ================================================ FILE: src/models/RefineNet3D.py ================================================ # Copyright (c) 2020, NVIDIA CORPORATION. All rights reserved. # # NVIDIA CORPORATION and its licensors retain all intellectual property # and proprietary rights in and to this software, related documentation # and any modifications thereto. Any use, reproduction, disclosure or # distribution of this software and related documentation without an express # license agreement from NVIDIA CORPORATION is strictly prohibited. 
import torch import torch.nn as nn import numpy as np __all__ = ["segregnet3d"] from models.GCNet import conv3d_relu from models.GCNet import deconv3d_relu from models.GCNet import feature3d def net_init(net): for m in net.modules(): if isinstance(m, nn.Linear): m.weight.data = fanin_init(m.weight.data.size()) elif isinstance(m, nn.Conv3d): n = m.kernel_size[0] * m.kernel_size[1] * m.kernel_size[2] * m.out_channels m.weight.data.normal_(0, np.sqrt(2.0 / n)) elif isinstance(m, nn.Conv2d): n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels m.weight.data.normal_(0, np.sqrt(2.0 / n)) elif isinstance(m, nn.Conv1d): n = m.kernel_size[0] * m.out_channels m.weight.data.normal_(0, np.sqrt(2.0 / n)) elif isinstance(m, nn.BatchNorm3d): m.weight.data.fill_(1) m.bias.data.zero_() elif isinstance(m, nn.BatchNorm2d): m.weight.data.fill_(1) m.bias.data.zero_() elif isinstance(m, nn.BatchNorm1d): m.weight.data.fill_(1) m.bias.data.zero_() class SegRegNet3D(nn.Module): def __init__(self, F=16): super(SegRegNet3D, self).__init__() self.conf_preprocess = conv3d_relu(1, F, kernel_size=3, stride=1) self.layer3d = feature3d(F) net_init(self) def forward(self, fL, conf_volume): fL_stack = fL[:, :, None, :, :].repeat(1, 1, int(conf_volume.shape[2]), 1, 1) conf_vol_preprocess = self.conf_preprocess(conf_volume) input_volume = torch.cat((fL_stack, conf_vol_preprocess), dim=1) oL = self.layer3d(input_volume) return oL def segregnet3d(options, data=None): print("==> USING SegRegNet3D") for key in options: if "regnet" in key: print("{} : {}".format(key, options[key])) model = SegRegNet3D(F=options["regnet_out_planes"]) if data is not None: model.load_state_dict(data["state_dict"]) return model ================================================ FILE: src/models/SegNet2D.py ================================================ # Copyright (c) 2020, NVIDIA CORPORATION. All rights reserved. 
# # NVIDIA CORPORATION and its licensors retain all intellectual property # and proprietary rights in and to this software, related documentation # and any modifications thereto. Any use, reproduction, disclosure or # distribution of this software and related documentation without an express # license agreement from NVIDIA CORPORATION is strictly prohibited. import torch import torch.nn as nn import argparse import math import torch.nn.functional as F import torch.backends.cudnn as cudnn import time __all__ = ["segnet2d"] # Util Functions def conv(in_planes, out_planes, kernel_size=3, stride=1, activefun=nn.LeakyReLU(0.1, inplace=True)): return nn.Sequential( nn.Conv2d( in_planes, out_planes, kernel_size=kernel_size, stride=stride, padding=(kernel_size - 1) // 2, bias=True, ), activefun, ) def deconv(in_planes, out_planes, kernel_size=4, stride=2, activefun=nn.LeakyReLU(0.1, inplace=True)): return nn.Sequential( nn.ConvTranspose2d( in_planes, out_planes, kernel_size=kernel_size, stride=stride, padding=1, bias=True ), activefun, ) class SegNet2D(nn.Module): def __init__(self): super(SegNet2D, self).__init__() self.activefun = nn.LeakyReLU(0.1, inplace=True) cps = [64, 128, 256, 512, 512, 512] dps = [512, 512, 256, 128, 64] # Encoder self.conv1 = conv(cps[0], cps[1], kernel_size=3, stride=2, activefun=self.activefun) self.conv1_1 = conv(cps[1], cps[1], kernel_size=3, stride=1, activefun=self.activefun) self.conv2 = conv(cps[1], cps[2], kernel_size=3, stride=2, activefun=self.activefun) self.conv2_1 = conv(cps[2], cps[2], kernel_size=3, stride=1, activefun=self.activefun) self.conv3 = conv(cps[2], cps[3], kernel_size=3, stride=2, activefun=self.activefun) self.conv3_1 = conv(cps[3], cps[3], kernel_size=3, stride=1, activefun=self.activefun) self.conv4 = conv(cps[3], cps[4], kernel_size=3, stride=2, activefun=self.activefun) self.conv4_1 = conv(cps[4], cps[4], kernel_size=3, stride=1, activefun=self.activefun) self.conv5 = conv(cps[4], cps[5], kernel_size=3, stride=2, 
activefun=self.activefun) self.conv5_1 = conv(cps[5], cps[5], kernel_size=3, stride=1, activefun=self.activefun) # Decoder self.deconv5 = deconv(cps[5], dps[0], kernel_size=4, stride=2, activefun=self.activefun) self.deconv5_1 = conv(dps[0] + cps[4], dps[0], kernel_size=3, stride=1, activefun=self.activefun) self.deconv4 = deconv(cps[4], dps[1], kernel_size=4, stride=2, activefun=self.activefun) self.deconv4_1 = conv(dps[1] + cps[3], dps[1], kernel_size=3, stride=1, activefun=self.activefun) self.deconv3 = deconv(dps[1], dps[2], kernel_size=4, stride=2, activefun=self.activefun) self.deconv3_1 = conv(dps[2] + cps[2], dps[2], kernel_size=3, stride=1, activefun=self.activefun) self.deconv2 = deconv(dps[2], dps[3], kernel_size=4, stride=2, activefun=self.activefun) self.deconv2_1 = conv(dps[3] + cps[1], dps[3], kernel_size=3, stride=1, activefun=self.activefun) self.deconv1 = deconv(dps[3], dps[4], kernel_size=4, stride=2, activefun=self.activefun) self.deconv1_1 = conv(dps[4] + cps[0], dps[4], kernel_size=3, stride=1, activefun=self.activefun) self.last_conv = nn.Conv2d(dps[4], 1, kernel_size=3, stride=1, padding=1, bias=True) # Init for m in self.modules(): if isinstance(m, nn.Conv2d): n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels m.weight.data.normal_(0, math.sqrt(2.0 / n)) elif isinstance(m, nn.Conv3d): n = m.kernel_size[0] * m.kernel_size[1] * m.kernel_size[2] * m.out_channels m.weight.data.normal_(0, math.sqrt(2.0 / n)) elif isinstance(m, nn.BatchNorm2d): m.weight.data.fill_(1) m.bias.data.zero_() elif isinstance(m, nn.BatchNorm3d): m.weight.data.fill_(1) m.bias.data.zero_() elif isinstance(m, nn.Linear): m.bias.data.zero_() return def forward(self, x): out_conv0 = x out_conv1 = self.conv1_1(self.conv1(out_conv0)) out_conv2 = self.conv2_1(self.conv2(out_conv1)) out_conv3 = self.conv3_1(self.conv3(out_conv2)) out_conv4 = self.conv4_1(self.conv4(out_conv3)) out_conv5 = self.conv5_1(self.conv5(out_conv4)) out_deconv5 = self.deconv5(out_conv5) 
out_deconv5_1 = self.deconv5_1(torch.cat((out_conv4, out_deconv5), 1)) out_deconv4 = self.deconv4(out_deconv5_1) out_deconv4_1 = self.deconv4_1(torch.cat((out_conv3, out_deconv4), 1)) out_deconv3 = self.deconv3(out_deconv4_1) out_deconv3_1 = self.deconv3_1(torch.cat((out_conv2, out_deconv3), 1)) out_deconv2 = self.deconv2(out_deconv3_1) out_deconv2_1 = self.deconv2_1(torch.cat((out_conv1, out_deconv2), 1)) out_deconv1 = self.deconv1(out_deconv2_1) out_deconv1_1 = self.deconv1_1(torch.cat((out_conv0, out_deconv1), 1)) raw_seg = self.last_conv(out_deconv1_1) return raw_seg def segnet2d(options, data=None): print("==> USING SegNet2D") for key in options: if "segnet2d" in key: print("{} : {}".format(key, options[key])) model = SegNet2D() if data is not None: model.load_state_dict(data["state_dict"]) return model ================================================ FILE: src/models/__init__.py ================================================ from .Bi3DNet import * from .FeatExtractNet import * from .SegNet2D import * from .RefineNet2D import * from .RefineNet3D import * from .PSMNet import * from .GCNet import * from .DispRefine2D import * ================================================ FILE: src/project.toml ================================================ [tool.black] line-length = 110 target-version = ['py37'] ================================================ FILE: src/run_binary_depth_estimation.py ================================================ # Copyright (c) 2020, NVIDIA CORPORATION. All rights reserved. # # NVIDIA CORPORATION and its licensors retain all intellectual property # and proprietary rights in and to this software, related documentation # and any modifications thereto. Any use, reproduction, disclosure or # distribution of this software and related documentation without an express # license agreement from NVIDIA CORPORATION is strictly prohibited. 
import argparse import os import torch import torchvision.transforms as transforms from PIL import Image import models import cv2 import numpy as np from util import disp2rgb, str2bool import random model_names = sorted(name for name in models.__dict__ if name.islower() and not name.startswith("__")) # Parse arguments parser = argparse.ArgumentParser(allow_abbrev=False) # Model parser.add_argument("--arch", type=str, default="bi3dnet_binary_depth") parser.add_argument("--bi3dnet_featnet_arch", type=str, default="featextractnetspp") parser.add_argument("--bi3dnet_featnethr_arch", type=str, default="featextractnethr") parser.add_argument("--bi3dnet_segnet_arch", type=str, default="segnet2d") parser.add_argument("--bi3dnet_refinenet_arch", type=str, default="segrefinenet") parser.add_argument("--bi3dnet_max_disparity", type=int, default=192) parser.add_argument("--bi3dnet_disps_per_example_true", type=str2bool, default=True) parser.add_argument("--featextractnethr_out_planes", type=int, default=16) parser.add_argument("--segrefinenet_in_planes", type=int, default=17) parser.add_argument("--segrefinenet_out_planes", type=int, default=8) # Input parser.add_argument("--pretrained", type=str) parser.add_argument("--img_left", type=str) parser.add_argument("--img_right", type=str) parser.add_argument("--disp_vals", type=float, nargs="*") parser.add_argument("--crop_height", type=int) parser.add_argument("--crop_width", type=int) args, unknown = parser.parse_known_args() #################################################################################################### def main(): options = vars(args) print("==> ALL PARAMETERS") for key in options: print("{} : {}".format(key, options[key])) out_dir = "out" if not os.path.isdir(out_dir): os.mkdir(out_dir) base_name = os.path.splitext(os.path.basename(args.img_left))[0] # Model network_data = torch.load(args.pretrained) print("=> using pre-trained model '{}'".format(args.arch)) model = models.__dict__[args.arch](options, 
network_data).cuda() # Inputs img_left = Image.open(args.img_left).convert("RGB") img_left = transforms.functional.to_tensor(img_left) img_left = transforms.functional.normalize(img_left, [0.5, 0.5, 0.5], [0.5, 0.5, 0.5]) img_left = img_left.type(torch.cuda.FloatTensor)[None, :, :, :] img_right = Image.open(args.img_right).convert("RGB") img_right = transforms.functional.to_tensor(img_right) img_right = transforms.functional.normalize(img_right, [0.5, 0.5, 0.5], [0.5, 0.5, 0.5]) img_right = img_right.type(torch.cuda.FloatTensor)[None, :, :, :] segs = [] for disp_val in args.disp_vals: assert disp_val % 3 == 0, "disparity value should be a multiple of 3 as we downsample the image by 3" disp_long = torch.Tensor([[disp_val / 3]]).type(torch.LongTensor).cuda() # Pad inputs tw = args.crop_width th = args.crop_height assert tw % 96 == 0, "image dimensions should be a multiple of 96" assert th % 96 == 0, "image dimensions should be a multiple of 96" h = img_left.shape[2] w = img_left.shape[3] x1 = random.randint(0, max(0, w - tw)) y1 = random.randint(0, max(0, h - th)) pad_w = tw - w if tw - w > 0 else 0 pad_h = th - h if th - h > 0 else 0 pad_opr = torch.nn.ZeroPad2d((pad_w, 0, pad_h, 0)) img_left = img_left[:, :, y1 : y1 + min(th, h), x1 : x1 + min(tw, w)] img_right = img_right[:, :, y1 : y1 + min(th, h), x1 : x1 + min(tw, w)] img_left_pad = pad_opr(img_left) img_right_pad = pad_opr(img_right) # Inference model.eval() with torch.no_grad(): output = model(img_left_pad, img_right_pad, disp_long)[1][:, :, pad_h:, pad_w:] # Write binary depth results seg_img = output[0, 0][None, :, :].clone().cpu().detach().numpy() seg_img = np.transpose(seg_img * 255.0, (1, 2, 0)) cv2.imwrite( os.path.join(out_dir, "%s_%s_seg_confidence_%d.png" % (base_name, args.arch, disp_val)), seg_img ) segs.append(output[0, 0][None, :, :].clone().cpu().detach().numpy()) # Generate quantized depth results segs = np.concatenate(segs, axis=0) segs = np.insert(segs, 0, np.ones((1, h, w), 
dtype=np.float32), axis=0)
    segs = np.append(segs, np.zeros((1, h, w), dtype=np.float32), axis=0)

    segs = 1.0 - segs

    # Get the pdf values for each segmented region
    pdf_method = segs[1:, :, :] - segs[:-1, :, :]

    # Get the labels (builtin int; the np.int alias is not available in recent NumPy releases)
    labels_method = np.argmax(pdf_method, axis=0).astype(int)
    disp_map = labels_method.astype(np.float32)

    disp_vals = args.disp_vals
    disp_vals.insert(0, 0)
    disp_vals.append(args.bi3dnet_max_disparity)

    # Each label i is rendered at the midpoint of the interval [disp_vals[i], disp_vals[i + 1]]
    for i in range(len(disp_vals) - 1):
        min_disp = disp_vals[i]
        max_disp = disp_vals[i + 1]
        mid_disp = 0.5 * (min_disp + max_disp)
        disp_map[labels_method == i] = mid_disp

    disp_vals_str_list = ["%d" % disp_val for disp_val in disp_vals]
    disp_vals_str = "-".join(disp_vals_str_list)

    img_disp = np.clip(disp_map, 0, args.bi3dnet_max_disparity)
    img_disp = img_disp / args.bi3dnet_max_disparity
    img_disp = (disp2rgb(img_disp) * 255.0).astype(np.uint8)

    cv2.imwrite(
        os.path.join(out_dir, "%s_%s_quant_depth_%s.png" % (base_name, args.arch, disp_vals_str)), img_disp
    )

    return


if __name__ == "__main__":
    main()

================================================
FILE: src/run_continuous_depth_estimation.py
================================================
# Copyright (c) 2020, NVIDIA CORPORATION. All rights reserved.
#
# NVIDIA CORPORATION and its licensors retain all intellectual property
# and proprietary rights in and to this software, related documentation
# and any modifications thereto. Any use, reproduction, disclosure or
# distribution of this software and related documentation without an express
# license agreement from NVIDIA CORPORATION is strictly prohibited.
import argparse import os import time import torch import torchvision.transforms as transforms from PIL import Image import models import cv2 import numpy as np from util import disp2rgb, str2bool import random model_names = sorted(name for name in models.__dict__ if name.islower() and not name.startswith("__")) # Parse Arguments parser = argparse.ArgumentParser(allow_abbrev=False) # Experiment Type parser.add_argument("--arch", type=str, default="bi3dnet_continuous_depth_2D") parser.add_argument("--bi3dnet_featnet_arch", type=str, default="featextractnetspp") parser.add_argument("--bi3dnet_segnet_arch", type=str, default="segnet2d") parser.add_argument("--bi3dnet_refinenet_arch", type=str, default="disprefinenet") parser.add_argument("--bi3dnet_regnet_arch", type=str, default="segregnet3d") parser.add_argument("--bi3dnet_max_disparity", type=int, default=192) parser.add_argument("--regnet_out_planes", type=int, default=16) parser.add_argument("--disprefinenet_out_planes", type=int, default=32) parser.add_argument("--bi3dnet_disps_per_example_true", type=str2bool, default=True) # Input parser.add_argument("--pretrained", type=str) parser.add_argument("--img_left", type=str) parser.add_argument("--img_right", type=str) parser.add_argument("--disp_range_min", type=int) parser.add_argument("--disp_range_max", type=int) parser.add_argument("--crop_height", type=int) parser.add_argument("--crop_width", type=int) args, unknown = parser.parse_known_args() ############################################################################################################## def main(): options = vars(args) print("==> ALL PARAMETERS") for key in options: print("{} : {}".format(key, options[key])) out_dir = "out" if not os.path.isdir(out_dir): os.mkdir(out_dir) base_name = os.path.splitext(os.path.basename(args.img_left))[0] # Model if args.pretrained: network_data = torch.load(args.pretrained) else: print("Need an input model") exit() print("=> using pre-trained model 
'{}'".format(args.arch)) model = models.__dict__[args.arch](options, network_data).cuda() # Inputs img_left = Image.open(args.img_left).convert("RGB") img_right = Image.open(args.img_right).convert("RGB") img_left = transforms.functional.to_tensor(img_left) img_right = transforms.functional.to_tensor(img_right) img_left = transforms.functional.normalize(img_left, [0.5, 0.5, 0.5], [0.5, 0.5, 0.5]) img_right = transforms.functional.normalize(img_right, [0.5, 0.5, 0.5], [0.5, 0.5, 0.5]) img_left = img_left.type(torch.cuda.FloatTensor)[None, :, :, :] img_right = img_right.type(torch.cuda.FloatTensor)[None, :, :, :] # Prepare Disparities max_disparity = args.disp_range_max min_disparity = args.disp_range_min assert max_disparity % 3 == 0 and min_disparity % 3 == 0, "disparities should be divisible by 3" if args.arch == "bi3dnet_continuous_depth_3D": assert ( max_disparity - min_disparity ) % 48 == 0, "for 3D regularization the difference in disparities should be divisible by 48" max_disp_levels = (max_disparity - min_disparity) + 1 max_disparity_3x = int(max_disparity / 3) min_disparity_3x = int(min_disparity / 3) max_disp_levels_3x = (max_disparity_3x - min_disparity_3x) + 1 disp_3x = np.linspace(min_disparity_3x, max_disparity_3x, max_disp_levels_3x, dtype=np.int32) disp_long_3x_main = torch.from_numpy(disp_3x).type(torch.LongTensor).cuda() disp_float_main = np.linspace(min_disparity, max_disparity, max_disp_levels, dtype=np.float32) disp_float_main = torch.from_numpy(disp_float_main).type(torch.float32).cuda() delta = 1 d_min_GT = min_disparity - 0.5 * delta d_max_GT = max_disparity + 0.5 * delta disp_long_3x = disp_long_3x_main[None, :].expand(img_left.shape[0], -1) disp_float = disp_float_main[None, :].expand(img_left.shape[0], -1) # Pad Inputs tw = args.crop_width th = args.crop_height assert tw % 96 == 0, "image dimensions should be multiple of 96" assert th % 96 == 0, "image dimensions should be multiple of 96" h = img_left.shape[2] w = img_left.shape[3] x1 = 
random.randint(0, max(0, w - tw)) y1 = random.randint(0, max(0, h - th)) pad_w = tw - w if tw - w > 0 else 0 pad_h = th - h if th - h > 0 else 0 pad_opr = torch.nn.ZeroPad2d((pad_w, 0, pad_h, 0)) img_left = img_left[:, :, y1 : y1 + min(th, h), x1 : x1 + min(tw, w)] img_right = img_right[:, :, y1 : y1 + min(th, h), x1 : x1 + min(tw, w)] img_left_pad = pad_opr(img_left) img_right_pad = pad_opr(img_right) # Inference model.eval() with torch.no_grad(): if args.arch == "bi3dnet_continuous_depth_2D": output_seg_low_res_upsample, output_disp_normalized = model( img_left_pad, img_right_pad, disp_long_3x ) output_seg = output_seg_low_res_upsample else: ( output_seg_low_res_upsample, output_seg_low_res_upsample_refined, output_disp_normalized_no_reg, output_disp_normalized, ) = model(img_left_pad, img_right_pad, disp_long_3x) output_seg = output_seg_low_res_upsample_refined output_seg = output_seg[:, :, pad_h:, pad_w:] output_disp_normalized = output_disp_normalized[:, :, pad_h:, pad_w:] output_disp = torch.clamp( output_disp_normalized * delta * max_disp_levels + d_min_GT, min=d_min_GT, max=d_max_GT ) # Write Results max_disparity_color = 192 output_disp_clamp = output_disp[0, 0, :, :].cpu().clone().numpy() output_disp_clamp[output_disp_clamp < min_disparity] = 0 output_disp_clamp[output_disp_clamp > max_disparity] = max_disparity_color disp_np_ours_color = disp2rgb(output_disp_clamp / max_disparity_color) * 255.0 cv2.imwrite( os.path.join(out_dir, "%s_%s_%d_%d.png" % (base_name, args.arch, min_disparity, max_disparity)), disp_np_ours_color, ) return if __name__ == "__main__": main() ================================================ FILE: src/run_demo_kitti15.sh ================================================ #!/usr/bin/env bash # GENERATE BINARY DEPTH SEGMENTATIONS AND COMBINE THEM TO GENERATE QUANTIZED DEPTH CUDA_VISIBLE_DEVICES=0 python run_binary_depth_estimation.py \ --arch bi3dnet_binary_depth \ --bi3dnet_featnet_arch featextractnetspp \ --bi3dnet_featnethr_arch 
featextractnethr \
    --bi3dnet_segnet_arch segnet2d \
    --bi3dnet_refinenet_arch segrefinenet \
    --featextractnethr_out_planes 16 \
    --segrefinenet_in_planes 17 \
    --segrefinenet_out_planes 8 \
    --crop_height 384 --crop_width 1248 \
    --disp_vals 12 21 30 39 48 \
    --img_left '../data/kitti15_img_left.jpg' \
    --img_right '../data/kitti15_img_right.jpg' \
    --pretrained '../model_weights/kitti15_binary_depth.pth.tar'

# FULL RANGE CONTINUOUS DEPTH ESTIMATION WITHOUT 3D REGULARIZATION
CUDA_VISIBLE_DEVICES=0 python run_continuous_depth_estimation.py \
    --arch bi3dnet_continuous_depth_2D \
    --bi3dnet_featnet_arch featextractnetspp \
    --bi3dnet_segnet_arch segnet2d \
    --bi3dnet_refinenet_arch disprefinenet \
    --disprefinenet_out_planes 32 \
    --crop_height 384 --crop_width 1248 \
    --disp_range_min 0 \
    --disp_range_max 192 \
    --bi3dnet_max_disparity 192 \
    --img_left '../data/kitti15_img_left.jpg' \
    --img_right '../data/kitti15_img_right.jpg' \
    --pretrained '../model_weights/kitti15_continuous_depth_no_conf_reg.pth.tar'

# SELECTIVE RANGE CONTINUOUS DEPTH ESTIMATION WITHOUT 3D REGULARIZATION
CUDA_VISIBLE_DEVICES=0 python run_continuous_depth_estimation.py \
    --arch bi3dnet_continuous_depth_2D \
    --bi3dnet_featnet_arch featextractnetspp \
    --bi3dnet_segnet_arch segnet2d \
    --bi3dnet_refinenet_arch disprefinenet \
    --disprefinenet_out_planes 32 \
    --crop_height 384 --crop_width 1248 \
    --disp_range_min 12 \
    --disp_range_max 48 \
    --bi3dnet_max_disparity 192 \
    --img_left '../data/kitti15_img_left.jpg' \
    --img_right '../data/kitti15_img_right.jpg' \
    --pretrained '../model_weights/kitti15_continuous_depth_no_conf_reg.pth.tar'

# FULL RANGE CONTINUOUS DEPTH ESTIMATION WITH 3D REGULARIZATION
CUDA_VISIBLE_DEVICES=0 python run_continuous_depth_estimation.py \
    --arch bi3dnet_continuous_depth_3D \
    --bi3dnet_featnet_arch featextractnetspp \
    --bi3dnet_segnet_arch segnet2d \
    --bi3dnet_refinenet_arch disprefinenet \
    --bi3dnet_regnet_arch segregnet3d \
    --disprefinenet_out_planes 32 \
    --regnet_out_planes 16 \
    --crop_height 384 --crop_width 1248 \
    --disp_range_min 0 \
    --disp_range_max 192 \
    --bi3dnet_max_disparity 192 \
    --img_left '../data/kitti15_img_left.jpg' \
    --img_right '../data/kitti15_img_right.jpg' \
    --pretrained '../model_weights/kitti15_continuous_depth_conf_reg.pth.tar'

================================================
FILE: src/run_demo_sf.sh
================================================
#!/usr/bin/env bash

# GENERATE BINARY DEPTH SEGMENTATIONS AND COMBINE THEM TO GENERATE QUANTIZED DEPTH
CUDA_VISIBLE_DEVICES=0 python run_binary_depth_estimation.py \
    --arch bi3dnet_binary_depth \
    --bi3dnet_featnet_arch featextractnetspp \
    --bi3dnet_featnethr_arch featextractnethr \
    --bi3dnet_segnet_arch segnet2d \
    --bi3dnet_refinenet_arch segrefinenet \
    --featextractnethr_out_planes 16 \
    --segrefinenet_in_planes 17 \
    --segrefinenet_out_planes 8 \
    --crop_height 576 --crop_width 960 \
    --disp_vals 24 36 54 96 144 \
    --img_left '../data/sf_img_left.jpg' \
    --img_right '../data/sf_img_right.jpg' \
    --pretrained '../model_weights/sf_binary_depth.pth.tar'

# FULL RANGE CONTINUOUS DEPTH ESTIMATION WITHOUT 3D REGULARIZATION
CUDA_VISIBLE_DEVICES=0 python run_continuous_depth_estimation.py \
    --arch bi3dnet_continuous_depth_2D \
    --bi3dnet_featnet_arch featextractnetspp \
    --bi3dnet_segnet_arch segnet2d \
    --bi3dnet_refinenet_arch disprefinenet \
    --disprefinenet_out_planes 32 \
    --crop_height 576 --crop_width 960 \
    --disp_range_min 0 \
    --disp_range_max 192 \
    --bi3dnet_max_disparity 192 \
    --img_left '../data/sf_img_left.jpg' \
    --img_right '../data/sf_img_right.jpg' \
    --pretrained '../model_weights/sf_continuous_depth_no_conf_reg.pth.tar'

# SELECTIVE RANGE CONTINUOUS DEPTH ESTIMATION WITHOUT 3D REGULARIZATION
CUDA_VISIBLE_DEVICES=0 python run_continuous_depth_estimation.py \
    --arch bi3dnet_continuous_depth_2D \
    --bi3dnet_featnet_arch featextractnetspp \
    --bi3dnet_segnet_arch segnet2d \
    --bi3dnet_refinenet_arch disprefinenet \
    --disprefinenet_out_planes 32 \
    --crop_height 576 --crop_width 960 \
    --disp_range_min 18 \
    --disp_range_max 60 \
    --bi3dnet_max_disparity 192 \
    --img_left '../data/sf_img_left.jpg' \
    --img_right '../data/sf_img_right.jpg' \
    --pretrained '../model_weights/sf_continuous_depth_no_conf_reg.pth.tar'

# FULL RANGE CONTINUOUS DEPTH ESTIMATION WITH 3D REGULARIZATION
CUDA_VISIBLE_DEVICES=0 python run_continuous_depth_estimation.py \
    --arch bi3dnet_continuous_depth_3D \
    --bi3dnet_featnet_arch featextractnetspp \
    --bi3dnet_segnet_arch segnet2d \
    --bi3dnet_refinenet_arch disprefinenet \
    --bi3dnet_regnet_arch segregnet3d \
    --disprefinenet_out_planes 32 \
    --regnet_out_planes 16 \
    --crop_height 576 --crop_width 960 \
    --disp_range_min 0 \
    --disp_range_max 192 \
    --bi3dnet_max_disparity 192 \
    --img_left '../data/sf_img_left.jpg' \
    --img_right '../data/sf_img_right.jpg' \
    --pretrained '../model_weights/sf_continuous_depth_conf_reg.pth.tar'

================================================
FILE: src/util.py
================================================
# Copyright (c) 2020, NVIDIA CORPORATION. All rights reserved.
#
# NVIDIA CORPORATION and its licensors retain all intellectual property
# and proprietary rights in and to this software, related documentation
# and any modifications thereto. Any use, reproduction, disclosure or
# distribution of this software and related documentation without an express
# license agreement from NVIDIA CORPORATION is strictly prohibited.
import os
import numpy as np


def disp2rgb(disp):
    # Map a normalized (H, W) disparity image in [0, 1] to an (H, W, 3) float32
    # color image using a piecewise-linear colormap.
    H = disp.shape[0]
    W = disp.shape[1]

    I = disp.flatten()

    # Columns 0-2 are the RGB anchor colors; column 3 is the relative segment width.
    map = np.array(
        [
            [0, 0, 0, 114],
            [0, 0, 1, 185],
            [1, 0, 0, 114],
            [1, 0, 1, 174],
            [0, 1, 0, 114],
            [0, 1, 1, 185],
            [1, 1, 0, 114],
            [1, 1, 1, 0],
        ]
    )
    bins = map[:-1, 3]
    cbins = np.cumsum(bins)
    bins = bins / cbins[-1]
    cbins = cbins[:-1] / cbins[-1]

    # Index of the colormap segment each pixel falls into (0..6)
    ind = np.minimum(
        np.sum(np.repeat(I[None, :], 6, axis=0) > np.repeat(cbins[:, None], I.shape[0], axis=1), axis=0), 6
    )
    bins = np.reciprocal(bins)
    cbins = np.append(np.array([[0]]), cbins[:, None])

    # Position within the segment, then linear blend between the two anchor colors
    I = np.multiply(I - cbins[ind], bins[ind])
    I = np.minimum(
        np.maximum(
            np.multiply(map[ind, 0:3], np.repeat(1 - I[:, None], 3, axis=1))
            + np.multiply(map[ind + 1, 0:3], np.repeat(I[:, None], 3, axis=1)),
            0,
        ),
        1,
    )

    I = np.reshape(I, [H, W, 3]).astype(np.float32)

    return I


def str2bool(bool_input_string):
    if isinstance(bool_input_string, bool):
        return bool_input_string
    # Exact, case-insensitive match on the literal strings "true" and "false"
    if bool_input_string.lower() == "true":
        return True
    elif bool_input_string.lower() == "false":
        return False
    else:
        raise NameError("Please provide boolean type.")
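
The run scripts pass `str2bool` as the `type=` callable for `--bi3dnet_disps_per_example_true`. The following is a minimal, self-contained sketch of that wiring, not the repository's code: `str2bool_sketch`, `build_parser`, and `parse_flag` are hypothetical names introduced here for illustration, and raising `argparse.ArgumentTypeError` is an assumption chosen so that argparse reports a bad value as a usage error.

```python
import argparse


def str2bool_sketch(value):
    """Illustrative stand-in for util.str2bool (hypothetical name)."""
    if isinstance(value, bool):
        return value
    lowered = value.lower()
    if lowered == "true":
        return True
    if lowered == "false":
        return False
    # Assumption: ArgumentTypeError lets argparse print a usage error
    # instead of an uncaught traceback.
    raise argparse.ArgumentTypeError("expected 'true' or 'false', got %r" % value)


def build_parser():
    """Build a parser with the same flag name and default as the run scripts."""
    parser = argparse.ArgumentParser(allow_abbrev=False)
    parser.add_argument("--bi3dnet_disps_per_example_true", type=str2bool_sketch, default=True)
    return parser


def parse_flag(argv):
    """Return the parsed boolean for the given argument list."""
    args, _unknown = build_parser().parse_known_args(argv)
    return args.bi3dnet_disps_per_example_true
```

For example, `parse_flag(["--bi3dnet_disps_per_example_true", "False"])` yields `False`, while an empty argument list falls back to the default of `True`.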