[
  {
    "path": ".gitignore",
    "content": "# Add any directories, files, or patterns you don't want to be tracked by version control\n\n*.png\n*.pfm\n*.pth.tar\n*.npy\n*.ppm\n*.pyc\n*.tar\n*.zip\n*.gif"
  },
  {
    "path": "LICENSE.md",
    "content": "# NVIDIA Source Code License for Bi3D\n\n## 1. Definitions\n\n“Licensor” means any person or entity that distributes its Work.\n\n“Software” means the original work of authorship made available under this License.\n\n“Work” means the Software and any additions to or derivative works of the Software that are made available under this License.\n\n“NVIDIA Processors” means any central processing unit (CPU), graphics processing unit (GPU), field-programmable gate array (FPGA), application-specific integrated circuit (ASIC) or any combination thereof designed, made, sold, or provided by NVIDIA or its affiliates.\n\nThe terms “reproduce,” “reproduction,” “derivative works,” and “distribution” have the meaning as provided under U.S. copyright law; provided, however, that for the purposes of this License, derivative works shall not include works that remain separable from, or merely link (or bind by name) to the interfaces of, the Work.\n\nWorks, including the Software, are “made available” under this License by including in or with the Work either (a) a copyright notice referencing the applicability of this License to the Work, or (b) a copy of this License.\n\n## 2. License Grant\n\n### 2.1 Copyright Grant.\nSubject to the terms and conditions of this License, each Licensor grants to you a perpetual, worldwide, non-exclusive, royalty-free, copyright license to reproduce, prepare derivative works of, publicly display, publicly perform, sublicense and distribute its Work and any resulting derivative works in any form.\n\n## 3. 
Limitations\n\n### 3.1 Redistribution.\nYou may reproduce or distribute the Work only if (a) you do so under this License, (b) you include a complete copy of this License with your distribution, and (c) you retain without modification any copyright, patent, trademark, or attribution notices that are present in the Work.\n\n### 3.2 Derivative Works.\nYou may specify that additional or different terms apply to the use, reproduction, and distribution of your derivative works of the Work (“Your Terms”) only if (a) Your Terms provide that the use limitation in Section 3.3 applies to your derivative works, and (b) you identify the specific derivative works that are subject to Your Terms. Notwithstanding Your Terms, this License (including the redistribution requirements in Section 3.1) will continue to apply to the Work itself.\n\n### 3.3 Use Limitation.\nThe Work and any derivative works thereof only may be used or intended for use non-commercially and with NVIDIA Processors. Notwithstanding the foregoing, NVIDIA and its affiliates may use the Work and any derivative works commercially. As used herein, “non-commercially” means for research or evaluation purposes only.\n\n### 3.4 Patent Claims.\nIf you bring or threaten to bring a patent claim against any Licensor (including any claim, cross-claim or counterclaim in a lawsuit) to enforce any patents that you allege are infringed by any Work, then your rights under this License from such Licensor (including the grant in Section 2.1) will terminate immediately.\n\n### 3.5 Trademarks.\nThis License does not grant any rights to use any Licensor’s or its affiliates’ names, logos, or trademarks, except as necessary to reproduce the notices described in this License.\n\n### 3.6 Termination.\nIf you violate any term of this License, then your rights under this License (including the grant in Section 2.1) will terminate immediately.\n\n## 4. 
Disclaimer of Warranty.\n\nTHE WORK IS PROVIDED “AS IS” WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WARRANTIES OR CONDITIONS OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, TITLE OR NON-INFRINGEMENT. YOU BEAR THE RISK OF UNDERTAKING ANY ACTIVITIES UNDER THIS LICENSE.\n\n## 5. Limitation of Liability.\n\nEXCEPT AS PROHIBITED BY APPLICABLE LAW, IN NO EVENT AND UNDER NO LEGAL THEORY, WHETHER IN TORT (INCLUDING NEGLIGENCE), CONTRACT, OR OTHERWISE SHALL ANY LICENSOR BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING OUT OF OR RELATED TO THIS LICENSE, THE USE OR INABILITY TO USE THE WORK (INCLUDING BUT NOT LIMITED TO LOSS OF GOODWILL, BUSINESS INTERRUPTION, LOST PROFITS OR DATA, COMPUTER FAILURE OR MALFUNCTION, OR ANY OTHER COMMERCIAL DAMAGES OR LOSSES), EVEN IF THE LICENSOR HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES."
  },
  {
    "path": "README.md",
    "content": "## Bi3D &mdash; Official PyTorch Implementation\n\n![Teaser image](data/teaser.png)\n\n**Bi3D: Stereo Depth Estimation via Binary Classifications**<br>\nAbhishek Badki, Alejandro Troccoli, Kihwan Kim, Jan Kautz, Pradeep Sen, and Orazio Gallo<br>\nIEEE CVPR 2020<br>\n\n## Abstract: \n*Stereo-based depth estimation is a cornerstone of computer vision, with state-of-the-art methods delivering accurate results in real time. For several applications such as autonomous navigation, however, it may be useful to trade accuracy for lower latency. We present Bi3D, a method that estimates depth via a series of binary classifications. Rather than testing if objects are* at *a particular depth D, as existing stereo methods do, it classifies them as being* closer *or* farther *than D. This property offers a powerful mechanism to balance accuracy and latency. Given a strict time budget, Bi3D can detect objects closer than a given distance in as little as a few milliseconds, or estimate depth with arbitrarily coarse quantization, with complexity linear with the number of quantization levels. Bi3D can also use the allotted quantization levels to get continuous depth, but in a specific depth range. 
For standard stereo (i.e., continuous depth on the whole range), our method is close to or on par with state-of-the-art, finely tuned stereo methods.*\n\n\n## Paper:\nhttps://arxiv.org/pdf/2005.07274.pdf<br>\n\n## Videos:<br>\n<a href=\"https://www.youtube.com/watch?v=HuEwjpw5O64&feature=youtu.be\">\n  <img src=\"https://img.youtube.com/vi/HuEwjpw5O64/0.jpg\" width=\"300\"/>\n</a>\n<a href=\"https://www.youtube.com/watch?v=UfvUny4pdMA&feature=youtu.be\">\n  <img src=\"https://img.youtube.com/vi/UfvUny4pdMA/0.jpg\" width=\"300\"/>\n</a>\n<a href=\"https://www.youtube.com/watch?v=Ifgcm6VI3NE&feature=youtu.be\">\n  <img src=\"https://img.youtube.com/vi/Ifgcm6VI3NE/0.jpg\" width=\"300\"/>\n</a>\n\n## Citing Bi3D:\n    @InProceedings{badki2020Bi3D,\n    author = {Badki, Abhishek and Troccoli, Alejandro and Kim, Kihwan and Kautz, Jan and Sen, Pradeep and Gallo, Orazio},\n    title = {{Bi3D}: {S}tereo Depth Estimation via Binary Classifications},\n    booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},\n    year = {2020}\n    }\n\nor the arXiv paper\n\n    @InProceedings{badki2020Bi3D,\n    author = {Badki, Abhishek and Troccoli, Alejandro and Kim, Kihwan and Kautz, Jan and Sen, Pradeep and Gallo, Orazio},\n    title = {{Bi3D}: {S}tereo Depth Estimation via Binary Classifications},\n    booktitle = {arXiv preprint arXiv:2005.07274},\n    year = {2020}\n    }\n\n\n## Code:<br>\n\n### License\n\nCopyright (C) 2020 NVIDIA Corporation.  All rights reserved.\n\nLicensed under the [NVIDIA Source Code License](LICENSE.md)\n\n### Description\n\n\n### Setup\n\nWe offer two ways of setting up your environment: through Docker or through Conda.\n\n#### Docker\nFor convenience, we provide a Dockerfile to build a container image to run the code. The image will contain the Python dependencies.\n\nSystem requirements:\n\n1. Docker (tested on version 19.03.11)\n\n2. [NVIDIA Docker](https://github.com/NVIDIA/nvidia-docker/wiki)\n\n3. 
NVIDIA GPU driver.\n\nBuild the container image:\n```\ndocker build -t bi3d . -f envs/bi3d_pytorch_19_01.DockerFile\n```\nTo launch the container, run the following:\n```\ndocker run --rm -it --gpus=all -v $(pwd):/bi3d -w /bi3d --net=host --ipc=host bi3d:latest /bin/bash\n```\n\n#### Conda\nAll dependencies will be installed automatically using the following:\n```\nconda env create -f envs/bi3d_conda_env.yml \n```\nYou can activate the environment by running:\n```\nconda activate bi3d\n```\n\n### Pre-trained models\nDownload the pre-trained models [here](https://drive.google.com/file/d/1X4Ing9WumtIxonNXXCzKJulJtPgzk61n).\n\n### Run the demo\n\n```\ncd src\n# RUN DEMO FOR SCENEFLOW DATASET \nsh run_demo_sf.sh\n# RUN DEMO FOR KITTI15 DATASET\nsh run_demo_kitti15.sh\n```\n"
  },
  {
    "path": "envs/bi3d_conda_env.yml",
    "content": "name: bi3d\nchannels:\n  - pytorch\n  - soumith\n  - defaults\ndependencies:\n  - _libgcc_mutex=0.1=main\n  - blas=1.0=mkl\n  - ca-certificates=2020.6.24=0\n  - certifi=2020.6.20=py37_0\n  - cudatoolkit=10.0.130=0\n  - freetype=2.10.2=h5ab3b9f_0\n  - intel-openmp=2020.1=217\n  - jpeg=9b=h024ee3a_2\n  - lcms2=2.11=h396b838_0\n  - ld_impl_linux-64=2.33.1=h53a641e_7\n  - libedit=3.1.20191231=h14c3975_1\n  - libffi=3.3=he6710b0_2\n  - libgcc-ng=9.1.0=hdf63c60_0\n  - libgfortran-ng=7.3.0=hdf63c60_0\n  - libpng=1.6.37=hbc83047_0\n  - libstdcxx-ng=9.1.0=hdf63c60_0\n  - libtiff=4.1.0=h2733197_1\n  - lz4-c=1.9.2=he6710b0_0\n  - mkl=2020.1=217\n  - mkl-service=2.3.0=py37he904b0f_0\n  - mkl_fft=1.1.0=py37h23d657b_0\n  - mkl_random=1.1.1=py37h0573a6f_0\n  - ncurses=6.2=he6710b0_1\n  - ninja=1.9.0=py37hfd86e86_0\n  - numpy=1.18.5=py37ha1c710e_0\n  - numpy-base=1.18.5=py37hde5b4d6_0\n  - olefile=0.46=py_0\n  - openssl=1.1.1g=h7b6447c_0\n  - pillow=7.2.0=py37hb39fc2d_0\n  - pip=20.1.1=py37_1\n  - python=3.7.7=hcff3b4d_5\n  - pytorch=1.4.0=py3.7_cuda10.0.130_cudnn7.6.3_0\n  - readline=8.0=h7b6447c_0\n  - setuptools=49.2.0=py37_0\n  - six=1.15.0=py_0\n  - sqlite=3.32.3=h62c20be_0\n  - tk=8.6.10=hbc83047_0\n  - torchvision=0.5.0=py37_cu100\n  - wheel=0.34.2=py37_0\n  - xz=5.2.5=h7b6447c_0\n  - zlib=1.2.11=h7b6447c_3\n  - zstd=1.4.5=h0b5b093_0\n  - pip:\n    - imageio==2.9.0\n    - opencv-python==4.3.0.36\n    - protobuf==3.12.2\n    - tensorboardx==2.1\n\n"
  },
  {
    "path": "envs/bi3d_pytorch_19_01.DockerFile",
    "content": "FROM nvcr.io/nvidia/pytorch:19.01-py3\n\nRUN pip install Pillow\nRUN pip install imageio\nRUN pip install tensorboardX\nRUN pip install opencv-python\n"
  },
  {
    "path": "src/models/Bi3DNet.py",
    "content": "# Copyright (c) 2020, NVIDIA CORPORATION.  All rights reserved.\n#\n# NVIDIA CORPORATION and its licensors retain all intellectual property\n# and proprietary rights in and to this software, related documentation\n# and any modifications thereto.  Any use, reproduction, disclosure or\n# distribution of this software and related documentation without an express\n# license agreement from NVIDIA CORPORATION is strictly prohibited.\n\nimport numpy as np\nimport torch\nimport torch.nn as nn\nfrom torch.autograd import Variable\nimport torch.nn.functional as F\n\nimport models.FeatExtractNet as FeatNet\nimport models.SegNet2D as SegNet\nimport models.RefineNet2D as RefineNet\nimport models.RefineNet3D as RefineNet3D\n\n\n__all__ = [\"bi3dnet_binary_depth\", \"bi3dnet_continuous_depth_2D\", \"bi3dnet_continuous_depth_3D\"]\n\n\ndef compute_cost_volume(features_left, features_right, disp_ids, max_disp, is_disps_per_example):\n\n    batch_size = features_left.shape[0]\n    feature_size = features_left.shape[1]\n    H = features_left.shape[2]\n    W = features_left.shape[3]\n\n    psv_size = disp_ids.shape[1]\n\n    psv = Variable(features_left.new_zeros(batch_size, psv_size, feature_size * 2, H, W + max_disp)).cuda()\n\n    if is_disps_per_example:\n        for i in range(batch_size):\n            psv[i, 0, :feature_size, :, 0:W] = features_left[i]\n            psv[i, 0, feature_size:, :, disp_ids[i, 0] : W + disp_ids[i, 0]] = features_right[i]\n        psv = psv.contiguous()\n    else:\n        for i in range(psv_size):\n            psv[:, i, :feature_size, :, 0:W] = features_left\n            psv[:, i, feature_size:, :, disp_ids[0, i] : W + disp_ids[0, i]] = features_right\n        psv = psv.contiguous()\n\n    return psv\n\n\n\"\"\"\nBi3DNet for continuous depthmap generation. 
Doesn't use 3D regularization.\n\"\"\"\n\n\nclass Bi3DNetContinuousDepth2D(nn.Module):\n    def __init__(self, options, featnet_arch, segnet_arch, refinenet_arch=None, max_disparity=192):\n\n        super(Bi3DNetContinuousDepth2D, self).__init__()\n\n        self.max_disparity = max_disparity\n        self.max_disparity_seg = int(self.max_disparity / 3)\n        self.is_disps_per_example = False\n        self.is_save_memory = False\n\n        self.is_refine = True\n        if refinenet_arch == None:\n            self.is_refine = False\n\n        self.featnet = FeatNet.__dict__[featnet_arch](options, data=None)\n        self.segnet = SegNet.__dict__[segnet_arch](options, data=None)\n        if self.is_refine:\n            self.refinenet = RefineNet.__dict__[refinenet_arch](options, data=None)\n\n        return\n\n    def forward(self, img_left, img_right, disp_ids):\n\n        batch_size = img_left.shape[0]\n        psv_size = disp_ids.shape[1]\n\n        if psv_size == 1:\n            self.is_disps_per_example = True\n        else:\n            self.is_disps_per_example = False\n\n        # Feature Extraction\n        features_left = self.featnet(img_left)\n        features_right = self.featnet(img_right)\n        feature_size = features_left.shape[1]\n        H = features_left.shape[2]\n        W = features_left.shape[3]\n\n        # Cost Volume Generation\n        psv = compute_cost_volume(\n            features_left, features_right, disp_ids, self.max_disparity_seg, self.is_disps_per_example\n        )\n\n        psv = psv.view(batch_size * psv_size, feature_size * 2, H, W + self.max_disparity_seg)\n\n        # Segmentation Network\n        seg_raw_low_res = self.segnet(psv)[:, :, :, :W]\n        seg_raw_low_res = seg_raw_low_res.view(batch_size, 1, psv_size, H, W)\n\n        # Upsampling\n        seg_prob_low_res_up = torch.sigmoid(\n            F.interpolate(\n                seg_raw_low_res,\n                size=[psv_size * 3, img_left.size()[-2], 
img_left.size()[-1]],\n                mode=\"trilinear\",\n                align_corners=False,\n            )\n        )\n        seg_prob_low_res_up = seg_prob_low_res_up[:, 0, 1:-1, :, :]\n\n        # Projection\n        disparity_normalized = torch.mean((seg_prob_low_res_up), dim=1, keepdim=True)\n\n        # Refinement\n        if self.is_refine:\n            refine_net_input = torch.cat((disparity_normalized, img_left), dim=1)\n            disparity_normalized = self.refinenet(refine_net_input)\n\n        return seg_prob_low_res_up, disparity_normalized\n\n\ndef bi3dnet_continuous_depth_2D(options, data=None):\n\n    print(\"==> USING Bi3DNetContinuousDepth2D\")\n    for key in options:\n        if \"bi3dnet\" in key:\n            print(\"{} : {}\".format(key, options[key]))\n\n    model = Bi3DNetContinuousDepth2D(\n        options,\n        featnet_arch=options[\"bi3dnet_featnet_arch\"],\n        segnet_arch=options[\"bi3dnet_segnet_arch\"],\n        refinenet_arch=options[\"bi3dnet_refinenet_arch\"],\n        max_disparity=options[\"bi3dnet_max_disparity\"],\n    )\n\n    if data is not None:\n        model.load_state_dict(data[\"state_dict\"])\n\n    return model\n\n\n\"\"\"\nBi3DNet for continuous depthmap generation. 
Uses 3D regularization.\n\"\"\"\n\n\nclass Bi3DNetContinuousDepth3D(nn.Module):\n    def __init__(\n        self,\n        options,\n        featnet_arch,\n        segnet_arch,\n        refinenet_arch=None,\n        refinenet3d_arch=None,\n        max_disparity=192,\n    ):\n\n        super(Bi3DNetContinuousDepth3D, self).__init__()\n\n        self.max_disparity = max_disparity\n        self.max_disparity_seg = int(self.max_disparity / 3)\n        self.is_disps_per_example = False\n        self.is_save_memory = False\n\n        self.is_refine = True\n        if refinenet_arch == None:\n            self.is_refine = False\n\n        self.featnet = FeatNet.__dict__[featnet_arch](options, data=None)\n        self.segnet = SegNet.__dict__[segnet_arch](options, data=None)\n        if self.is_refine:\n            self.refinenet = RefineNet.__dict__[refinenet_arch](options, data=None)\n            self.refinenet3d = RefineNet3D.__dict__[refinenet3d_arch](options, data=None)\n\n        return\n\n    def forward(self, img_left, img_right, disp_ids):\n\n        batch_size = img_left.shape[0]\n        psv_size = disp_ids.shape[1]\n\n        if psv_size == 1:\n            self.is_disps_per_example = True\n        else:\n            self.is_disps_per_example = False\n\n        # Feature Extraction\n        features_left = self.featnet(img_left)\n        features_right = self.featnet(img_right)\n        feature_size = features_left.shape[1]\n        H = features_left.shape[2]\n        W = features_left.shape[3]\n\n        # Cost Volume Generation\n        psv = compute_cost_volume(\n            features_left, features_right, disp_ids, self.max_disparity_seg, self.is_disps_per_example\n        )\n\n        psv = psv.view(batch_size * psv_size, feature_size * 2, H, W + self.max_disparity_seg)\n\n        # Segmentation Network\n        seg_raw_low_res = self.segnet(psv)[:, :, :, :W]  # cropped to remove excess boundary\n        seg_raw_low_res = seg_raw_low_res.view(batch_size, 1, 
psv_size, H, W)\n\n        # Upsampling\n        seg_prob_low_res_up = torch.sigmoid(\n            F.interpolate(\n                seg_raw_low_res,\n                size=[psv_size * 3, img_left.size()[-2], img_left.size()[-1]],\n                mode=\"trilinear\",\n                align_corners=False,\n            )\n        )\n\n        seg_prob_low_res_up = seg_prob_low_res_up[:, 0, 1:-1, :, :]\n\n        # Upsampling after 3D Regularization\n        seg_raw_low_res_refined = seg_raw_low_res\n        seg_raw_low_res_refined[:, :, 1:, :, :] = self.refinenet3d(\n            features_left, seg_raw_low_res_refined[:, :, 1:, :, :]\n        )\n\n        seg_prob_low_res_refined_up = torch.sigmoid(\n            F.interpolate(\n                seg_raw_low_res_refined,\n                size=[psv_size * 3, img_left.size()[-2], img_left.size()[-1]],\n                mode=\"trilinear\",\n                align_corners=False,\n            )\n        )\n\n        seg_prob_low_res_refined_up = seg_prob_low_res_refined_up[:, 0, 1:-1, :, :]\n\n        # Projection\n        disparity_normalized_noisy = torch.mean((seg_prob_low_res_refined_up), dim=1, keepdim=True)\n\n        # Refinement\n        if self.is_refine:\n            refine_net_input = torch.cat((disparity_normalized_noisy, img_left), dim=1)\n            disparity_normalized = self.refinenet(refine_net_input)\n        else:\n            # No 2D refinement network: return the unrefined disparity so the name is always bound\n            disparity_normalized = disparity_normalized_noisy\n\n        return (\n            seg_prob_low_res_up,\n            seg_prob_low_res_refined_up,\n            disparity_normalized_noisy,\n            disparity_normalized,\n        )\n\n\ndef bi3dnet_continuous_depth_3D(options, data=None):\n\n    print(\"==> USING Bi3DNetContinuousDepth3D\")\n    for key in options:\n        if \"bi3dnet\" in key:\n            print(\"{} : {}\".format(key, options[key]))\n\n    model = Bi3DNetContinuousDepth3D(\n        options,\n        featnet_arch=options[\"bi3dnet_featnet_arch\"],\n        segnet_arch=options[\"bi3dnet_segnet_arch\"],\n        
refinenet_arch=options[\"bi3dnet_refinenet_arch\"],\n        refinenet3d_arch=options[\"bi3dnet_regnet_arch\"],\n        max_disparity=options[\"bi3dnet_max_disparity\"],\n    )\n\n    if data is not None:\n        model.load_state_dict(data[\"state_dict\"])\n\n    return model\n\n\n\"\"\"\nBi3DNet for binary depthmap generation.\n\"\"\"\n\n\nclass Bi3DNetBinaryDepth(nn.Module):\n    def __init__(\n        self,\n        options,\n        featnet_arch,\n        segnet_arch,\n        refinenet_arch=None,\n        featnethr_arch=None,\n        max_disparity=192,\n        is_disps_per_example=False,\n    ):\n\n        super(Bi3DNetBinaryDepth, self).__init__()\n\n        self.max_disparity = max_disparity\n        self.max_disparity_seg = int(max_disparity / 3)\n        self.is_disps_per_example = is_disps_per_example\n\n        self.is_refine = True\n        if refinenet_arch == None:\n            self.is_refine = False\n\n        self.featnet = FeatNet.__dict__[featnet_arch](options, data=None)\n        self.featnethr = FeatNet.__dict__[featnethr_arch](options, data=None)\n        self.segnet = SegNet.__dict__[segnet_arch](options, data=None)\n        if self.is_refine:\n            self.refinenet = RefineNet.__dict__[refinenet_arch](options, data=None)\n\n        return\n\n    def forward(self, img_left, img_right, disp_ids):\n\n        batch_size = img_left.shape[0]\n        psv_size = disp_ids.shape[1]\n\n        if psv_size == 1:\n            self.is_disps_per_example = True\n        else:\n            self.is_disps_per_example = False\n\n        # Feature Extraction\n        features = self.featnet(torch.cat((img_left, img_right), dim=0))\n\n        features_left = features[:batch_size, :, :, :]\n        features_right = features[batch_size:, :, :, :]\n\n        if self.is_refine:\n            features_lefthr = self.featnethr(img_left)\n        feature_size = features_left.shape[1]\n        H = features_left.shape[2]\n        W = features_left.shape[3]\n\n      
  # Cost Volume Generation\n        psv = compute_cost_volume(\n            features_left, features_right, disp_ids, self.max_disparity_seg, self.is_disps_per_example\n        )\n\n        psv = psv.view(batch_size * psv_size, feature_size * 2, H, W + self.max_disparity_seg)\n\n        # Segmentation Network\n        seg_raw_low_res = self.segnet(psv)[:, :, :, :W]  # cropped to remove excess boundary\n        seg_prob_low_res = torch.sigmoid(seg_raw_low_res)\n        seg_prob_low_res = seg_prob_low_res.view(batch_size, psv_size, H, W)\n\n        seg_prob_low_res_up = F.interpolate(\n            seg_prob_low_res, size=img_left.size()[-2:], mode=\"bilinear\", align_corners=False\n        )\n        out = []\n        out.append(seg_prob_low_res_up)\n\n        # Refinement\n        if self.is_refine:\n            seg_raw_high_res = F.interpolate(\n                seg_raw_low_res, size=img_left.size()[-2:], mode=\"bilinear\", align_corners=False\n            )\n            # Refine Net\n            features_left_expand = (\n                features_lefthr[:, None, :, :, :].expand(-1, psv_size, -1, -1, -1).contiguous()\n            )\n            features_left_expand = features_left_expand.view(\n                -1, features_lefthr.size()[1], features_lefthr.size()[2], features_lefthr.size()[3]\n            )\n            refine_net_input = torch.cat((seg_raw_high_res, features_left_expand), dim=1)\n\n            seg_raw_high_res = self.refinenet(refine_net_input)\n\n            seg_prob_high_res = torch.sigmoid(seg_raw_high_res)\n            seg_prob_high_res = seg_prob_high_res.view(\n                batch_size, psv_size, img_left.size()[-2], img_left.size()[-1]\n            )\n            out.append(seg_prob_high_res)\n        else:\n            out.append(seg_prob_low_res_up)\n\n        return out\n\n\ndef bi3dnet_binary_depth(options, data=None):\n\n    print(\"==> USING Bi3DNetBinaryDepth\")\n    for key in options:\n        if \"bi3dnet\" in key:\n            
print(\"{} : {}\".format(key, options[key]))\n\n    model = Bi3DNetBinaryDepth(\n        options,\n        featnet_arch=options[\"bi3dnet_featnet_arch\"],\n        segnet_arch=options[\"bi3dnet_segnet_arch\"],\n        refinenet_arch=options[\"bi3dnet_refinenet_arch\"],\n        featnethr_arch=options[\"bi3dnet_featnethr_arch\"],\n        max_disparity=options[\"bi3dnet_max_disparity\"],\n        is_disps_per_example=options[\"bi3dnet_disps_per_example_true\"],\n    )\n\n    if data is not None:\n        model.load_state_dict(data[\"state_dict\"])\n\n    return model\n"
  },
  {
    "path": "src/models/DispRefine2D.py",
    "content": "# MIT License\n#\n# Copyright (c) 2019 Xuanyi Li (xuanyili.edu@gmail.com)\n# Copyright (c) 2020 NVIDIA\n#\n# Permission is hereby granted, free of charge, to any person obtaining a copy\n# of this software and associated documentation files (the \"Software\"), to deal\n# in the Software without restriction, including without limitation the rights\n# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell\n# copies of the Software, and to permit persons to whom the Software is\n# furnished to do so, subject to the following conditions:\n#\n# The above copyright notice and this permission notice shall be included in all\n# copies or substantial portions of the Software.\n#\n# THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\n# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\n# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\n# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\n# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\n# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\n# SOFTWARE.\n\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nimport math\n\nfrom models.PSMNet import conv2d\nfrom models.PSMNet import conv2d_lrelu\n\n\"\"\"\nThe code in this file is adapted\nfrom https://github.com/meteorshowers/StereoNet-ActiveStereoNet\n\"\"\"\n\n\nclass BasicBlock(nn.Module):\n\n    expansion = 1\n\n    def __init__(self, inplanes, planes, stride, downsample, pad, dilation):\n\n        super(BasicBlock, self).__init__()\n\n        self.conv1 = conv2d_lrelu(inplanes, planes, 3, stride, pad, dilation)\n        self.conv2 = conv2d(planes, planes, 3, 1, pad, dilation)\n\n        self.downsample = downsample\n        self.stride = stride\n\n    def forward(self, x):\n\n        out = self.conv1(x)\n        out = self.conv2(out)\n\n        if self.downsample is 
not None:\n            x = self.downsample(x)\n\n        out += x\n\n        return out\n\n\nclass DispRefineNet(nn.Module):\n    def __init__(self, out_planes=32):\n\n        super(DispRefineNet, self).__init__()\n\n        self.out_planes = out_planes\n\n        self.conv2d_feature = conv2d_lrelu(\n            in_planes=4, out_planes=self.out_planes, kernel_size=3, stride=1, pad=1, dilation=1\n        )\n\n        self.residual_astrous_blocks = nn.ModuleList()\n        astrous_list = [1, 2, 4, 8, 1, 1]\n        for di in astrous_list:\n            self.residual_astrous_blocks.append(\n                BasicBlock(self.out_planes, self.out_planes, stride=1, downsample=None, pad=1, dilation=di)\n            )\n\n        self.conv2d_out = nn.Conv2d(self.out_planes, 1, kernel_size=3, stride=1, padding=1)\n\n        for m in self.modules():\n            if isinstance(m, nn.Conv2d):\n                n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels\n                m.weight.data.normal_(0, math.sqrt(2.0 / n))\n            elif isinstance(m, nn.Conv3d):\n                n = m.kernel_size[0] * m.kernel_size[1] * m.kernel_size[2] * m.out_channels\n                m.weight.data.normal_(0, math.sqrt(2.0 / n))\n            elif isinstance(m, nn.BatchNorm2d):\n                m.weight.data.fill_(1)\n                m.bias.data.zero_()\n            elif isinstance(m, nn.BatchNorm3d):\n                m.weight.data.fill_(1)\n                m.bias.data.zero_()\n            elif isinstance(m, nn.Linear):\n                m.bias.data.zero_()\n\n        return\n\n    def forward(self, x):\n\n        disp = x[:, 0, :, :][:, None, :, :]\n        output = self.conv2d_feature(x)\n\n        for astrous_block in self.residual_astrous_blocks:\n            output = astrous_block(output)\n\n        output = self.conv2d_out(output)  # residual disparity\n        output = output + disp  # final disparity\n\n        return output\n"
  },
  {
    "path": "src/models/FeatExtractNet.py",
    "content": "# Copyright (c) 2020, NVIDIA CORPORATION.  All rights reserved.\n#\n# NVIDIA CORPORATION and its licensors retain all intellectual property\n# and proprietary rights in and to this software, related documentation\n# and any modifications thereto.  Any use, reproduction, disclosure or\n# distribution of this software and related documentation without an express\n# license agreement from NVIDIA CORPORATION is strictly prohibited.\n\nfrom __future__ import print_function\nimport torch\nimport torch.nn as nn\nimport math\n\nfrom models.PSMNet import conv2d\nfrom models.PSMNet import conv2d_relu\nfrom models.PSMNet import FeatExtractNetSPP\n\n__all__ = [\"featextractnetspp\", \"featextractnethr\"]\n\n\n\"\"\"\nFeature extraction network. \nGenerates 16D features at the image resolution.\nUsed for final refinement. \n\"\"\"\n\n\nclass FeatExtractNetHR(nn.Module):\n    def __init__(self, out_planes=16):\n\n        super(FeatExtractNetHR, self).__init__()\n\n        self.conv1 = nn.Sequential(\n            conv2d_relu(3, out_planes, kernel_size=3, stride=1, pad=1, dilation=1),\n            conv2d_relu(out_planes, out_planes, kernel_size=3, stride=1, pad=1, dilation=1),\n            nn.Conv2d(out_planes, out_planes, kernel_size=1, padding=0, stride=1, bias=False),\n        )\n\n        for m in self.modules():\n            if isinstance(m, nn.Conv2d):\n                n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels\n                m.weight.data.normal_(0, math.sqrt(2.0 / n))\n            elif isinstance(m, nn.Conv3d):\n                n = m.kernel_size[0] * m.kernel_size[1] * m.kernel_size[2] * m.out_channels\n                m.weight.data.normal_(0, math.sqrt(2.0 / n))\n            elif isinstance(m, nn.BatchNorm2d):\n                m.weight.data.fill_(1)\n                m.bias.data.zero_()\n            elif isinstance(m, nn.BatchNorm3d):\n                m.weight.data.fill_(1)\n                m.bias.data.zero_()\n            elif isinstance(m, 
nn.Linear):\n                m.bias.data.zero_()\n\n        return\n\n    def forward(self, input):\n\n        output = self.conv1(input)\n        return output\n\n\ndef featextractnethr(options, data=None):\n\n    print(\"==> USING FeatExtractNetHR\")\n    for key in options:\n        if \"featextractnethr\" in key:\n            print(\"{} : {}\".format(key, options[key]))\n\n    model = FeatExtractNetHR(out_planes=options[\"featextractnethr_out_planes\"])\n\n    if data is not None:\n        model.load_state_dict(data[\"state_dict\"])\n\n    return model\n\n\n\"\"\"\nFeature extraction network. \nGenerates 32D features at 3x less resolution.\nUses Spatial Pyramid Pooling inspired by PSMNet.\n\"\"\"\n\n\ndef featextractnetspp(options, data=None):\n\n    print(\"==> USING FeatExtractNetSPP\")\n    for key in options:\n        if \"feat\" in key:\n            print(\"{} : {}\".format(key, options[key]))\n\n    model = FeatExtractNetSPP()\n\n    if data is not None:\n        model.load_state_dict(data[\"state_dict\"])\n\n    return model\n"
  },
  {
    "path": "src/models/GCNet.py",
    "content": "# Copyright (c) 2018 Wang Yufeng\n# Copyright (c) 2020 NVIDIA\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n#     http://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an \"AS IS\" BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n\nimport torch\nimport torch.nn as nn\n\n\"\"\"\nThe code in this file is adapted from https://github.com/wyf2017/DSMnet\n\"\"\"\n\n\ndef conv3d_relu(in_planes, out_planes, kernel_size=3, stride=1, activefun=nn.ReLU(inplace=True)):\n\n    return nn.Sequential(\n        nn.Conv3d(in_planes, out_planes, kernel_size, stride, padding=(kernel_size - 1) // 2, bias=True),\n        activefun,\n    )\n\n\ndef deconv3d_relu(in_planes, out_planes, kernel_size=4, stride=2, activefun=nn.ReLU(inplace=True)):\n\n    assert stride > 1\n    p = (kernel_size - 1) // 2\n    op = stride - (kernel_size - 2 * p)\n    return nn.Sequential(\n        nn.ConvTranspose3d(\n            in_planes, out_planes, kernel_size, stride, padding=p, output_padding=op, bias=True\n        ),\n        activefun,\n    )\n\n\n\"\"\"\nGCNet style 3D regularization network\n\"\"\"\n\n\nclass feature3d(nn.Module):\n    def __init__(self, num_F):\n\n        super(feature3d, self).__init__()\n        self.F = num_F\n\n        self.l19 = conv3d_relu(self.F + 32, self.F, kernel_size=3, stride=1)\n        self.l20 = conv3d_relu(self.F, self.F, kernel_size=3, stride=1)\n\n        self.l21 = conv3d_relu(self.F + 32, self.F * 2, kernel_size=3, stride=2)\n        self.l22 = conv3d_relu(self.F * 2, self.F * 2, kernel_size=3, stride=1)\n        self.l23 = conv3d_relu(self.F * 2, self.F * 2, 
kernel_size=3, stride=1)\n\n        self.l24 = conv3d_relu(self.F * 2, self.F * 2, kernel_size=3, stride=2)\n        self.l25 = conv3d_relu(self.F * 2, self.F * 2, kernel_size=3, stride=1)\n        self.l26 = conv3d_relu(self.F * 2, self.F * 2, kernel_size=3, stride=1)\n\n        self.l27 = conv3d_relu(self.F * 2, self.F * 2, kernel_size=3, stride=2)\n        self.l28 = conv3d_relu(self.F * 2, self.F * 2, kernel_size=3, stride=1)\n        self.l29 = conv3d_relu(self.F * 2, self.F * 2, kernel_size=3, stride=1)\n\n        self.l30 = conv3d_relu(self.F * 2, self.F * 4, kernel_size=3, stride=2)\n        self.l31 = conv3d_relu(self.F * 4, self.F * 4, kernel_size=3, stride=1)\n        self.l32 = conv3d_relu(self.F * 4, self.F * 4, kernel_size=3, stride=1)\n\n        self.l33 = deconv3d_relu(self.F * 4, self.F * 2, kernel_size=3, stride=2)\n        self.l34 = deconv3d_relu(self.F * 2, self.F * 2, kernel_size=3, stride=2)\n        self.l35 = deconv3d_relu(self.F * 2, self.F * 2, kernel_size=3, stride=2)\n        self.l36 = deconv3d_relu(self.F * 2, self.F, kernel_size=3, stride=2)\n\n        self.l37 = nn.Conv3d(self.F, 1, kernel_size=3, stride=1, padding=1, bias=True)\n\n    def forward(self, x):\n\n        x18 = x\n        x21 = self.l21(x18)\n        x24 = self.l24(x21)\n        x27 = self.l27(x24)\n        x30 = self.l30(x27)\n        x31 = self.l31(x30)\n        x32 = self.l32(x31)\n\n        x29 = self.l29(self.l28(x27))\n        x33 = self.l33(x32) + x29\n\n        x26 = self.l26(self.l25(x24))\n        x34 = self.l34(x33) + x26\n\n        x23 = self.l23(self.l22(x21))\n        x35 = self.l35(x34) + x23\n\n        x20 = self.l20(self.l19(x18))\n        x36 = self.l36(x35) + x20\n\n        x37 = self.l37(x36)\n\n        conf_volume_wo_sig = x37\n\n        return conf_volume_wo_sig\n"
  },
  {
    "path": "src/models/PSMNet.py",
    "content": "# MIT License\n#\n# Copyright (c) 2018 Jia-Ren Chang\n# Copyright (c) 2020 NVIDIA\n#\n# Permission is hereby granted, free of charge, to any person obtaining a copy\n# of this software and associated documentation files (the \"Software\"), to deal\n# in the Software without restriction, including without limitation the rights\n# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell\n# copies of the Software, and to permit persons to whom the Software is\n# furnished to do so, subject to the following conditions:\n#\n# The above copyright notice and this permission notice shall be included in all\n# copies or substantial portions of the Software.\n#\n# THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\n# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\n# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\n# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\n# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\n# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\n# SOFTWARE.\n\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nimport math\n\n\"\"\"\nThe code in this file is adapted from https://github.com/JiaRenChang/PSMNet\n\"\"\"\n\n\ndef conv2d(in_planes, out_planes, kernel_size, stride, pad, dilation):\n\n    return nn.Sequential(\n        nn.Conv2d(\n            in_planes,\n            out_planes,\n            kernel_size=kernel_size,\n            stride=stride,\n            padding=dilation if dilation > 1 else pad,\n            dilation=dilation,\n            bias=True,\n        )\n    )\n\n\ndef conv2d_relu(in_planes, out_planes, kernel_size, stride, pad, dilation):\n\n    return nn.Sequential(\n        nn.Conv2d(\n            in_planes,\n            out_planes,\n            kernel_size=kernel_size,\n            stride=stride,\n            
padding=dilation if dilation > 1 else pad,\n            dilation=dilation,\n            bias=True,\n        ),\n        nn.ReLU(inplace=True),\n    )\n\n\ndef conv2d_lrelu(in_planes, out_planes, kernel_size, stride, pad, dilation=1):\n\n    return nn.Sequential(\n        nn.Conv2d(\n            in_planes,\n            out_planes,\n            kernel_size=kernel_size,\n            stride=stride,\n            padding=dilation if dilation > 1 else pad,\n            dilation=dilation,\n            bias=True,\n        ),\n        nn.LeakyReLU(0.1, inplace=True),\n    )\n\n\nclass BasicBlock(nn.Module):\n\n    expansion = 1\n\n    def __init__(self, inplanes, planes, stride, downsample, pad, dilation):\n\n        super(BasicBlock, self).__init__()\n\n        self.conv1 = conv2d_relu(inplanes, planes, 3, stride, pad, dilation)\n        self.conv2 = conv2d(planes, planes, 3, 1, pad, dilation)\n\n        self.downsample = downsample\n        self.stride = stride\n\n    def forward(self, x):\n\n        out = self.conv1(x)\n        out = self.conv2(out)\n\n        if self.downsample is not None:\n            x = self.downsample(x)\n\n        out += x\n\n        return out\n\n\nclass FeatExtractNetSPP(nn.Module):\n    def __init__(self):\n\n        super(FeatExtractNetSPP, self).__init__()\n\n        self.align_corners = False\n        self.inplanes = 32\n\n        self.firstconv = nn.Sequential(\n            conv2d_relu(3, 32, 3, 3, 1, 1), conv2d_relu(32, 32, 3, 1, 1, 1), conv2d_relu(32, 32, 3, 1, 1, 1)\n        )\n\n        self.layer1 = self._make_layer(BasicBlock, 32, 2, 1, 1, 2)\n\n        self.branch1 = nn.Sequential(nn.AvgPool2d((64, 64), stride=(64, 64)), conv2d_relu(32, 32, 1, 1, 0, 1))\n\n        self.branch2 = nn.Sequential(nn.AvgPool2d((32, 32), stride=(32, 32)), conv2d_relu(32, 32, 1, 1, 0, 1))\n\n        self.branch3 = nn.Sequential(nn.AvgPool2d((16, 16), stride=(16, 16)), conv2d_relu(32, 32, 1, 1, 0, 1))\n\n        self.branch4 = nn.Sequential(nn.AvgPool2d((8, 
8), stride=(8, 8)), conv2d_relu(32, 32, 1, 1, 0, 1))\n\n        self.lastconv = nn.Sequential(\n            conv2d_relu(160, 64, 3, 1, 1, 1),\n            nn.Conv2d(64, 32, kernel_size=1, padding=0, stride=1, bias=False),\n        )\n\n        for m in self.modules():\n            if isinstance(m, nn.Conv2d):\n                n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels\n                m.weight.data.normal_(0, math.sqrt(2.0 / n))\n            elif isinstance(m, nn.Conv3d):\n                n = m.kernel_size[0] * m.kernel_size[1] * m.kernel_size[2] * m.out_channels\n                m.weight.data.normal_(0, math.sqrt(2.0 / n))\n            elif isinstance(m, nn.BatchNorm2d):\n                m.weight.data.fill_(1)\n                m.bias.data.zero_()\n            elif isinstance(m, nn.BatchNorm3d):\n                m.weight.data.fill_(1)\n                m.bias.data.zero_()\n            elif isinstance(m, nn.Linear):\n                m.bias.data.zero_()\n\n    def _make_layer(self, block, planes, blocks, stride, pad, dilation):\n        downsample = None\n        if stride != 1 or self.inplanes != planes * block.expansion:\n            downsample = nn.Sequential(\n                nn.Conv2d(self.inplanes, planes * block.expansion, kernel_size=1, stride=stride, bias=False),\n                nn.BatchNorm2d(planes * block.expansion),\n            )\n\n        layers = []\n        layers.append(block(self.inplanes, planes, stride, downsample, pad, dilation))\n        self.inplanes = planes * block.expansion\n        for i in range(1, blocks):\n            layers.append(block(self.inplanes, planes, 1, None, pad, dilation))\n\n        return nn.Sequential(*layers)\n\n    def forward(self, input):\n\n        output0 = self.firstconv(input)\n        output1 = self.layer1(output0)\n\n        output_branch1 = self.branch1(output1)\n        output_branch1 = F.interpolate(\n            output_branch1,\n            (output1.size()[2], output1.size()[3]),\n            
mode=\"bilinear\",\n            align_corners=self.align_corners,\n        )\n\n        output_branch2 = self.branch2(output1)\n        output_branch2 = F.interpolate(\n            output_branch2,\n            (output1.size()[2], output1.size()[3]),\n            mode=\"bilinear\",\n            align_corners=self.align_corners,\n        )\n\n        output_branch3 = self.branch3(output1)\n        output_branch3 = F.interpolate(\n            output_branch3,\n            (output1.size()[2], output1.size()[3]),\n            mode=\"bilinear\",\n            align_corners=self.align_corners,\n        )\n\n        output_branch4 = self.branch4(output1)\n        output_branch4 = F.interpolate(\n            output_branch4,\n            (output1.size()[2], output1.size()[3]),\n            mode=\"bilinear\",\n            align_corners=self.align_corners,\n        )\n\n        output_feature = torch.cat(\n            (output1, output_branch4, output_branch3, output_branch2, output_branch1), 1\n        )\n\n        output_feature = self.lastconv(output_feature)\n\n        return output_feature\n"
  },
  {
    "path": "src/models/RefineNet2D.py",
    "content": "# Copyright (c) 2020, NVIDIA CORPORATION.  All rights reserved.\n#\n# NVIDIA CORPORATION and its licensors retain all intellectual property\n# and proprietary rights in and to this software, related documentation\n# and any modifications thereto.  Any use, reproduction, disclosure or\n# distribution of this software and related documentation without an express\n# license agreement from NVIDIA CORPORATION is strictly prohibited.\n\nfrom __future__ import print_function\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nimport math\nimport argparse\nimport time\nimport torch.backends.cudnn as cudnn\n\nfrom models.PSMNet import conv2d\nfrom models.PSMNet import conv2d_lrelu\n\nfrom models.DispRefine2D import DispRefineNet\n\n__all__ = [\"disprefinenet\", \"segrefinenet\"]\n\n\n\"\"\"\nDisparity refinement network.\nTakes concatenated input image and the disparity map to generate refined disparity map.\nGenerates refined output using input image as guide.\n\"\"\"\n\n\ndef disprefinenet(options, data=None):\n\n    print(\"==> USING DispRefineNet\")\n    for key in options:\n        if \"disprefinenet\" in key:\n            print(\"{} : {}\".format(key, options[key]))\n\n    model = DispRefineNet(out_planes=options[\"disprefinenet_out_planes\"])\n\n    if data is not None:\n        model.load_state_dict(data[\"state_dict\"])\n\n    return model\n\n\n\"\"\"\nBinary segmentation refinement network.\nTakes as input high resolution features of input image and the disparity map.\nGenerates refined output using input image as guide.\n\"\"\"\n\n\nclass SegRefineNet(nn.Module):\n    def __init__(self, in_planes=17, out_planes=8):\n\n        super(SegRefineNet, self).__init__()\n\n        self.conv1 = nn.Sequential(conv2d_lrelu(in_planes, out_planes, kernel_size=3, stride=1, pad=1))\n\n        self.classif1 = nn.Conv2d(out_planes, 1, kernel_size=3, padding=1, stride=1, bias=False)\n\n        for m in self.modules():\n            if 
isinstance(m, nn.Conv2d):\n                n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels\n                m.weight.data.normal_(0, math.sqrt(2.0 / n))\n            elif isinstance(m, nn.Conv3d):\n                n = m.kernel_size[0] * m.kernel_size[1] * m.kernel_size[2] * m.out_channels\n                m.weight.data.normal_(0, math.sqrt(2.0 / n))\n            elif isinstance(m, nn.BatchNorm2d):\n                m.weight.data.fill_(1)\n                m.bias.data.zero_()\n            elif isinstance(m, nn.BatchNorm3d):\n                m.weight.data.fill_(1)\n                m.bias.data.zero_()\n            elif isinstance(m, nn.Linear):\n                m.bias.data.zero_()\n\n    def forward(self, input):\n\n        output0 = self.conv1(input)\n        output = self.classif1(output0)\n\n        return output\n\n\ndef segrefinenet(options, data=None):\n\n    print(\"==> USING SegRefineNet\")\n    for key in options:\n        if \"segrefinenet\" in key:\n            print(\"{} : {}\".format(key, options[key]))\n\n    model = SegRefineNet(\n        in_planes=options[\"segrefinenet_in_planes\"], out_planes=options[\"segrefinenet_out_planes\"]\n    )\n\n    if data is not None:\n        model.load_state_dict(data[\"state_dict\"])\n\n    return model\n"
  },
  {
    "path": "src/models/RefineNet3D.py",
    "content": "# Copyright (c) 2020, NVIDIA CORPORATION.  All rights reserved.\n#\n# NVIDIA CORPORATION and its licensors retain all intellectual property\n# and proprietary rights in and to this software, related documentation\n# and any modifications thereto.  Any use, reproduction, disclosure or\n# distribution of this software and related documentation without an express\n# license agreement from NVIDIA CORPORATION is strictly prohibited.\n\nimport torch\nimport torch.nn as nn\nimport numpy as np\n\n__all__ = [\"segregnet3d\"]\n\nfrom models.GCNet import conv3d_relu\nfrom models.GCNet import deconv3d_relu\nfrom models.GCNet import feature3d\n\n\ndef net_init(net):\n\n    for m in net.modules():\n        if isinstance(m, nn.Linear):\n            m.weight.data = fanin_init(m.weight.data.size())\n        elif isinstance(m, nn.Conv3d):\n            n = m.kernel_size[0] * m.kernel_size[1] * m.kernel_size[2] * m.out_channels\n            m.weight.data.normal_(0, np.sqrt(2.0 / n))\n        elif isinstance(m, nn.Conv2d):\n            n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels\n            m.weight.data.normal_(0, np.sqrt(2.0 / n))\n        elif isinstance(m, nn.Conv1d):\n            n = m.kernel_size[0] * m.out_channels\n            m.weight.data.normal_(0, np.sqrt(2.0 / n))\n        elif isinstance(m, nn.BatchNorm3d):\n            m.weight.data.fill_(1)\n            m.bias.data.zero_()\n        elif isinstance(m, nn.BatchNorm2d):\n            m.weight.data.fill_(1)\n            m.bias.data.zero_()\n        elif isinstance(m, nn.BatchNorm1d):\n            m.weight.data.fill_(1)\n            m.bias.data.zero_()\n\n\nclass SegRegNet3D(nn.Module):\n    def __init__(self, F=16):\n\n        super(SegRegNet3D, self).__init__()\n\n        self.conf_preprocess = conv3d_relu(1, F, kernel_size=3, stride=1)\n        self.layer3d = feature3d(F)\n\n        net_init(self)\n\n    def forward(self, fL, conf_volume):\n\n        fL_stack = fL[:, :, None, :, :].repeat(1, 
1, int(conf_volume.shape[2]), 1, 1)\n        conf_vol_preprocess = self.conf_preprocess(conf_volume)\n        input_volume = torch.cat((fL_stack, conf_vol_preprocess), dim=1)\n        oL = self.layer3d(input_volume)\n\n        return oL\n\n\ndef segregnet3d(options, data=None):\n\n    print(\"==> USING SegRegNet3D\")\n    for key in options:\n        if \"regnet\" in key:\n            print(\"{} : {}\".format(key, options[key]))\n\n    model = SegRegNet3D(F=options[\"regnet_out_planes\"])\n    if data is not None:\n        model.load_state_dict(data[\"state_dict\"])\n\n    return model\n"
  },
  {
    "path": "src/models/SegNet2D.py",
    "content": "# Copyright (c) 2020, NVIDIA CORPORATION.  All rights reserved.\n#\n# NVIDIA CORPORATION and its licensors retain all intellectual property\n# and proprietary rights in and to this software, related documentation\n# and any modifications thereto.  Any use, reproduction, disclosure or\n# distribution of this software and related documentation without an express\n# license agreement from NVIDIA CORPORATION is strictly prohibited.\n\nimport torch\nimport torch.nn as nn\nimport argparse\nimport math\nimport torch.nn.functional as F\nimport torch.backends.cudnn as cudnn\nimport time\n\n__all__ = [\"segnet2d\"]\n\n# Util Functions\ndef conv(in_planes, out_planes, kernel_size=3, stride=1, activefun=nn.LeakyReLU(0.1, inplace=True)):\n\n    return nn.Sequential(\n        nn.Conv2d(\n            in_planes,\n            out_planes,\n            kernel_size=kernel_size,\n            stride=stride,\n            padding=(kernel_size - 1) // 2,\n            bias=True,\n        ),\n        activefun,\n    )\n\n\ndef deconv(in_planes, out_planes, kernel_size=4, stride=2, activefun=nn.LeakyReLU(0.1, inplace=True)):\n\n    return nn.Sequential(\n        nn.ConvTranspose2d(\n            in_planes, out_planes, kernel_size=kernel_size, stride=stride, padding=1, bias=True\n        ),\n        activefun,\n    )\n\n\nclass SegNet2D(nn.Module):\n    def __init__(self):\n\n        super(SegNet2D, self).__init__()\n\n        self.activefun = nn.LeakyReLU(0.1, inplace=True)\n\n        cps = [64, 128, 256, 512, 512, 512]\n        dps = [512, 512, 256, 128, 64]\n\n        # Encoder\n        self.conv1 = conv(cps[0], cps[1], kernel_size=3, stride=2, activefun=self.activefun)\n        self.conv1_1 = conv(cps[1], cps[1], kernel_size=3, stride=1, activefun=self.activefun)\n\n        self.conv2 = conv(cps[1], cps[2], kernel_size=3, stride=2, activefun=self.activefun)\n        self.conv2_1 = conv(cps[2], cps[2], kernel_size=3, stride=1, activefun=self.activefun)\n\n        self.conv3 = 
conv(cps[2], cps[3], kernel_size=3, stride=2, activefun=self.activefun)\n        self.conv3_1 = conv(cps[3], cps[3], kernel_size=3, stride=1, activefun=self.activefun)\n\n        self.conv4 = conv(cps[3], cps[4], kernel_size=3, stride=2, activefun=self.activefun)\n        self.conv4_1 = conv(cps[4], cps[4], kernel_size=3, stride=1, activefun=self.activefun)\n\n        self.conv5 = conv(cps[4], cps[5], kernel_size=3, stride=2, activefun=self.activefun)\n        self.conv5_1 = conv(cps[5], cps[5], kernel_size=3, stride=1, activefun=self.activefun)\n\n        # Decoder\n        self.deconv5 = deconv(cps[5], dps[0], kernel_size=4, stride=2, activefun=self.activefun)\n        self.deconv5_1 = conv(dps[0] + cps[4], dps[0], kernel_size=3, stride=1, activefun=self.activefun)\n\n        self.deconv4 = deconv(cps[4], dps[1], kernel_size=4, stride=2, activefun=self.activefun)\n        self.deconv4_1 = conv(dps[1] + cps[3], dps[1], kernel_size=3, stride=1, activefun=self.activefun)\n\n        self.deconv3 = deconv(dps[1], dps[2], kernel_size=4, stride=2, activefun=self.activefun)\n        self.deconv3_1 = conv(dps[2] + cps[2], dps[2], kernel_size=3, stride=1, activefun=self.activefun)\n\n        self.deconv2 = deconv(dps[2], dps[3], kernel_size=4, stride=2, activefun=self.activefun)\n        self.deconv2_1 = conv(dps[3] + cps[1], dps[3], kernel_size=3, stride=1, activefun=self.activefun)\n\n        self.deconv1 = deconv(dps[3], dps[4], kernel_size=4, stride=2, activefun=self.activefun)\n        self.deconv1_1 = conv(dps[4] + cps[0], dps[4], kernel_size=3, stride=1, activefun=self.activefun)\n\n        self.last_conv = nn.Conv2d(dps[4], 1, kernel_size=3, stride=1, padding=1, bias=True)\n\n        # Init\n        for m in self.modules():\n            if isinstance(m, nn.Conv2d):\n                n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels\n                m.weight.data.normal_(0, math.sqrt(2.0 / n))\n            elif isinstance(m, nn.Conv3d):\n                n = 
m.kernel_size[0] * m.kernel_size[1] * m.kernel_size[2] * m.out_channels\n                m.weight.data.normal_(0, math.sqrt(2.0 / n))\n            elif isinstance(m, nn.BatchNorm2d):\n                m.weight.data.fill_(1)\n                m.bias.data.zero_()\n            elif isinstance(m, nn.BatchNorm3d):\n                m.weight.data.fill_(1)\n                m.bias.data.zero_()\n            elif isinstance(m, nn.Linear):\n                m.bias.data.zero_()\n\n        return\n\n    def forward(self, x):\n\n        out_conv0 = x\n        out_conv1 = self.conv1_1(self.conv1(out_conv0))\n        out_conv2 = self.conv2_1(self.conv2(out_conv1))\n        out_conv3 = self.conv3_1(self.conv3(out_conv2))\n        out_conv4 = self.conv4_1(self.conv4(out_conv3))\n        out_conv5 = self.conv5_1(self.conv5(out_conv4))\n\n        out_deconv5 = self.deconv5(out_conv5)\n        out_deconv5_1 = self.deconv5_1(torch.cat((out_conv4, out_deconv5), 1))\n\n        out_deconv4 = self.deconv4(out_deconv5_1)\n        out_deconv4_1 = self.deconv4_1(torch.cat((out_conv3, out_deconv4), 1))\n\n        out_deconv3 = self.deconv3(out_deconv4_1)\n        out_deconv3_1 = self.deconv3_1(torch.cat((out_conv2, out_deconv3), 1))\n\n        out_deconv2 = self.deconv2(out_deconv3_1)\n        out_deconv2_1 = self.deconv2_1(torch.cat((out_conv1, out_deconv2), 1))\n\n        out_deconv1 = self.deconv1(out_deconv2_1)\n        out_deconv1_1 = self.deconv1_1(torch.cat((out_conv0, out_deconv1), 1))\n\n        raw_seg = self.last_conv(out_deconv1_1)\n\n        return raw_seg\n\n\ndef segnet2d(options, data=None):\n\n    print(\"==> USING SegNet2D\")\n    for key in options:\n        if \"segnet2d\" in key:\n            print(\"{} : {}\".format(key, options[key]))\n\n    model = SegNet2D()\n\n    if data is not None:\n        model.load_state_dict(data[\"state_dict\"])\n\n    return model\n"
  },
  {
    "path": "src/models/__init__.py",
    "content": "from .Bi3DNet import *\nfrom .FeatExtractNet import *\nfrom .SegNet2D import *\nfrom .RefineNet2D import *\nfrom .RefineNet3D import *\nfrom .PSMNet import *\nfrom .GCNet import *\nfrom .DispRefine2D import *\n\n"
  },
  {
    "path": "src/project.toml",
    "content": "[tool.black]\nline-length = 110\ntarget-version = ['py37']"
  },
  {
    "path": "src/run_binary_depth_estimation.py",
    "content": "# Copyright (c) 2020, NVIDIA CORPORATION.  All rights reserved.\n#\n# NVIDIA CORPORATION and its licensors retain all intellectual property\n# and proprietary rights in and to this software, related documentation\n# and any modifications thereto.  Any use, reproduction, disclosure or\n# distribution of this software and related documentation without an express\n# license agreement from NVIDIA CORPORATION is strictly prohibited.\n\nimport argparse\nimport os\nimport torch\nimport torchvision.transforms as transforms\nfrom PIL import Image\n\nimport models\nimport cv2\nimport numpy as np\n\nfrom util import disp2rgb, str2bool\nimport random\n\nmodel_names = sorted(name for name in models.__dict__ if name.islower() and not name.startswith(\"__\"))\n\n\n# Parse arguments\nparser = argparse.ArgumentParser(allow_abbrev=False)\n\n# Model\nparser.add_argument(\"--arch\", type=str, default=\"bi3dnet_binary_depth\")\n\nparser.add_argument(\"--bi3dnet_featnet_arch\", type=str, default=\"featextractnetspp\")\nparser.add_argument(\"--bi3dnet_featnethr_arch\", type=str, default=\"featextractnethr\")\nparser.add_argument(\"--bi3dnet_segnet_arch\", type=str, default=\"segnet2d\")\nparser.add_argument(\"--bi3dnet_refinenet_arch\", type=str, default=\"segrefinenet\")\nparser.add_argument(\"--bi3dnet_max_disparity\", type=int, default=192)\nparser.add_argument(\"--bi3dnet_disps_per_example_true\", type=str2bool, default=True)\n\nparser.add_argument(\"--featextractnethr_out_planes\", type=int, default=16)\nparser.add_argument(\"--segrefinenet_in_planes\", type=int, default=17)\nparser.add_argument(\"--segrefinenet_out_planes\", type=int, default=8)\n\n# Input\nparser.add_argument(\"--pretrained\", type=str)\nparser.add_argument(\"--img_left\", type=str)\nparser.add_argument(\"--img_right\", type=str)\nparser.add_argument(\"--disp_vals\", type=float, nargs=\"*\")\nparser.add_argument(\"--crop_height\", type=int)\nparser.add_argument(\"--crop_width\", type=int)\n\nargs, 
unknown = parser.parse_known_args()\n\n####################################################################################################\ndef main():\n\n    options = vars(args)\n    print(\"==> ALL PARAMETERS\")\n    for key in options:\n        print(\"{} : {}\".format(key, options[key]))\n\n    out_dir = \"out\"\n    if not os.path.isdir(out_dir):\n        os.mkdir(out_dir)\n\n    base_name = os.path.splitext(os.path.basename(args.img_left))[0]\n\n    # Model\n    network_data = torch.load(args.pretrained)\n    print(\"=> using pre-trained model '{}'\".format(args.arch))\n    model = models.__dict__[args.arch](options, network_data).cuda()\n\n    # Inputs\n    img_left = Image.open(args.img_left).convert(\"RGB\")\n    img_left = transforms.functional.to_tensor(img_left)\n    img_left = transforms.functional.normalize(img_left, [0.5, 0.5, 0.5], [0.5, 0.5, 0.5])\n    img_left = img_left.type(torch.cuda.FloatTensor)[None, :, :, :]\n    img_right = Image.open(args.img_right).convert(\"RGB\")\n    img_right = transforms.functional.to_tensor(img_right)\n    img_right = transforms.functional.normalize(img_right, [0.5, 0.5, 0.5], [0.5, 0.5, 0.5])\n    img_right = img_right.type(torch.cuda.FloatTensor)[None, :, :, :]\n\n    segs = []\n    for disp_val in args.disp_vals:\n\n        assert disp_val % 3 == 0, \"disparity value should be a multiple of 3 as we downsample the image by 3\"\n        disp_long = torch.Tensor([[disp_val / 3]]).type(torch.LongTensor).cuda()\n\n        # Pad inputs\n        tw = args.crop_width\n        th = args.crop_height\n        assert tw % 96 == 0, \"image dimensions should be a multiple of 96\"\n        assert th % 96 == 0, \"image dimensions should be a multiple of 96\"\n        h = img_left.shape[2]\n        w = img_left.shape[3]\n        x1 = random.randint(0, max(0, w - tw))\n        y1 = random.randint(0, max(0, h - th))\n        pad_w = tw - w if tw - w > 0 else 0\n        pad_h = th - h if th - h > 0 else 0\n        pad_opr = 
torch.nn.ZeroPad2d((pad_w, 0, pad_h, 0))\n        img_left = img_left[:, :, y1 : y1 + min(th, h), x1 : x1 + min(tw, w)]\n        img_right = img_right[:, :, y1 : y1 + min(th, h), x1 : x1 + min(tw, w)]\n        img_left_pad = pad_opr(img_left)\n        img_right_pad = pad_opr(img_right)\n\n        # Inference\n        model.eval()\n        with torch.no_grad():\n            output = model(img_left_pad, img_right_pad, disp_long)[1][:, :, pad_h:, pad_w:]\n\n        # Write binary depth results\n        seg_img = output[0, 0][None, :, :].clone().cpu().detach().numpy()\n        seg_img = np.transpose(seg_img * 255.0, (1, 2, 0))\n        cv2.imwrite(\n            os.path.join(out_dir, \"%s_%s_seg_confidence_%d.png\" % (base_name, args.arch, disp_val)), seg_img\n        )\n\n        segs.append(output[0, 0][None, :, :].clone().cpu().detach().numpy())\n\n    # Generate quantized depth results\n    segs = np.concatenate(segs, axis=0)\n    segs = np.insert(segs, 0, np.ones((1, h, w), dtype=np.float32), axis=0)\n    segs = np.append(segs, np.zeros((1, h, w), dtype=np.float32), axis=0)\n\n    segs = 1.0 - segs\n\n    # Get the pdf values for each segmented region\n    pdf_method = segs[1:, :, :] - segs[:-1, :, :]\n\n    # Get the labels\n    labels_method = np.argmax(pdf_method, axis=0).astype(int)\n    disp_map = labels_method.astype(np.float32)\n\n    disp_vals = args.disp_vals\n    disp_vals.insert(0, 0)\n    disp_vals.append(args.bi3dnet_max_disparity)\n\n    for i in range(len(disp_vals) - 1):\n        min_disp = disp_vals[i]\n        max_disp = disp_vals[i + 1]\n        mid_disp = 0.5 * (min_disp + max_disp)\n        disp_map[labels_method == i] = mid_disp\n\n    disp_vals_str_list = [\"%d\" % disp_val for disp_val in disp_vals]\n    disp_vals_str = \"-\".join(disp_vals_str_list)\n\n    img_disp = np.clip(disp_map, 0, args.bi3dnet_max_disparity)\n    img_disp = img_disp / args.bi3dnet_max_disparity\n    img_disp = (disp2rgb(img_disp) * 255.0).astype(np.uint8)\n\n    
cv2.imwrite(\n        os.path.join(out_dir, \"%s_%s_quant_depth_%s.png\" % (base_name, args.arch, disp_vals_str)), img_disp\n    )\n\n    return\n\n\nif __name__ == \"__main__\":\n    main()\n"
  },
  {
    "path": "src/run_continuous_depth_estimation.py",
    "content": "# Copyright (c) 2020, NVIDIA CORPORATION.  All rights reserved.\n#\n# NVIDIA CORPORATION and its licensors retain all intellectual property\n# and proprietary rights in and to this software, related documentation\n# and any modifications thereto.  Any use, reproduction, disclosure or\n# distribution of this software and related documentation without an express\n# license agreement from NVIDIA CORPORATION is strictly prohibited.\n\nimport argparse\nimport os\nimport time\nimport torch\nimport torchvision.transforms as transforms\nfrom PIL import Image\n\nimport models\nimport cv2\nimport numpy as np\nfrom util import disp2rgb, str2bool\n\nimport random\n\nmodel_names = sorted(name for name in models.__dict__ if name.islower() and not name.startswith(\"__\"))\n\n\n# Parse Arguments\nparser = argparse.ArgumentParser(allow_abbrev=False)\n\n# Experiment Type\nparser.add_argument(\"--arch\", type=str, default=\"bi3dnet_continuous_depth_2D\")\n\nparser.add_argument(\"--bi3dnet_featnet_arch\", type=str, default=\"featextractnetspp\")\nparser.add_argument(\"--bi3dnet_segnet_arch\", type=str, default=\"segnet2d\")\nparser.add_argument(\"--bi3dnet_refinenet_arch\", type=str, default=\"disprefinenet\")\nparser.add_argument(\"--bi3dnet_regnet_arch\", type=str, default=\"segregnet3d\")\nparser.add_argument(\"--bi3dnet_max_disparity\", type=int, default=192)\nparser.add_argument(\"--regnet_out_planes\", type=int, default=16)\nparser.add_argument(\"--disprefinenet_out_planes\", type=int, default=32)\nparser.add_argument(\"--bi3dnet_disps_per_example_true\", type=str2bool, default=True)\n\n# Input\nparser.add_argument(\"--pretrained\", type=str)\nparser.add_argument(\"--img_left\", type=str)\nparser.add_argument(\"--img_right\", type=str)\nparser.add_argument(\"--disp_range_min\", type=int)\nparser.add_argument(\"--disp_range_max\", type=int)\nparser.add_argument(\"--crop_height\", type=int)\nparser.add_argument(\"--crop_width\", type=int)\n\nargs, unknown = 
parser.parse_known_args()\n\n##############################################################################################################\ndef main():\n\n    options = vars(args)\n    print(\"==> ALL PARAMETERS\")\n    for key in options:\n        print(\"{} : {}\".format(key, options[key]))\n\n    out_dir = \"out\"\n    if not os.path.isdir(out_dir):\n        os.mkdir(out_dir)\n\n    base_name = os.path.splitext(os.path.basename(args.img_left))[0]\n\n    # Model\n    if args.pretrained:\n        network_data = torch.load(args.pretrained)\n    else:\n        print(\"Need an input model\")\n        exit()\n\n    print(\"=> using pre-trained model '{}'\".format(args.arch))\n    model = models.__dict__[args.arch](options, network_data).cuda()\n\n    # Inputs\n    img_left = Image.open(args.img_left).convert(\"RGB\")\n    img_right = Image.open(args.img_right).convert(\"RGB\")\n    img_left = transforms.functional.to_tensor(img_left)\n    img_right = transforms.functional.to_tensor(img_right)\n    img_left = transforms.functional.normalize(img_left, [0.5, 0.5, 0.5], [0.5, 0.5, 0.5])\n    img_right = transforms.functional.normalize(img_right, [0.5, 0.5, 0.5], [0.5, 0.5, 0.5])\n    img_left = img_left.type(torch.cuda.FloatTensor)[None, :, :, :]\n    img_right = img_right.type(torch.cuda.FloatTensor)[None, :, :, :]\n\n    # Prepare Disparities\n    max_disparity = args.disp_range_max\n    min_disparity = args.disp_range_min\n\n    assert max_disparity % 3 == 0 and min_disparity % 3 == 0, \"disparities should be divisible by 3\"\n\n    if args.arch == \"bi3dnet_continuous_depth_3D\":\n        assert (\n            max_disparity - min_disparity\n        ) % 48 == 0, \"for 3D regularization the difference in disparities should be divisible by 48\"\n\n    max_disp_levels = (max_disparity - min_disparity) + 1\n\n    max_disparity_3x = int(max_disparity / 3)\n    min_disparity_3x = int(min_disparity / 3)\n    max_disp_levels_3x = (max_disparity_3x - min_disparity_3x) + 1\n    
disp_3x = np.linspace(min_disparity_3x, max_disparity_3x, max_disp_levels_3x, dtype=np.int32)\n    disp_long_3x_main = torch.from_numpy(disp_3x).type(torch.LongTensor).cuda()\n    disp_float_main = np.linspace(min_disparity, max_disparity, max_disp_levels, dtype=np.float32)\n    disp_float_main = torch.from_numpy(disp_float_main).type(torch.float32).cuda()\n    delta = 1\n    d_min_GT = min_disparity - 0.5 * delta\n    d_max_GT = max_disparity + 0.5 * delta\n    disp_long_3x = disp_long_3x_main[None, :].expand(img_left.shape[0], -1)\n    disp_float = disp_float_main[None, :].expand(img_left.shape[0], -1)\n\n    # Pad Inputs\n    tw = args.crop_width\n    th = args.crop_height\n    assert tw % 96 == 0, \"image dimensions should be a multiple of 96\"\n    assert th % 96 == 0, \"image dimensions should be a multiple of 96\"\n    h = img_left.shape[2]\n    w = img_left.shape[3]\n    x1 = random.randint(0, max(0, w - tw))\n    y1 = random.randint(0, max(0, h - th))\n    pad_w = tw - w if tw - w > 0 else 0\n    pad_h = th - h if th - h > 0 else 0\n    pad_opr = torch.nn.ZeroPad2d((pad_w, 0, pad_h, 0))\n    img_left = img_left[:, :, y1 : y1 + min(th, h), x1 : x1 + min(tw, w)]\n    img_right = img_right[:, :, y1 : y1 + min(th, h), x1 : x1 + min(tw, w)]\n    img_left_pad = pad_opr(img_left)\n    img_right_pad = pad_opr(img_right)\n\n    # Inference\n    model.eval()\n    with torch.no_grad():\n        if args.arch == \"bi3dnet_continuous_depth_2D\":\n            output_seg_low_res_upsample, output_disp_normalized = model(\n                img_left_pad, img_right_pad, disp_long_3x\n            )\n            output_seg = output_seg_low_res_upsample\n        else:\n            (\n                output_seg_low_res_upsample,\n                output_seg_low_res_upsample_refined,\n                output_disp_normalized_no_reg,\n                output_disp_normalized,\n            ) = model(img_left_pad, img_right_pad, disp_long_3x)\n            output_seg = 
output_seg_low_res_upsample_refined\n\n        output_seg = output_seg[:, :, pad_h:, pad_w:]\n        output_disp_normalized = output_disp_normalized[:, :, pad_h:, pad_w:]\n        output_disp = torch.clamp(\n            output_disp_normalized * delta * max_disp_levels + d_min_GT, min=d_min_GT, max=d_max_GT\n        )\n\n    # Write Results\n    max_disparity_color = 192\n    output_disp_clamp = output_disp[0, 0, :, :].cpu().clone().numpy()\n    output_disp_clamp[output_disp_clamp < min_disparity] = 0\n    output_disp_clamp[output_disp_clamp > max_disparity] = max_disparity_color\n    disp_np_ours_color = disp2rgb(output_disp_clamp / max_disparity_color) * 255.0\n    cv2.imwrite(\n        os.path.join(out_dir, \"%s_%s_%d_%d.png\" % (base_name, args.arch, min_disparity, max_disparity)),\n        disp_np_ours_color,\n    )\n\n    return\n\n\nif __name__ == \"__main__\":\n    main()\n"
  },
  {
    "path": "src/run_demo_kitti15.sh",
    "content": "#!/usr/bin/env bash\n\n# GENERATE BINARY DEPTH SEGMENTATIONS AND COMBINE THEM TO GENERATE QUANTIZED DEPTH\nCUDA_VISIBLE_DEVICES=0 python run_binary_depth_estimation.py \\\n    --arch bi3dnet_binary_depth \\\n    --bi3dnet_featnet_arch featextractnetspp \\\n    --bi3dnet_featnethr_arch featextractnethr \\\n    --bi3dnet_segnet_arch segnet2d \\\n    --bi3dnet_refinenet_arch segrefinenet \\\n    --featextractnethr_out_planes 16 \\\n    --segrefinenet_in_planes 17 \\\n    --segrefinenet_out_planes 8 \\\n    --crop_height 384 --crop_width 1248 \\\n    --disp_vals 12 21 30 39 48 \\\n    --img_left  '../data/kitti15_img_left.jpg' \\\n    --img_right '../data/kitti15_img_right.jpg' \\\n    --pretrained '../model_weights/kitti15_binary_depth.pth.tar'\n\n\n# FULL RANGE CONTINUOUS DEPTH ESTIMATION WITHOUT 3D REGULARIZATION\nCUDA_VISIBLE_DEVICES=0 python run_continuous_depth_estimation.py \\\n    --arch bi3dnet_continuous_depth_2D \\\n    --bi3dnet_featnet_arch featextractnetspp \\\n    --bi3dnet_segnet_arch segnet2d \\\n    --bi3dnet_refinenet_arch disprefinenet \\\n    --disprefinenet_out_planes 32 \\\n    --crop_height 384 --crop_width 1248 \\\n    --disp_range_min 0 \\\n    --disp_range_max 192 \\\n    --bi3dnet_max_disparity 192 \\\n    --img_left  '../data/kitti15_img_left.jpg' \\\n    --img_right '../data/kitti15_img_right.jpg' \\\n    --pretrained '../model_weights/kitti15_continuous_depth_no_conf_reg.pth.tar'\n\n\n# SELECTIVE RANGE CONTINUOUS DEPTH ESTIMATION WITHOUT 3D REGULARIZATION\nCUDA_VISIBLE_DEVICES=0 python run_continuous_depth_estimation.py \\\n    --arch bi3dnet_continuous_depth_2D \\\n    --bi3dnet_featnet_arch featextractnetspp \\\n    --bi3dnet_segnet_arch segnet2d \\\n    --bi3dnet_refinenet_arch disprefinenet \\\n    --disprefinenet_out_planes 32 \\\n    --crop_height 384 --crop_width 1248 \\\n    --disp_range_min 12 \\\n    --disp_range_max 48 \\\n    --bi3dnet_max_disparity 192 \\\n    --img_left  '../data/kitti15_img_left.jpg' \\\n    
--img_right '../data/kitti15_img_right.jpg' \\\n    --pretrained '../model_weights/kitti15_continuous_depth_no_conf_reg.pth.tar'\n\n\n# FULL RANGE CONTINUOUS DEPTH ESTIMATION WITH 3D REGULARIZATION\nCUDA_VISIBLE_DEVICES=0 python run_continuous_depth_estimation.py \\\n    --arch bi3dnet_continuous_depth_3D \\\n    --bi3dnet_featnet_arch featextractnetspp \\\n    --bi3dnet_segnet_arch segnet2d \\\n    --bi3dnet_refinenet_arch disprefinenet \\\n    --bi3dnet_regnet_arch segregnet3d \\\n    --disprefinenet_out_planes 32 \\\n    --regnet_out_planes 16 \\\n    --crop_height 384 --crop_width 1248 \\\n    --disp_range_min 0 \\\n    --disp_range_max 192 \\\n    --bi3dnet_max_disparity 192 \\\n    --img_left  '../data/kitti15_img_left.jpg' \\\n    --img_right '../data/kitti15_img_right.jpg' \\\n    --pretrained '../model_weights/kitti15_continuous_depth_conf_reg.pth.tar'\n"
  },
  {
    "path": "src/run_demo_sf.sh",
    "content": "#!/usr/bin/env bash\n\n# GENERATE BINARY DEPTH SEGMENTATIONS AND COMBINE THEM TO GENERATE QUANTIZED DEPTH\nCUDA_VISIBLE_DEVICES=0 python run_binary_depth_estimation.py \\\n    --arch bi3dnet_binary_depth \\\n    --bi3dnet_featnet_arch featextractnetspp \\\n    --bi3dnet_featnethr_arch featextractnethr \\\n    --bi3dnet_segnet_arch segnet2d \\\n    --bi3dnet_refinenet_arch segrefinenet \\\n    --featextractnethr_out_planes 16 \\\n    --segrefinenet_in_planes 17 \\\n    --segrefinenet_out_planes 8 \\\n    --crop_height 576 --crop_width 960 \\\n    --disp_vals 24 36 54 96 144 \\\n    --img_left  '../data/sf_img_left.jpg' \\\n    --img_right '../data/sf_img_right.jpg' \\\n    --pretrained '../model_weights/sf_binary_depth.pth.tar'\n\n\n# FULL RANGE CONTINUOUS DEPTH ESTIMATION WITHOUT 3D REGULARIZATION\nCUDA_VISIBLE_DEVICES=0 python run_continuous_depth_estimation.py \\\n    --arch bi3dnet_continuous_depth_2D \\\n    --bi3dnet_featnet_arch featextractnetspp \\\n    --bi3dnet_segnet_arch segnet2d \\\n    --bi3dnet_refinenet_arch disprefinenet \\\n    --disprefinenet_out_planes 32 \\\n    --crop_height 576 --crop_width 960 \\\n    --disp_range_min 0 \\\n    --disp_range_max 192 \\\n    --bi3dnet_max_disparity 192 \\\n    --img_left  '../data/sf_img_left.jpg' \\\n    --img_right '../data/sf_img_right.jpg' \\\n    --pretrained '../model_weights/sf_continuous_depth_no_conf_reg.pth.tar'\n\n\n# SELECTIVE RANGE CONTINUOUS DEPTH ESTIMATION WITHOUT 3D REGULARIZATION\nCUDA_VISIBLE_DEVICES=0 python run_continuous_depth_estimation.py \\\n    --arch bi3dnet_continuous_depth_2D \\\n    --bi3dnet_featnet_arch featextractnetspp \\\n    --bi3dnet_segnet_arch segnet2d \\\n    --bi3dnet_refinenet_arch disprefinenet \\\n    --disprefinenet_out_planes 32 \\\n    --crop_height 576 --crop_width 960 \\\n    --disp_range_min 18 \\\n    --disp_range_max 60 \\\n    --bi3dnet_max_disparity 192 \\\n    --img_left  '../data/sf_img_left.jpg' \\\n    --img_right '../data/sf_img_right.jpg' 
\\\n    --pretrained '../model_weights/sf_continuous_depth_no_conf_reg.pth.tar'\n\n\n# FULL RANGE CONTINUOUS DEPTH ESTIMATION WITH 3D REGULARIZATION\nCUDA_VISIBLE_DEVICES=0 python run_continuous_depth_estimation.py \\\n    --arch bi3dnet_continuous_depth_3D \\\n    --bi3dnet_featnet_arch featextractnetspp \\\n    --bi3dnet_segnet_arch segnet2d \\\n    --bi3dnet_refinenet_arch disprefinenet \\\n    --bi3dnet_regnet_arch segregnet3d \\\n    --disprefinenet_out_planes 32 \\\n    --regnet_out_planes 16 \\\n    --crop_height 576 --crop_width 960 \\\n    --disp_range_min 0 \\\n    --disp_range_max 192 \\\n    --bi3dnet_max_disparity 192 \\\n    --img_left  '../data/sf_img_left.jpg' \\\n    --img_right '../data/sf_img_right.jpg' \\\n    --pretrained '../model_weights/sf_continuous_depth_conf_reg.pth.tar'\n"
  },
  {
    "path": "src/util.py",
    "content": "# Copyright (c) 2020, NVIDIA CORPORATION.  All rights reserved.\n#\n# NVIDIA CORPORATION and its licensors retain all intellectual property\n# and proprietary rights in and to this software, related documentation\n# and any modifications thereto.  Any use, reproduction, disclosure or\n# distribution of this software and related documentation without an express\n# license agreement from NVIDIA CORPORATION is strictly prohibited.\n\nimport os\nimport numpy as np\n\n\ndef disp2rgb(disp):\n    H = disp.shape[0]\n    W = disp.shape[1]\n\n    I = disp.flatten()\n\n    map = np.array(\n        [\n            [0, 0, 0, 114],\n            [0, 0, 1, 185],\n            [1, 0, 0, 114],\n            [1, 0, 1, 174],\n            [0, 1, 0, 114],\n            [0, 1, 1, 185],\n            [1, 1, 0, 114],\n            [1, 1, 1, 0],\n        ]\n    )\n    bins = map[:-1, 3]\n    cbins = np.cumsum(bins)\n    bins = bins / cbins[-1]\n    cbins = cbins[:-1] / cbins[-1]\n\n    ind = np.minimum(\n        np.sum(np.repeat(I[None, :], 6, axis=0) > np.repeat(cbins[:, None], I.shape[0], axis=1), axis=0), 6\n    )\n    bins = np.reciprocal(bins)\n    cbins = np.append(np.array([[0]]), cbins[:, None])\n\n    I = np.multiply(I - cbins[ind], bins[ind])\n    I = np.minimum(\n        np.maximum(\n            np.multiply(map[ind, 0:3], np.repeat(1 - I[:, None], 3, axis=1))\n            + np.multiply(map[ind + 1, 0:3], np.repeat(I[:, None], 3, axis=1)),\n            0,\n        ),\n        1,\n    )\n\n    I = np.reshape(I, [H, W, 3]).astype(np.float32)\n\n    return I\n\n\ndef str2bool(bool_input_string):\n    if isinstance(bool_input_string, bool):\n        return bool_input_string\n    # Compare against the exact strings; `in (\"true\")` would be a substring test.\n    if bool_input_string.lower() == \"true\":\n        return True\n    elif bool_input_string.lower() == \"false\":\n        return False\n    else:\n        raise NameError(\"Please provide boolean type.\")\n"
  }
]