[
  {
    "path": ".gitignore",
    "content": "# Byte-compiled / optimized / DLL files\n__pycache__/\n*.py[cod]\n*$py.class\n\n# C extensions\n*.so\n\n# Distribution / packaging\n.Python\nbuild/\ndevelop-eggs/\ndist/\ndownloads/\neggs/\n.eggs/\nlib/\nlib64/\nparts/\nsdist/\nvar/\nwheels/\npip-wheel-metadata/\nshare/python-wheels/\n*.egg-info/\n.installed.cfg\n*.egg\nMANIFEST\n\n# PyInstaller\n#  Usually these files are written by a python script from a template\n#  before PyInstaller builds the exe, so as to inject date/other infos into it.\n*.manifest\n*.spec\n\n# Installer logs\npip-log.txt\npip-delete-this-directory.txt\n\n# Unit test / coverage reports\nhtmlcov/\n.tox/\n.nox/\n.coverage\n.coverage.*\n.cache\nnosetests.xml\ncoverage.xml\n*.cover\n*.py,cover\n.hypothesis/\n.pytest_cache/\n\n# Translations\n*.mo\n*.pot\n\n# Django stuff:\n*.log\nlocal_settings.py\ndb.sqlite3\ndb.sqlite3-journal\n\n# Flask stuff:\ninstance/\n.webassets-cache\n\n# Scrapy stuff:\n.scrapy\n\n# Sphinx documentation\ndocs/_build/\n\n# PyBuilder\ntarget/\n\n# Jupyter Notebook\n.ipynb_checkpoints\n\n# IPython\nprofile_default/\nipython_config.py\n\n# pyenv\n.python-version\n\n# pipenv\n#   According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.\n#   However, in case of collaboration, if having platform-specific dependencies or dependencies\n#   having no cross-platform support, pipenv may install dependencies that don't work, or not\n#   install all needed dependencies.\n#Pipfile.lock\n\n# PEP 582; used by e.g. github.com/David-OConnor/pyflow\n__pypackages__/\n\n# Celery stuff\ncelerybeat-schedule\ncelerybeat.pid\n\n# SageMath parsed files\n*.sage.py\n\n# Environments\n.env\n.venv\nenv/\nvenv/\nENV/\nenv.bak/\nvenv.bak/\n\n# Spyder project settings\n.spyderproject\n.spyproject\n\n# Rope project settings\n.ropeproject\n\n# mkdocs documentation\n/site\n\n# mypy\n.mypy_cache/\n.dmypy.json\ndmypy.json\n\n# Pyre type checker\n.pyre/\n\n# I/O\ninput/\noutput/\nweights/"
  },
  {
    "path": "LICENSE",
    "content": "MIT License\n\nCopyright (c) 2023 Intelligent Systems Lab Org\n\nPermission is hereby granted, free of charge, to any person obtaining a copy\nof this software and associated documentation files (the \"Software\"), to deal\nin the Software without restriction, including without limitation the rights\nto use, copy, modify, merge, publish, distribute, sublicense, and/or sell\ncopies of the Software, and to permit persons to whom the Software is\nfurnished to do so, subject to the following conditions:\n\nThe above copyright notice and this permission notice shall be included in all\ncopies or substantial portions of the Software.\n\nTHE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\nIMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\nFITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\nAUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\nLIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\nOUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\nSOFTWARE.\n"
  },
  {
    "path": "README.md",
    "content": "# PROJECT NOT UNDER ACTIVE MANAGEMENT #  \nThis project will no longer be maintained by Intel.  \nIntel will not provide or guarantee development of or support for this project, including but not limited to, maintenance, bug fixes, new releases or updates.  \nPatches to this project are no longer accepted by Intel.  \n If you have an ongoing need to use this project, are interested in independently developing it, or would like to maintain patches for the community, please create your own fork of the project.  \n  \n# Monocular Visual-Inertial Depth Estimation\n\nThis repository contains code and models for our paper:\n\n> [Monocular Visual-Inertial Depth Estimation](https://arxiv.org/abs/2303.12134)  \n> Diana Wofk, René Ranftl, Matthias Müller, Vladlen Koltun\n\nFor a quick overview of the work you can watch the [short talk](https://youtu.be/Ja4Nic3YYCg) and [teaser](https://youtu.be/IMwiKwSpshQ) on YouTube.\n\n## Introduction\n\n![Methodology Diagram](figures/methodology_diagram.png)\n\nWe present a visual-inertial depth estimation pipeline that integrates monocular depth estimation and visual-inertial odometry to produce dense depth estimates with metric scale. Our approach consists of three stages: (1) input processing, where RGB and IMU data feed into monocular depth estimation alongside visual-inertial odometry, (2) global scale and shift alignment, where monocular depth estimates are fitted to sparse depth from VIO in a least-squares manner, and (3) learning-based dense scale alignment, where globally-aligned depth is locally realigned using a dense scale map regressed by the ScaleMapLearner (SML). The images at the bottom in the diagram above illustrate a VOID sample being processed through our pipeline; from left to right: the input RGB, ground truth depth, sparse depth from VIO, globally-aligned depth, scale map scaffolding, dense scale map regressed by SML, final depth output.\n\n![Teaser Figure](figures/teaser_figure.png)\n\n## Setup\n\n1) Setup dependencies:\n\n    ```shell\n    conda env create -f environment.yaml\n    conda activate vi-depth\n    ```\n\n2) Pick one or more ScaleMapLearner (SML) models and download the corresponding weights to the `weights` folder.\n\n    | Depth Predictor   |  SML on VOID 150  |  SML on VOID 500  | SML on VOID 1500 |\n    | :---              |       :----:      |       :----:      |      :----:      |\n    | DPT-BEiT-Large    | [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.dpt_beit_large_512.nsamples.150.ckpt) | [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.dpt_beit_large_512.nsamples.500.ckpt) | [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.dpt_beit_large_512.nsamples.1500.ckpt) |\n    | DPT-SwinV2-Large  | [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.dpt_swin2_large_384.nsamples.150.ckpt) | [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.dpt_swin2_large_384.nsamples.500.ckpt) | [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.dpt_swin2_large_384.nsamples.1500.ckpt) |\n    | DPT-Large         | [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.dpt_large.nsamples.150.ckpt) | [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.dpt_large.nsamples.500.ckpt) | 
\n## Setup\n\n1) Setup dependencies:\n\n    ```shell\n    conda env create -f environment.yaml\n    conda activate vi-depth\n    ```\n\n2) Pick one or more ScaleMapLearner (SML) models and download the corresponding weights to the `weights` folder.\n\n    | Depth Predictor   |  SML on VOID 150  |  SML on VOID 500  | SML on VOID 1500 |\n    | :---              |       :----:      |       :----:      |      :----:      |\n    | DPT-BEiT-Large    | [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.dpt_beit_large_512.nsamples.150.ckpt) | [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.dpt_beit_large_512.nsamples.500.ckpt) | [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.dpt_beit_large_512.nsamples.1500.ckpt) |\n    | DPT-SwinV2-Large  | [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.dpt_swin2_large_384.nsamples.150.ckpt) | [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.dpt_swin2_large_384.nsamples.500.ckpt) | [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.dpt_swin2_large_384.nsamples.1500.ckpt) |\n    | DPT-Large         | [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.dpt_large.nsamples.150.ckpt) | [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.dpt_large.nsamples.500.ckpt) | [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.dpt_large.nsamples.1500.ckpt) |\n    | DPT-Hybrid        | [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.dpt_hybrid.nsamples.150.ckpt)* | [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.dpt_hybrid.nsamples.500.ckpt) | [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.dpt_hybrid.nsamples.1500.ckpt) |\n    | DPT-SwinV2-Tiny   | [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.dpt_swin2_tiny_256.nsamples.150.ckpt) | [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.dpt_swin2_tiny_256.nsamples.500.ckpt) | [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.dpt_swin2_tiny_256.nsamples.1500.ckpt) |\n    | DPT-LeViT         | [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.dpt_levit_224.nsamples.150.ckpt) | [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.dpt_levit_224.nsamples.500.ckpt) | [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.dpt_levit_224.nsamples.1500.ckpt) |\n    | MiDaS-small       | [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.midas_small.nsamples.150.ckpt) | [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.midas_small.nsamples.500.ckpt) | [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.midas_small.nsamples.1500.ckpt) |\n\n    *Also available with pretraining on TartanAir: [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.dpt_hybrid.nsamples.150.pretrained.ckpt)\n\n## Inference\n\n1) Place inputs into the `input` folder. An input image and corresponding sparse metric depth map are expected:\n\n    ```bash\n    input\n    ├── image                   # RGB image\n    │   ├── <timestamp>.png\n    │   └── ...\n    └── sparse_depth            # sparse metric depth map\n        ├── <timestamp>.png     # as 16b PNG\n        └── ...\n    ```\n\n    The `load_sparse_depth` function in `run.py` may need to be modified depending on the format in which sparse depth is stored. By default, the depth storage method [used in the VOID dataset](https://github.com/alexklwong/void-dataset/blob/master/src/data_utils.py) is assumed; a minimal loading sketch follows these steps.\n\n2) Run the `run.py` script as follows:\n\n    ```bash\n    DEPTH_PREDICTOR=\"dpt_beit_large_512\"\n    NSAMPLES=150\n    SML_MODEL_PATH=\"weights/sml_model.dpredictor.${DEPTH_PREDICTOR}.nsamples.${NSAMPLES}.ckpt\"\n\n    python run.py -dp $DEPTH_PREDICTOR -ns $NSAMPLES -sm $SML_MODEL_PATH --save-output\n    ```\n\n3) The `--save-output` flag enables saving outputs to the `output` folder. By default, the following outputs will be saved per sample:\n\n    ```bash\n    output\n    ├── ga_depth                # metric depth map after global alignment\n    │   ├── <timestamp>.pfm     # as PFM\n    │   ├── <timestamp>.png     # as 16b PNG\n    │   └── ...\n    └── sml_depth               # metric depth map output by SML\n        ├── <timestamp>.pfm     # as PFM\n        ├── <timestamp>.png     # as 16b PNG\n        └── ...\n    ```\n
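\nFor VOID-style sparse depth (16-bit PNGs storing metric depth in units of 1/256 m), `load_sparse_depth` reduces to a few lines. This sketch mirrors the loading logic in `evaluate.py` (the function name is the hook referenced in `run.py` above):\n\n```python\nimport numpy as np\nfrom PIL import Image\n\ndef load_sparse_depth(path):\n    # VOID stores metric depth as 16-bit PNG scaled by 256\n    sparse_depth = np.array(Image.open(path), dtype=np.float32) / 256.0\n    sparse_depth[sparse_depth <= 0] = 0.0  # zero out invalid readings\n    return sparse_depth\n```\n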
\n## Evaluation\n\nModels provided in this repo were trained on the VOID dataset.\n\n1) Download the VOID dataset following [the instructions in the VOID dataset repo](https://github.com/alexklwong/void-dataset#downloading-void).\n2) To evaluate on VOID test sets, run the `evaluate.py` script as follows:\n\n    ```bash\n    DATASET_PATH=\"/path/to/void_release/\"\n\n    DEPTH_PREDICTOR=\"dpt_beit_large_512\"\n    NSAMPLES=150\n    SML_MODEL_PATH=\"weights/sml_model.dpredictor.${DEPTH_PREDICTOR}.nsamples.${NSAMPLES}.ckpt\"\n\n    python evaluate.py -ds $DATASET_PATH -dp $DEPTH_PREDICTOR -ns $NSAMPLES -sm $SML_MODEL_PATH\n    ```\n\n    Results for the example shown above (RMSE/MAE in mm, iRMSE/iMAE in 1/km; AbsRel metrics are unitless):\n\n    ```\n    Averaging metrics for globally-aligned depth over 800 samples\n    Averaging metrics for SML-aligned depth over 800 samples\n    +---------+----------+----------+\n    |  metric | GA Only  |  GA+SML  |\n    +---------+----------+----------+\n    |   RMSE  |  191.36  |  142.85  |\n    |   MAE   |  115.84  |   76.95  |\n    |  AbsRel |    0.069 |    0.046 |\n    |  iRMSE  |   72.70  |   57.13  |\n    |   iMAE  |   49.32  |   34.25  |\n    | iAbsRel |    0.071 |    0.048 |\n    +---------+----------+----------+\n    ```\n\n    To evaluate on VOID test sets at different densities (void_150, void_500, void_1500), change the `NSAMPLES` argument accordingly.\n\n## Citation\n\nIf you reference our work, please consider citing the following:\n\n```bib\n@inproceedings{wofk2023videpth,\n    author      = {Wofk, Diana and Ranftl, Ren\\'{e} and M{\\\"u}ller, Matthias and Koltun, Vladlen},\n    title       = {{Monocular Visual-Inertial Depth Estimation}},\n    booktitle   = {IEEE International Conference on Robotics and Automation (ICRA)},\n    year        = {2023}\n}\n```\n\n## Acknowledgements\n\nOur work builds on and uses code from [MiDaS](https://github.com/isl-org/MiDaS), [timm](https://github.com/rwightman/pytorch-image-models), and [PyTorch Lightning](https://lightning.ai/docs/pytorch/stable/). We'd like to thank the authors for making these libraries and frameworks available.\n\n_Last revisited: August 2024_\n"
  },
  {
    "path": "environment.yaml",
    "content": "name: vi-depth\nchannels:\n  - pytorch\n  - defaults\ndependencies:\n  - nvidia::cudatoolkit=11.7\n  - python=3.10.8\n  - pytorch::pytorch=1.13.0\n  - torchvision=0.14.0\n  - pip=22.3.1\n  - numpy=1.23.4\n  - pip:\n    - opencv-python==4.6.0.66\n    - scipy==1.10.1\n    - timm==0.6.12\n    - pytorch-lightning==1.9.0\n    - imageio==2.25.0\n    - prettytable==3.6.0"
  },
  {
    "path": "evaluate.py",
    "content": "import os\nimport argparse\n\nimport torch\nimport imageio\nimport numpy as np\n\nfrom tqdm import tqdm\nfrom PIL import Image\n\nimport modules.midas.utils as utils\n\nimport pipeline\nimport metrics\n\ndef evaluate(dataset_path, depth_predictor, nsamples, sml_model_path):\n    device = torch.device(\"cuda\" if torch.cuda.is_available() else \"cpu\")\n    print(\"device: %s\" % device)\n\n    # ranges for VOID\n    min_depth, max_depth = 0.2, 5.0\n    min_pred, max_pred = 0.1, 8.0\n\n    # instantiate method\n    method = pipeline.VIDepth(\n        depth_predictor, nsamples, sml_model_path, \n        min_pred, max_pred, min_depth, max_depth, device\n    )\n\n    # get inputs\n    with open(f\"{dataset_path}/void_{nsamples}/test_image.txt\") as f: \n        test_image_list = [line.rstrip() for line in f]\n        \n    # initialize error aggregators\n    avg_error_w_int_depth = metrics.ErrorMetricsAverager()\n    avg_error_w_pred = metrics.ErrorMetricsAverager()\n\n    # iterate through inputs list\n    for i in tqdm(range(len(test_image_list))):\n        \n        # image\n        input_image_fp = os.path.join(dataset_path, test_image_list[i])\n        input_image = utils.read_image(input_image_fp)\n\n        # sparse depth\n        input_sparse_depth_fp = input_image_fp.replace(\"image\", \"sparse_depth\")\n        input_sparse_depth = np.array(Image.open(input_sparse_depth_fp), dtype=np.float32) / 256.0\n        input_sparse_depth[input_sparse_depth <= 0] = 0.0\n\n        # sparse depth validity map\n        validity_map_fp = input_image_fp.replace(\"image\", \"validity_map\")\n        validity_map = np.array(Image.open(validity_map_fp), dtype=np.float32)\n        assert(np.all(np.unique(validity_map) == [0, 256]))\n        validity_map[validity_map > 0] = 1\n\n        # target (ground truth) depth\n        target_depth_fp = input_image_fp.replace(\"image\", \"ground_truth\")\n        target_depth = np.array(Image.open(target_depth_fp), dtype=np.float32) / 256.0\n        target_depth[target_depth <= 0] = 0.0\n\n        # target depth valid/mask\n        mask = (target_depth < max_depth)\n        if min_depth is not None:\n            mask *= (target_depth > min_depth)\n        target_depth[~mask] = np.inf  # set invalid depth\n        target_depth = 1.0 / target_depth\n\n        # run pipeline\n        output = method.run(input_image, input_sparse_depth, validity_map, device)\n\n        # compute error metrics using intermediate (globally aligned) depth\n        error_w_int_depth = metrics.ErrorMetrics()\n        error_w_int_depth.compute(\n            estimate = output[\"ga_depth\"], \n            target = target_depth, \n            valid = mask.astype(np.bool),\n        )\n\n        # compute error metrics using SML output depth\n        error_w_pred = metrics.ErrorMetrics()\n        error_w_pred.compute(\n            estimate = output[\"sml_depth\"], \n            target = target_depth, \n            valid = mask.astype(np.bool),\n        )\n\n        # accumulate error metrics\n        avg_error_w_int_depth.accumulate(error_w_int_depth)\n        avg_error_w_pred.accumulate(error_w_pred)\n\n\n    # compute average error metrics\n    print(\"Averaging metrics for globally-aligned depth over {} samples\".format(\n        avg_error_w_int_depth.total_count\n    ))\n    avg_error_w_int_depth.average()\n\n    print(\"Averaging metrics for SML-aligned depth over {} samples\".format(\n        avg_error_w_pred.total_count\n    ))\n    avg_error_w_pred.average()\n\n    from 
prettytable import PrettyTable\n    summary_tb = PrettyTable()\n    summary_tb.field_names = [\"metric\", \"GA Only\", \"GA+SML\"]\n\n    summary_tb.add_row([\"RMSE\", f\"{avg_error_w_int_depth.rmse_avg:7.2f}\", f\"{avg_error_w_pred.rmse_avg:7.2f}\"])\n    summary_tb.add_row([\"MAE\", f\"{avg_error_w_int_depth.mae_avg:7.2f}\", f\"{avg_error_w_pred.mae_avg:7.2f}\"])\n    summary_tb.add_row([\"AbsRel\", f\"{avg_error_w_int_depth.absrel_avg:8.3f}\", f\"{avg_error_w_pred.absrel_avg:8.3f}\"])\n    summary_tb.add_row([\"iRMSE\", f\"{avg_error_w_int_depth.inv_rmse_avg:7.2f}\", f\"{avg_error_w_pred.inv_rmse_avg:7.2f}\"])\n    summary_tb.add_row([\"iMAE\", f\"{avg_error_w_int_depth.inv_mae_avg:7.2f}\", f\"{avg_error_w_pred.inv_mae_avg:7.2f}\"])\n    summary_tb.add_row([\"iAbsRel\", f\"{avg_error_w_int_depth.inv_absrel_avg:8.3f}\", f\"{avg_error_w_pred.inv_absrel_avg:8.3f}\"])\n    \n    print(summary_tb)\n\n\nif __name__==\"__main__\":\n\n    parser = argparse.ArgumentParser()\n\n    parser.add_argument('-ds', '--dataset-path', type=str, default='/path/to/void_release/',\n                        help='Path to VOID release dataset.')\n    parser.add_argument('-dp', '--depth-predictor', type=str, default='midas_small', \n                        help='Name of depth predictor to use in pipeline.')\n    parser.add_argument('-ns', '--nsamples', type=int, default=150, \n                        help='Number of sparse metric depth samples available.')\n    parser.add_argument('-sm', '--sml-model-path', type=str, default='', \n                        help='Path to trained SML model weights.')\n\n    args = parser.parse_args()\n    print(args)\n    \n    evaluate(\n        args.dataset_path,\n        args.depth_predictor, \n        args.nsamples, \n        args.sml_model_path,\n    )"
  },
  {
    "path": "metrics.py",
    "content": "import numpy as np\nimport torch\n\ndef rmse(estimate, target):\n    return np.sqrt(np.mean((estimate - target) ** 2))\n\ndef mae(estimate, target):\n    return np.mean(np.abs(estimate - target))\n\ndef absrel(estimate, target):\n    return np.mean(np.abs(estimate - target) / target)\n\ndef inv_rmse(estimate, target):\n    return np.sqrt(np.mean((1.0/estimate - 1.0/target) ** 2))\n\ndef inv_mae(estimate, target):\n    return np.mean(np.abs(1.0/estimate - 1.0/target))\n\ndef inv_absrel(estimate, target):\n    return np.mean((np.abs(1.0/estimate - 1.0/target)) / (1.0/target))\n\nclass ErrorMetrics(object):\n    def __init__(self):\n        # initialize by setting to worst values\n        self.rmse, self.mae, self.absrel = np.inf, np.inf, np.inf\n        self.inv_rmse, self.inv_mae, self.inv_absrel = np.inf, np.inf, np.inf\n\n    def compute(self, estimate, target, valid):\n        # apply valid masks\n        estimate = estimate[valid]\n        target = target[valid]\n\n        # estimate and target will be in inverse space, convert to regular\n        estimate = 1.0/estimate\n        target = 1.0/target\n\n        # depth error, estimate in meters, convert units to mm\n        self.rmse = rmse(1000.0*estimate, 1000.0*target)\n        self.mae = mae(1000.0*estimate, 1000.0*target)\n        self.absrel = absrel(1000.0*estimate, 1000.0*target)\n\n        # inverse depth error, estimate in meters, convert units to 1/km\n        self.inv_rmse = inv_rmse(0.001*estimate, 0.001*target)\n        self.inv_mae = inv_mae(0.001*estimate, 0.001*target)\n        self.inv_absrel = inv_absrel(0.001*estimate, 0.001*target)\n\nclass ErrorMetricsAverager(object):\n    def __init__(self):\n        # initialize avg accumulators to zero\n        self.rmse_avg, self.mae_avg, self.absrel_avg = 0, 0, 0\n        self.inv_rmse_avg, self.inv_mae_avg, self.inv_absrel_avg = 0, 0, 0\n        self.total_count = 0\n\n    def accumulate(self, error_metrics):\n        # adds to accumulators from ErrorMetrics object\n        assert isinstance(error_metrics, ErrorMetrics)\n\n        self.rmse_avg += error_metrics.rmse\n        self.mae_avg += error_metrics.mae\n        self.absrel_avg += error_metrics.absrel\n\n        self.inv_rmse_avg += error_metrics.inv_rmse\n        self.inv_mae_avg += error_metrics.inv_mae\n        self.inv_absrel_avg += error_metrics.inv_absrel\n\n        self.total_count += 1\n\n    def average(self):\n        # print(f\"Averaging depth metrics over {self.total_count} samples\")\n        self.rmse_avg = self.rmse_avg / self.total_count\n        self.mae_avg = self.mae_avg / self.total_count\n        self.absrel_avg = self.absrel_avg / self.total_count\n        # print(f\"Averaging inv depth metrics over {self.total_count} samples\")\n        self.inv_rmse_avg = self.inv_rmse_avg / self.total_count\n        self.inv_mae_avg = self.inv_mae_avg / self.total_count\n        self.inv_absrel_avg = self.inv_absrel_avg / self.total_count"
  },
  {
    "path": "modules/estimator.py",
    "content": "import numpy as np\n\ndef compute_scale_and_shift_ls(prediction, target, mask):\n    # tuple specifying with axes to sum\n    sum_axes = (0, 1)\n\n    # system matrix: A = [[a_00, a_01], [a_10, a_11]]\n    a_00 = np.sum(mask * prediction * prediction, sum_axes)\n    a_01 = np.sum(mask * prediction, sum_axes)\n    a_11 = np.sum(mask, sum_axes)\n\n    # right hand side: b = [b_0, b_1]\n    b_0 = np.sum(mask * prediction * target, sum_axes)\n    b_1 = np.sum(mask * target, sum_axes)\n\n    # solution: x = A^-1 . b = [[a_11, -a_01], [-a_10, a_00]] / (a_00 * a_11 - a_01 * a_10) . b\n    x_0 = np.zeros_like(b_0)\n    x_1 = np.zeros_like(b_1)\n\n    det = a_00 * a_11 - a_01 * a_01\n    # A needs to be a positive definite matrix.\n    valid = det > 0\n\n    x_0[valid] = (a_11[valid] * b_0[valid] - a_01[valid] * b_1[valid]) / det[valid]\n    x_1[valid] = (-a_01[valid] * b_0[valid] + a_00[valid] * b_1[valid]) / det[valid]\n\n    return x_0, x_1\n\nclass LeastSquaresEstimator(object):\n    def __init__(self, estimate, target, valid):\n        self.estimate = estimate\n        self.target = target\n        self.valid = valid\n\n        # to be computed\n        self.scale = 1.0\n        self.shift = 0.0\n        self.output = None\n\n    def compute_scale_and_shift(self):\n        self.scale, self.shift = compute_scale_and_shift_ls(self.estimate, self.target, self.valid)\n\n    def apply_scale_and_shift(self):\n        self.output = self.estimate * self.scale + self.shift\n\n    def clamp_min_max(self, clamp_min=None, clamp_max=None):\n        if clamp_min is not None:\n            if clamp_min > 0:\n                clamp_min_inv = 1.0/clamp_min\n                self.output[self.output > clamp_min_inv] = clamp_min_inv\n                assert np.max(self.output) <= clamp_min_inv\n            else: # divide by zero, so skip\n                pass\n        if clamp_max is not None:\n            clamp_max_inv = 1.0/clamp_max\n            self.output[self.output < clamp_max_inv] = clamp_max_inv\n            # print(np.min(self.output), clamp_max_inv)\n            assert np.min(self.output) >= clamp_max_inv\n        # check for nonzero range\n        # assert np.min(self.output) != np.max(self.output)"
  },
  {
    "path": "modules/interpolator.py",
    "content": "import numpy as np\nnp.set_printoptions(suppress=True)\n\nfrom scipy.interpolate import griddata\n\n\ndef interpolate_knots(map_size, knot_coords, knot_values, interpolate, fill_corners):\n    grid_x, grid_y = np.mgrid[0:map_size[0], 0:map_size[1]]\n\n    interpolated_map = griddata(\n        points=knot_coords.T,\n        values=knot_values,\n        xi=(grid_y, grid_x),\n        method=interpolate,\n        fill_value=1.0)\n\n    return interpolated_map\n\n\nclass Interpolator2D(object):\n    def __init__(self, pred_inv, sparse_depth_inv, valid):\n        self.pred_inv = pred_inv\n        self.sparse_depth_inv = sparse_depth_inv\n        self.valid = valid\n\n        self.map_size = np.shape(pred_inv)\n        self.num_knots = np.sum(valid)\n        nonzero_y_loc = np.nonzero(valid)[0]\n        nonzero_x_loc = np.nonzero(valid)[1]\n        self.knot_coords = np.stack((nonzero_x_loc, nonzero_y_loc))\n        self.knot_scales = sparse_depth_inv[valid] / pred_inv[valid]\n        self.knot_shifts = sparse_depth_inv[valid] - pred_inv[valid]\n\n        self.knot_list = []\n        for i in range(self.num_knots):\n            self.knot_list.append((int(self.knot_coords[0,i]), int(self.knot_coords[1,i])))\n\n        # to be computed\n        self.interpolated_map = None\n        self.confidence_map = None\n        self.output = None\n\n    def generate_interpolated_scale_map(self, interpolate_method, fill_corners=False):\n        self.interpolated_scale_map = interpolate_knots(\n            map_size=self.map_size, \n            knot_coords=self.knot_coords, \n            knot_values=self.knot_scales,\n            interpolate=interpolate_method,\n            fill_corners=fill_corners\n        ).astype(np.float32)"
  },
  {
    "path": "modules/midas/base_model.py",
    "content": "import torch\n\n\nclass BaseModel(torch.nn.Module):\n    def load(self, path):\n        \"\"\"Load model from file.\n\n        Args:\n            path (str): file path\n        \"\"\"\n        parameters = torch.load(path, map_location=torch.device('cpu'))\n\n        if \"optimizer\" in parameters:\n            parameters = parameters[\"model\"]\n\n        if \"state_dict\" in parameters:\n            state_dict = parameters[\"state_dict\"]\n            new_state_dict = {}\n            for key in state_dict.keys():\n                if key[0:6] == \"model.\":\n                    new_state_dict[key[6:]] = state_dict[key]\n\n            self.load_state_dict(new_state_dict)\n\n        else:\n            self.load_state_dict(parameters)\n"
  },
  {
    "path": "modules/midas/blocks.py",
    "content": "import torch\nimport torch.nn as nn\n\ndef _make_encoder(backbone, features, use_pretrained, groups=1, expand=False, exportable=True):\n    if backbone == \"efficientnet_lite3\":\n        pretrained = _make_pretrained_efficientnet_lite3(use_pretrained, exportable=exportable)\n        scratch = _make_scratch([32, 48, 136, 384], features, groups=groups, expand=expand)  # efficientnet_lite3     \n    else:\n        print(f\"Backbone '{backbone}' not implemented\")\n        assert False\n        \n    return pretrained, scratch\n\n\ndef _make_scratch(in_shape, out_shape, groups=1, expand=False):\n    scratch = nn.Module()\n\n    out_shape1 = out_shape\n    out_shape2 = out_shape\n    out_shape3 = out_shape\n    out_shape4 = out_shape\n    if expand==True:\n        out_shape1 = out_shape\n        out_shape2 = out_shape*2\n        out_shape3 = out_shape*4\n        out_shape4 = out_shape*8\n\n    scratch.layer1_rn = nn.Conv2d(\n        in_shape[0], out_shape1, kernel_size=3, stride=1, padding=1, bias=False, groups=groups\n    )\n    scratch.layer2_rn = nn.Conv2d(\n        in_shape[1], out_shape2, kernel_size=3, stride=1, padding=1, bias=False, groups=groups\n    )\n    scratch.layer3_rn = nn.Conv2d(\n        in_shape[2], out_shape3, kernel_size=3, stride=1, padding=1, bias=False, groups=groups\n    )\n    scratch.layer4_rn = nn.Conv2d(\n        in_shape[3], out_shape4, kernel_size=3, stride=1, padding=1, bias=False, groups=groups\n    )\n\n    return scratch\n\n\ndef _make_pretrained_efficientnet_lite3(use_pretrained, exportable=False):\n    efficientnet = torch.hub.load(\n        \"rwightman/gen-efficientnet-pytorch\",\n        \"tf_efficientnet_lite3\",\n        pretrained=use_pretrained,\n        exportable=exportable\n    )\n    return _make_efficientnet_backbone(efficientnet)\n\n\ndef _make_efficientnet_backbone(effnet):\n    pretrained = nn.Module()\n\n    pretrained.layer1 = nn.Sequential(\n        effnet.conv_stem, effnet.bn1, effnet.act1, *effnet.blocks[0:2]\n    )\n    pretrained.layer2 = nn.Sequential(*effnet.blocks[2:3])\n    pretrained.layer3 = nn.Sequential(*effnet.blocks[3:5])\n    pretrained.layer4 = nn.Sequential(*effnet.blocks[5:9])\n\n    return pretrained\n\n\nclass ResidualConvUnit_custom(nn.Module):\n    \"\"\"Residual convolution module.\n    \"\"\"\n\n    def __init__(self, features, activation, bn):\n        \"\"\"Init.\n\n        Args:\n            features (int): number of features\n        \"\"\"\n        super().__init__()\n\n        self.bn = bn\n\n        self.groups=1\n\n        self.conv1 = nn.Conv2d(\n            features, features, kernel_size=3, stride=1, padding=1, bias=True, groups=self.groups\n        )\n        \n        self.conv2 = nn.Conv2d(\n            features, features, kernel_size=3, stride=1, padding=1, bias=True, groups=self.groups\n        )\n\n        if self.bn==True:\n            self.bn1 = nn.BatchNorm2d(features)\n            self.bn2 = nn.BatchNorm2d(features)\n\n        self.activation = activation\n\n        self.skip_add = nn.quantized.FloatFunctional()\n\n    def forward(self, x):\n        \"\"\"Forward pass.\n\n        Args:\n            x (tensor): input\n\n        Returns:\n            tensor: output\n        \"\"\"\n        \n        out = self.activation(x)\n        out = self.conv1(out)\n        if self.bn==True:\n            out = self.bn1(out)\n       \n        out = self.activation(out)\n        out = self.conv2(out)\n        if self.bn==True:\n            out = self.bn2(out)\n\n        if self.groups > 1:\n     
       out = self.conv_merge(out)\n\n        return self.skip_add.add(out, x)\n\n\nclass FeatureFusionBlock_custom(nn.Module):\n    \"\"\"Feature fusion block.\n    \"\"\"\n\n    def __init__(self, features, activation, deconv=False, bn=False, expand=False, align_corners=True):\n        \"\"\"Init.\n\n        Args:\n            features (int): number of features\n        \"\"\"\n        super(FeatureFusionBlock_custom, self).__init__()\n\n        self.deconv = deconv\n        self.align_corners = align_corners\n\n        self.groups=1\n\n        self.expand = expand\n        out_features = features\n        if self.expand==True:\n            out_features = features//2\n        \n        self.out_conv = nn.Conv2d(features, out_features, kernel_size=1, stride=1, padding=0, bias=True, groups=1)\n\n        self.resConfUnit1 = ResidualConvUnit_custom(features, activation, bn)\n        self.resConfUnit2 = ResidualConvUnit_custom(features, activation, bn)\n        \n        self.skip_add = nn.quantized.FloatFunctional()\n\n    def forward(self, *xs):\n        \"\"\"Forward pass.\n\n        Returns:\n            tensor: output\n        \"\"\"\n        output = xs[0]\n\n        if len(xs) == 2:\n            res = self.resConfUnit1(xs[1])\n            output = self.skip_add.add(output, res)\n\n        output = self.resConfUnit2(output)\n\n        output = nn.functional.interpolate(\n            output, scale_factor=2, mode=\"bilinear\", align_corners=self.align_corners\n        )\n\n        output = self.out_conv(output)\n\n        return output\n\n\nclass OutputConv(nn.Module):\n    \"\"\"Output conv block.\n    \"\"\"\n\n    def __init__(self, features, groups, activation, non_negative):\n\n        super(OutputConv, self).__init__()\n\n        self.output_conv = nn.Sequential(\n            nn.Conv2d(features, features//2, kernel_size=3, stride=1, padding=1, groups=groups),\n            nn.Upsample(scale_factor=2, mode=\"bilinear\"),\n            nn.Conv2d(features//2, 32, kernel_size=3, stride=1, padding=1),\n            activation,\n            nn.Conv2d(32, 1, kernel_size=1, stride=1, padding=0),\n            nn.ReLU(True) if non_negative else nn.Identity(),\n            nn.Identity(),\n        )\n\n    def forward(self, x):        \n        return self.output_conv(x)"
  },
  {
    "path": "modules/midas/midas_net_custom.py",
    "content": "\"\"\"MidashNet: Network for monocular depth estimation trained by mixing several datasets.\nThis file contains code that is adapted from\nhttps://github.com/thomasjpfan/pytorch_refinenet/blob/master/pytorch_refinenet/refinenet/refinenet_4cascade.py\n\"\"\"\nimport torch\nimport torch.nn as nn\n\nfrom torch.nn import functional as F\n\nfrom .base_model import BaseModel\nfrom .blocks import FeatureFusionBlock_custom, _make_encoder, OutputConv\n\ndef weights_init(m):\n    import math\n    # initialize from normal (Gaussian) distribution\n    if isinstance(m, nn.Conv2d):\n        n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels\n        m.weight.data.normal_(0, math.sqrt(2.0 / n))\n        if m.bias is not None:\n            m.bias.data.zero_()\n    elif isinstance(m, nn.BatchNorm2d):\n        m.weight.data.fill_(1)\n        m.bias.data.zero_()\n\n\nclass MidasNet_small_videpth(BaseModel):\n    \"\"\"Network for monocular depth estimation.\n    \"\"\"\n\n    def __init__(self, path=None, features=64, backbone=\"efficientnet_lite3\", non_negative=False, exportable=True, channels_last=False, align_corners=True,\n        blocks={'expand': True}, in_channels=2, regress='r', min_pred=None, max_pred=None):\n        \"\"\"Init.\n\n        Args:\n            path (str, optional): Path to saved model. Defaults to None.\n            features (int, optional): Number of features. Defaults to 64.\n            backbone (str, optional): Backbone network for encoder. Defaults to efficientnet_lite3.\n        \"\"\"\n        print(\"Loading weights: \", path)\n\n        super(MidasNet_small_videpth, self).__init__()\n\n        use_pretrained = False if path else True\n                \n        self.channels_last = channels_last\n        self.blocks = blocks\n        self.backbone = backbone\n\n        self.groups = 1\n\n        # for model output\n        self.regress = regress\n        self.min_pred = min_pred\n        self.max_pred = max_pred\n\n        features1=features\n        features2=features\n        features3=features\n        features4=features\n        self.expand = False\n        if \"expand\" in self.blocks and self.blocks['expand'] == True:\n            self.expand = True\n            features1=features\n            features2=features*2\n            features3=features*4\n            features4=features*8\n\n        self.first = nn.Sequential(\n            nn.Conv2d(in_channels, 3, kernel_size=3, stride=1, padding=1),\n            nn.BatchNorm2d(3),\n            nn.ReLU(inplace=True)\n        )\n        self.first.apply(weights_init)\n\n        self.pretrained, self.scratch = _make_encoder(self.backbone, features, use_pretrained, groups=self.groups, expand=self.expand, exportable=exportable)\n\n        self.scratch.activation = nn.ReLU(False)    \n\n        self.scratch.refinenet4 = FeatureFusionBlock_custom(features4, self.scratch.activation, deconv=False, bn=False, expand=self.expand, align_corners=align_corners)\n        self.scratch.refinenet3 = FeatureFusionBlock_custom(features3, self.scratch.activation, deconv=False, bn=False, expand=self.expand, align_corners=align_corners)\n        self.scratch.refinenet2 = FeatureFusionBlock_custom(features2, self.scratch.activation, deconv=False, bn=False, expand=self.expand, align_corners=align_corners)\n        self.scratch.refinenet1 = FeatureFusionBlock_custom(features1, self.scratch.activation, deconv=False, bn=False, align_corners=align_corners)\n\n        self.scratch.output_conv = OutputConv(features, self.groups, 
self.scratch.activation, non_negative)\n        \n        if path:\n            self.load(path)\n\n\n    def forward(self, x, d):\n        \"\"\"Forward pass.\n\n        Args:\n            x (tensor): input data (image)\n            d (tensor): unalterated input depth\n\n        Returns:\n            tensor: depth\n        \"\"\"\n        if self.channels_last==True:\n            print(\"self.channels_last = \", self.channels_last)\n            x.contiguous(memory_format=torch.channels_last)\n\n        layer_0 = self.first(x)\n\n        layer_1 = self.pretrained.layer1(layer_0)\n        layer_2 = self.pretrained.layer2(layer_1)\n        layer_3 = self.pretrained.layer3(layer_2)\n        layer_4 = self.pretrained.layer4(layer_3)\n        \n        layer_1_rn = self.scratch.layer1_rn(layer_1)\n        layer_2_rn = self.scratch.layer2_rn(layer_2)\n        layer_3_rn = self.scratch.layer3_rn(layer_3)\n        layer_4_rn = self.scratch.layer4_rn(layer_4)\n\n        path_4 = self.scratch.refinenet4(layer_4_rn)\n        path_3 = self.scratch.refinenet3(path_4, layer_3_rn)\n        path_2 = self.scratch.refinenet2(path_3, layer_2_rn)\n        path_1 = self.scratch.refinenet1(path_2, layer_1_rn)\n        \n        out = self.scratch.output_conv(path_1)\n\n        scales = F.relu(1.0 + out)\n        pred = d * scales\n\n        # clamp pred to min and max\n        if self.min_pred is not None:\n            min_pred_inv = 1.0/self.min_pred\n            pred[pred > min_pred_inv] = min_pred_inv\n        if self.max_pred is not None:\n            max_pred_inv = 1.0/self.max_pred\n            pred[pred < max_pred_inv] = max_pred_inv\n\n        # also return scales\n        return (pred, scales)"
  },
  {
    "path": "modules/midas/normalization.py",
    "content": "VOID_INTERMEDIATE = {\n\n    \"dpt_beit_large_512\" : {\n        \"void_150\" : { \n            \"mean\" : {\"int_depth\" : 0.730, \"int_scales\" : 0.380}, \n            \"std\" : {\"int_depth\" : 0.226, \"int_scales\" : 0.102},\n        },\n        \"void_500\" : { \n            \"mean\" : {\"int_depth\" : 0.736, \"int_scales\" : 0.366}, \n            \"std\" : {\"int_depth\" : 0.232, \"int_scales\" : 0.099},\n        },\n        \"void_1500\" : { \n            \"mean\" : {\"int_depth\" : 0.730, \"int_scales\" : 0.355}, \n            \"std\" : {\"int_depth\" : 0.232, \"int_scales\" : 0.096},\n        },\n    },\n\n    \"dpt_swin2_large_384\" : {\n        \"void_150\" : { \n            \"mean\" : {\"int_depth\" : 0.730, \"int_scales\" : 0.402}, \n            \"std\" : {\"int_depth\" : 0.219, \"int_scales\" : 0.107},\n        },\n        \"void_500\" : { \n            \"mean\" : {\"int_depth\" : 0.736, \"int_scales\" : 0.389}, \n            \"std\" : {\"int_depth\" : 0.224, \"int_scales\" : 0.106},\n        },\n        \"void_1500\" : { \n            \"mean\" : {\"int_depth\" : 0.730, \"int_scales\" : 0.377}, \n            \"std\" : {\"int_depth\" : 0.226, \"int_scales\" : 0.103},\n        },\n    },\n\n    \"dpt_large\" : {\n        \"void_150\" : { \n            \"mean\" : {\"int_depth\" : 0.729, \"int_scales\" : 0.403}, \n            \"std\" : {\"int_depth\" : 0.213, \"int_scales\" : 0.116},\n        },\n        \"void_500\" : { \n            \"mean\" : {\"int_depth\" : 0.735, \"int_scales\" : 0.390}, \n            \"std\" : {\"int_depth\" : 0.219, \"int_scales\" : 0.116},\n        },\n        \"void_1500\" : { \n            \"mean\" : {\"int_depth\" : 0.730, \"int_scales\" : 0.380}, \n            \"std\" : {\"int_depth\" : 0.221, \"int_scales\" : 0.116},\n        },\n    },\n\n    \"dpt_hybrid\": {\n        \"void_150\" : { \n            \"mean\" : {\"int_depth\" : 0.729, \"int_scales\" : 0.404}, \n            \"std\" : {\"int_depth\" : 0.210, \"int_scales\" : 0.117},\n        },\n        \"void_500\" : { \n            \"mean\" : {\"int_depth\" : 0.735, \"int_scales\" : 0.392}, \n            \"std\" : {\"int_depth\" : 0.215, \"int_scales\" : 0.118},\n        },\n        \"void_1500\" : { \n            \"mean\" : {\"int_depth\" : 0.730, \"int_scales\" : 0.381}, \n            \"std\" : {\"int_depth\" : 0.218, \"int_scales\" : 0.117},\n        },\n    },\n\n    \"dpt_swin2_tiny_256\" : {\n        \"void_150\" : { \n            \"mean\" : {\"int_depth\" : 0.735, \"int_scales\" : 0.419}, \n            \"std\" : {\"int_depth\" : 0.207, \"int_scales\" : 0.122},\n        },\n        \"void_500\" : { \n            \"mean\" : {\"int_depth\" : 0.741, \"int_scales\" : 0.406}, \n            \"std\" : {\"int_depth\" : 0.212, \"int_scales\" : 0.124},\n        },\n        \"void_1500\" : { \n            \"mean\" : {\"int_depth\" : 0.733, \"int_scales\" : 0.396}, \n            \"std\" : {\"int_depth\" : 0.213, \"int_scales\" : 0.125},\n        },\n    },\n\n    \"dpt_levit_224\" : {\n        \"void_150\" : { \n            \"mean\" : {\"int_depth\" : 0.734, \"int_scales\" : 0.421}, \n            \"std\" : {\"int_depth\" : 0.198, \"int_scales\" : 0.129},\n        },\n        \"void_500\" : { \n            \"mean\" : {\"int_depth\" : 0.740, \"int_scales\" : 0.410}, \n            \"std\" : {\"int_depth\" : 0.202, \"int_scales\" : 0.134},\n        },\n        \"void_1500\" : { \n            \"mean\" : {\"int_depth\" : 0.734, \"int_scales\" : 0.400}, \n            \"std\" : {\"int_depth\" : 
0.204, \"int_scales\" : 0.137},\n        },\n    },\n\n    \"midas_small\" : {\n        \"void_150\" : { \n            \"mean\" : {\"int_depth\" : 0.723, \"int_scales\" : 0.402}, \n            \"std\" : {\"int_depth\" : 0.190, \"int_scales\" : 0.132},\n        },\n        \"void_500\" : { \n            \"mean\" : {\"int_depth\" : 0.731, \"int_scales\" : 0.393}, \n            \"std\" : {\"int_depth\" : 0.196, \"int_scales\" : 0.136},\n        },\n        \"void_1500\" : { \n            \"mean\" : {\"int_depth\" : 0.728, \"int_scales\" : 0.385}, \n            \"std\" : {\"int_depth\" : 0.199, \"int_scales\" : 0.140},\n        },\n    },\n\n}\n\n"
  },
  {
    "path": "modules/midas/transforms.py",
    "content": "import numpy as np\nimport cv2\nimport math\nimport torch\nimport torchvision.transforms as transforms\n\nfrom modules.midas.utils import normalize_unit_range\nimport modules.midas.normalization as normalization\n\nclass Resize(object):\n    \"\"\"Resize sample to given size (width, height).\n    \"\"\"\n\n    def __init__(\n        self,\n        width,\n        height,\n        resize_target=True,\n        keep_aspect_ratio=False,\n        ensure_multiple_of=1,\n        resize_method=\"lower_bound\",\n        image_interpolation_method=cv2.INTER_AREA,\n    ):\n        \"\"\"Init.\n\n        Args:\n            width (int): desired output width\n            height (int): desired output height\n            resize_target (bool, optional):\n                True: Resize the full sample (image, mask, target).\n                False: Resize image only.\n                Defaults to True.\n            keep_aspect_ratio (bool, optional):\n                True: Keep the aspect ratio of the input sample.\n                Output sample might not have the given width and height, and\n                resize behaviour depends on the parameter 'resize_method'.\n                Defaults to False.\n            ensure_multiple_of (int, optional):\n                Output width and height is constrained to be multiple of this parameter.\n                Defaults to 1.\n            resize_method (str, optional):\n                \"lower_bound\": Output will be at least as large as the given size.\n                \"upper_bound\": Output will be at max as large as the given size. (Output size might be smaller than given size.)\n                \"minimal\": Scale as least as possible.  (Output size might be smaller than given size.)\n                Defaults to \"lower_bound\".\n        \"\"\"\n        self.__width = width\n        self.__height = height\n\n        self.__resize_target = resize_target\n        self.__keep_aspect_ratio = keep_aspect_ratio\n        self.__multiple_of = ensure_multiple_of\n        self.__resize_method = resize_method\n        self.__image_interpolation_method = image_interpolation_method\n\n    def constrain_to_multiple_of(self, x, min_val=0, max_val=None):\n        y = (np.round(x / self.__multiple_of) * self.__multiple_of).astype(int)\n\n        if max_val is not None and y > max_val:\n            y = (np.floor(x / self.__multiple_of) * self.__multiple_of).astype(int)\n\n        if y < min_val:\n            y = (np.ceil(x / self.__multiple_of) * self.__multiple_of).astype(int)\n\n        return y\n\n    def get_size(self, width, height):\n        # determine new height and width\n        scale_height = self.__height / height\n        scale_width = self.__width / width\n\n        if self.__keep_aspect_ratio:\n            if self.__resize_method == \"lower_bound\":\n                # scale such that output size is lower bound\n                if scale_width > scale_height:\n                    # fit width\n                    scale_height = scale_width\n                else:\n                    # fit height\n                    scale_width = scale_height\n            elif self.__resize_method == \"upper_bound\":\n                # scale such that output size is upper bound\n                if scale_width < scale_height:\n                    # fit width\n                    scale_height = scale_width\n                else:\n                    # fit height\n                    scale_width = scale_height\n            elif self.__resize_method == \"minimal\":\n         
       # scale as least as possbile\n                if abs(1 - scale_width) < abs(1 - scale_height):\n                    # fit width\n                    scale_height = scale_width\n                else:\n                    # fit height\n                    scale_width = scale_height\n            else:\n                raise ValueError(\n                    f\"resize_method {self.__resize_method} not implemented\"\n                )\n\n        if self.__resize_method == \"lower_bound\":\n            new_height = self.constrain_to_multiple_of(\n                scale_height * height, min_val=self.__height\n            )\n            new_width = self.constrain_to_multiple_of(\n                scale_width * width, min_val=self.__width\n            )\n        elif self.__resize_method == \"upper_bound\":\n            new_height = self.constrain_to_multiple_of(\n                scale_height * height, max_val=self.__height\n            )\n            new_width = self.constrain_to_multiple_of(\n                scale_width * width, max_val=self.__width\n            )\n        elif self.__resize_method == \"minimal\":\n            new_height = self.constrain_to_multiple_of(scale_height * height)\n            new_width = self.constrain_to_multiple_of(scale_width * width)\n        else:\n            raise ValueError(f\"resize_method {self.__resize_method} not implemented\")\n\n        return (new_width, new_height)\n\n    def __call__(self, sample):\n        width, height = self.get_size(\n            sample[\"image\"].shape[1], sample[\"image\"].shape[0]\n        )\n\n        # resize sample\n        for item in sample.keys():\n            interpolation_method = self.__image_interpolation_method\n            sample[item] = cv2.resize(\n                sample[item],\n                (width, height),\n                interpolation=interpolation_method,\n            )\n\n        if self.__resize_target:\n\n            if \"depth\" in sample:\n                sample[\"depth\"] = cv2.resize(\n                    sample[\"depth\"], \n                    (width, height), \n                    interpolation=cv2.INTER_NEAREST\n                )\n\n            if \"mask\" in sample:\n                sample[\"mask\"] = cv2.resize(\n                    sample[\"mask\"].astype(np.float32),\n                    (width, height),\n                    interpolation=cv2.INTER_NEAREST,\n                )\n                sample[\"mask\"] = sample[\"mask\"].astype(bool)\n\n        return sample\n\n\nclass NormalizeImage(object):\n    \"\"\"Normalize image by given mean and std.\n    \"\"\"\n\n    def __init__(self, mean, std):\n        self.__mean = mean\n        self.__std = std\n\n    def __call__(self, sample):\n        sample[\"image\"] = (sample[\"image\"] - self.__mean) / self.__std\n\n        return sample\n\nclass NormalizeIntermediate(object):\n    \"\"\"Normalize intermediate data by given mean and std.\n    \"\"\"\n\n    def __init__(self, mean, std):\n\n        self.__int_depth_mean = mean[\"int_depth\"]\n        self.__int_depth_std = std[\"int_depth\"]\n\n        self.__int_scales_mean = mean[\"int_scales\"]\n        self.__int_scales_std = std[\"int_scales\"]\n\n    def __call__(self, sample):\n\n        if \"int_depth\" in sample and sample[\"int_depth\"] is not None:\n            sample[\"int_depth\"] = (sample[\"int_depth\"] - self.__int_depth_mean) / self.__int_depth_std\n\n        if \"int_scales\" in sample and sample[\"int_scales\"] is not None:\n            sample[\"int_scales\"] = 
(sample[\"int_scales\"] - self.__int_scales_mean) / self.__int_scales_std\n\n        return sample\n\nclass PrepareForNet(object):\n    \"\"\"Prepare sample for usage as network input.\n    \"\"\"\n\n    def __init__(self):\n        pass\n\n    def __call__(self, sample):\n\n        for item in sample.keys():\n\n            if sample[item] is None:\n                pass\n            elif item == \"image\":\n                image = np.transpose(sample[\"image\"], (2, 0, 1))\n                sample[\"image\"] = np.ascontiguousarray(image).astype(np.float32)\n            else:\n                array = sample[item].astype(np.float32)\n                array = np.expand_dims(array, axis=0) # add channel dim\n                sample[item] = np.ascontiguousarray(array)\n\n        return sample\n\n\nclass Tensorize(object):\n    \"\"\"Convert sample to tensor.\n    \"\"\"\n\n    def __init__(self):\n        pass\n\n    def __call__(self, sample):\n\n        for item in sample.keys():\n\n            if sample[item] is None:\n                pass\n            else:\n                # before tensorizing, verify that data is clean\n                assert not np.any(np.isnan(sample[item])) \n                sample[item] = torch.Tensor(sample[item])\n\n        return sample\n\n\ndef get_transforms(depth_predictor, sparsifier, nsamples):\n\n    image_mean_dict = {\n        \"dpt_beit_large_512\"    : [0.5, 0.5, 0.5],\n        \"dpt_swin2_large_384\"   : [0.5, 0.5, 0.5],\n        \"dpt_large\"             : [0.5, 0.5, 0.5], \n        \"dpt_hybrid\"            : [0.5, 0.5, 0.5],\n        \"dpt_swin2_tiny_256\"    : [0.5, 0.5, 0.5],\n        \"dpt_levit_224\"         : [0.5, 0.5, 0.5],\n        \"midas_small\"           : [0.485, 0.456, 0.406],\n    }\n\n    image_std_dict = {\n        \"dpt_beit_large_512\"    : [0.5, 0.5, 0.5],\n        \"dpt_swin2_large_384\"   : [0.5, 0.5, 0.5],\n        \"dpt_large\"             : [0.5, 0.5, 0.5], \n        \"dpt_hybrid\"            : [0.5, 0.5, 0.5],\n        \"dpt_swin2_tiny_256\"    : [0.5, 0.5, 0.5],\n        \"dpt_levit_224\"         : [0.5, 0.5, 0.5],\n        \"midas_small\"           : [0.229, 0.224, 0.225],\n    }\n\n    resize_method_dict = {\n        \"dpt_beit_large_512\"    : \"minimal\", \n        \"dpt_swin2_large_384\"   : \"minimal\",\n        \"dpt_large\"             : \"minimal\", \n        \"dpt_hybrid\"            : \"minimal\", \n        \"dpt_swin2_tiny_256\"    : \"minimal\",\n        \"dpt_levit_224\"         : \"minimal\",\n        \"midas_small\"           : \"upper_bound\",\n    }\n\n    resize_dict = {\n        \"dpt_beit_large_512\"    : 384,\n        \"dpt_swin2_large_384\"   : 384,\n        \"dpt_large\"             : 384,\n        \"dpt_hybrid\"            : 384,\n        \"dpt_swin2_tiny_256\"    : 256,\n        \"dpt_levit_224\"         : 224,\n        \"midas_small\"           : 384,\n    }\n\n    keep_aspect_ratio = True\n    if \"swin2\" in depth_predictor or \"levit\" in depth_predictor:\n        keep_aspect_ratio = False\n\n    depth_model_transform_steps = [\n        Resize(\n            width=resize_dict[depth_predictor],\n            height=resize_dict[depth_predictor],\n            resize_target=False,\n            keep_aspect_ratio=keep_aspect_ratio,\n            ensure_multiple_of=32,\n            resize_method=resize_method_dict[depth_predictor],\n            image_interpolation_method=cv2.INTER_CUBIC,\n        ),\n        NormalizeImage(\n            mean=image_mean_dict[depth_predictor], \n            
std=image_std_dict[depth_predictor]\n        ),\n        PrepareForNet(),\n        Tensorize(),     \n    ]\n\n    sml_model_transform_steps = [\n        Resize(\n            width=384,\n            height=384,\n            resize_target=False,\n            keep_aspect_ratio=True,\n            ensure_multiple_of=32,\n            resize_method=resize_method_dict[\"midas_small\"],\n            image_interpolation_method=cv2.INTER_CUBIC,\n        ),\n        NormalizeIntermediate(\n            mean=normalization.VOID_INTERMEDIATE[depth_predictor][f\"{sparsifier}_{nsamples}\"][\"mean\"], \n            std=normalization.VOID_INTERMEDIATE[depth_predictor][f\"{sparsifier}_{nsamples}\"][\"std\"],\n        ),\n        PrepareForNet(),\n        Tensorize(),\n    ]\n\n    return {\n        \"depth_model\" : transforms.Compose(depth_model_transform_steps),\n        \"sml_model\"   : transforms.Compose(sml_model_transform_steps),\n    }\n"
  },
  {
    "path": "modules/midas/utils.py",
    "content": "\"\"\"Utils for monoDepth.\n\"\"\"\nimport sys\nimport re\nimport numpy as np\nimport cv2\nimport torch\n\n\ndef read_pfm(path):\n    \"\"\"Read pfm file.\n\n    Args:\n        path (str): path to file\n\n    Returns:\n        tuple: (data, scale)\n    \"\"\"\n    with open(path, \"rb\") as file:\n\n        color = None\n        width = None\n        height = None\n        scale = None\n        endian = None\n\n        header = file.readline().rstrip()\n        if header.decode(\"ascii\") == \"PF\":\n            color = True\n        elif header.decode(\"ascii\") == \"Pf\":\n            color = False\n        else:\n            raise Exception(\"Not a PFM file: \" + path)\n\n        dim_match = re.match(r\"^(\\d+)\\s(\\d+)\\s$\", file.readline().decode(\"ascii\"))\n        if dim_match:\n            width, height = list(map(int, dim_match.groups()))\n        else:\n            raise Exception(\"Malformed PFM header.\")\n\n        scale = float(file.readline().decode(\"ascii\").rstrip())\n        if scale < 0:\n            # little-endian\n            endian = \"<\"\n            scale = -scale\n        else:\n            # big-endian\n            endian = \">\"\n\n        data = np.fromfile(file, endian + \"f\")\n        shape = (height, width, 3) if color else (height, width)\n\n        data = np.reshape(data, shape)\n        data = np.flipud(data)\n\n        return data, scale\n\n\ndef write_pfm(path, image, scale=1):\n    \"\"\"Write pfm file.\n\n    Args:\n        path (str): pathto file\n        image (array): data\n        scale (int, optional): Scale. Defaults to 1.\n    \"\"\"\n\n    with open(path, \"wb\") as file:\n        color = None\n\n        if image.dtype.name != \"float32\":\n            raise Exception(\"Image dtype must be float32.\")\n\n        image = np.flipud(image)\n\n        if len(image.shape) == 3 and image.shape[2] == 3:  # color image\n            color = True\n        elif (\n            len(image.shape) == 2 or len(image.shape) == 3 and image.shape[2] == 1\n        ):  # greyscale\n            color = False\n        else:\n            raise Exception(\"Image must have H x W x 3, H x W x 1 or H x W dimensions.\")\n\n        file.write(\"PF\\n\" if color else \"Pf\\n\".encode())\n        file.write(\"%d %d\\n\".encode() % (image.shape[1], image.shape[0]))\n\n        endian = image.dtype.byteorder\n\n        if endian == \"<\" or endian == \"=\" and sys.byteorder == \"little\":\n            scale = -scale\n\n        file.write(\"%f\\n\".encode() % scale)\n\n        image.tofile(file)\n\n\ndef read_image(path):\n    \"\"\"Read image and output RGB image (0-1).\n\n    Args:\n        path (str): path to file\n\n    Returns:\n        array: RGB image (0-1)\n    \"\"\"\n    img = cv2.imread(path)\n\n    if img.ndim == 2:\n        img = cv2.cvtColor(img, cv2.COLOR_GRAY2BGR)\n\n    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB) / 255.0\n\n    return img\n\n\ndef resize_image(img):\n    \"\"\"Resize image and make it fit for network.\n\n    Args:\n        img (array): image\n\n    Returns:\n        tensor: data ready for network\n    \"\"\"\n    height_orig = img.shape[0]\n    width_orig = img.shape[1]\n\n    if width_orig > height_orig:\n        scale = width_orig / 384\n    else:\n        scale = height_orig / 384\n\n    height = (np.ceil(height_orig / scale / 32) * 32).astype(int)\n    width = (np.ceil(width_orig / scale / 32) * 32).astype(int)\n\n    img_resized = cv2.resize(img, (width, height), interpolation=cv2.INTER_AREA)\n\n    img_resized = (\n      
  torch.from_numpy(np.transpose(img_resized, (2, 0, 1))).contiguous().float()\n    )\n    img_resized = img_resized.unsqueeze(0)\n\n    return img_resized\n\n\ndef resize_depth(depth, width, height):\n    \"\"\"Resize depth map and bring to CPU (numpy).\n\n    Args:\n        depth (tensor): depth\n        width (int): image width\n        height (int): image height\n\n    Returns:\n        array: processed depth\n    \"\"\"\n    depth = torch.squeeze(depth[0, :, :, :]).to(\"cpu\")\n\n    depth_resized = cv2.resize(\n        depth.numpy(), (width, height), interpolation=cv2.INTER_CUBIC\n    )\n\n    return depth_resized\n\n\ndef write_depth(path, depth, bits=1):\n    \"\"\"Write depth map to pfm and png file.\n\n    Args:\n        path (str): filepath without extension\n        depth (array): depth\n    \"\"\"\n    write_pfm(path + \".pfm\", depth.astype(np.float32))\n\n    depth_min = depth.min()\n    depth_max = depth.max()\n\n    max_val = (2**(8*bits))-1\n\n    if depth_max - depth_min > np.finfo(\"float\").eps:\n        out = max_val * (depth - depth_min) / (depth_max - depth_min)\n    else:\n        out = np.zeros(depth.shape, dtype=depth.type)\n\n    if bits == 1:\n        cv2.imwrite(path + \".png\", out.astype(\"uint8\"))\n    elif bits == 2:\n        cv2.imwrite(path + \".png\", out.astype(\"uint16\"))\n\n    return\n\n\ndef write_png(path, array, bits=2):\n    \"\"\"Write array to png file.\n\n    Args:\n        path (str): filepath without extension\n        array (array): array to be saved\n    \"\"\"\n\n    array_min = np.min(array)\n    array_max = np.max(array)\n\n    max_val = (2**(8*bits))-1\n\n    if array_max - array_min > np.finfo(\"float\").eps:\n        out = max_val * (array - array_min) / (array_max - array_min)\n    else:\n        print(f\"zero array not being saved at {path}\")\n        return\n\n    if bits == 1:\n        cv2.imwrite(path + \".png\", out.astype(\"uint8\"))\n    elif bits == 2:\n        cv2.imwrite(path + \".png\", out.astype(\"uint16\"))\n\n    return\n\n\ndef normalize_unit_range(data):\n    \"\"\"Normalize data array to [0, 1] range.\n\n    Args:\n        data (array): input array\n\n    Returns:\n        array: normalized array\n    \"\"\"\n    if np.max(data) - np.min(data) > np.finfo(\"float\").eps:\n        normalized = (data - np.min(data)) / (np.max(data) - np.min(data))\n    else:\n        raise ValueError(\"cannot normalize array, max-min range is 0\")\n    \n    return normalized"
  },
  {
    "path": "pipeline.py",
    "content": "import torch\nimport numpy as np\n\nfrom modules.midas.midas_net_custom import MidasNet_small_videpth\nfrom modules.estimator import LeastSquaresEstimator\nfrom modules.interpolator import Interpolator2D\n\nimport modules.midas.transforms as transforms\nimport modules.midas.utils as utils\n\nclass VIDepth(object):\n    def __init__(self, depth_predictor, nsamples, sml_model_path, \n                min_pred, max_pred, min_depth, max_depth, device):\n\n        # get transforms\n        model_transforms = transforms.get_transforms(depth_predictor, \"void\", str(nsamples))\n        self.depth_model_transform = model_transforms[\"depth_model\"]\n        self.ScaleMapLearner_transform = model_transforms[\"sml_model\"]\n\n        # define depth model\n        if depth_predictor == \"dpt_beit_large_512\":\n            self.DepthModel = torch.hub.load(\"intel-isl/MiDaS\", \"DPT_BEiT_L_512\")\n        elif depth_predictor == \"dpt_swin2_large_384\":\n            self.DepthModel = torch.hub.load(\"intel-isl/MiDaS\", \"DPT_SwinV2_L_384\")\n        elif depth_predictor == \"dpt_large\":\n            self.DepthModel = torch.hub.load(\"intel-isl/MiDaS\", \"DPT_Large\")\n        elif depth_predictor == \"dpt_hybrid\":\n            self.DepthModel = torch.hub.load(\"intel-isl/MiDaS\", \"DPT_Hybrid\")\n        elif depth_predictor == \"dpt_swin2_tiny_256\":\n            self.DepthModel = torch.hub.load(\"intel-isl/MiDaS\", \"DPT_SwinV2_T_256\")\n        elif depth_predictor == \"dpt_levit_224\":\n            self.DepthModel = torch.hub.load(\"intel-isl/MiDaS\", \"DPT_LeViT_224\")\n        elif depth_predictor == \"midas_small\":\n            self.DepthModel = torch.hub.load(\"intel-isl/MiDaS\", \"MiDaS_small\")\n        else:\n            self.DepthModel = None\n\n        # define SML model\n        self.ScaleMapLearner = MidasNet_small_videpth(\n            path=sml_model_path,\n            min_pred=min_pred,\n            max_pred=max_pred,\n        )\n\n        # depth prediction ranges\n        self.min_pred, self.max_pred = min_pred, max_pred\n\n        # depth evaluation ranges\n        self.min_depth, self.max_depth = min_depth, max_depth\n\n        # eval mode\n        self.DepthModel.eval()\n        self.DepthModel.to(device)\n\n        # eval mode\n        self.ScaleMapLearner.eval()\n        self.ScaleMapLearner.to(device)\n\n\n    def run(self, input_image, input_sparse_depth, validity_map, device):\n\n        input_height, input_width = np.shape(input_image)[0], np.shape(input_image)[1]\n        \n        sample = {\"image\" : input_image}\n        sample = self.depth_model_transform(sample)\n        im = sample[\"image\"].to(device)\n\n        input_sparse_depth_valid = (input_sparse_depth < self.max_depth) * (input_sparse_depth > self.min_depth)\n        if validity_map is not None:\n            input_sparse_depth_valid *= validity_map.astype(np.bool)\n\n        input_sparse_depth_valid = input_sparse_depth_valid.astype(bool)\n        input_sparse_depth[~input_sparse_depth_valid] = np.inf # set invalid depth\n        input_sparse_depth = 1.0 / input_sparse_depth\n\n        # run depth model\n        with torch.no_grad():\n            depth_pred = self.DepthModel.forward(im.unsqueeze(0))\n            depth_pred = (\n                torch.nn.functional.interpolate(\n                    depth_pred.unsqueeze(1),\n                    size=(input_height, input_width),\n                    mode=\"bicubic\",\n                    align_corners=False,\n                )\n                
.squeeze()\n                .cpu()\n                .numpy()\n            )\n\n        # global scale and shift alignment\n        GlobalAlignment = LeastSquaresEstimator(\n            estimate=depth_pred,\n            target=input_sparse_depth,\n            valid=input_sparse_depth_valid\n        )\n        GlobalAlignment.compute_scale_and_shift()\n        GlobalAlignment.apply_scale_and_shift()\n        GlobalAlignment.clamp_min_max(clamp_min=self.min_pred, clamp_max=self.max_pred)\n        int_depth = GlobalAlignment.output.astype(np.float32)\n\n        # interpolation of scale map\n        assert np.sum(input_sparse_depth_valid) >= 3, \"not enough valid sparse points\"\n        ScaleMapInterpolator = Interpolator2D(\n            pred_inv=int_depth,\n            sparse_depth_inv=input_sparse_depth,\n            valid=input_sparse_depth_valid,\n        )\n        ScaleMapInterpolator.generate_interpolated_scale_map(\n            interpolate_method='linear',\n            fill_corners=False\n        )\n        int_scales = ScaleMapInterpolator.interpolated_scale_map.astype(np.float32)\n        int_scales = utils.normalize_unit_range(int_scales)\n\n        sample = {\"image\" : input_image, \"int_depth\" : int_depth, \"int_scales\" : int_scales, \"int_depth_no_tf\" : int_depth}\n        sample = self.ScaleMapLearner_transform(sample)\n        x = torch.cat([sample[\"int_depth\"], sample[\"int_scales\"]], 0)\n        x = x.to(device)\n        d = sample[\"int_depth_no_tf\"].to(device)\n\n        # run SML model\n        with torch.no_grad():\n            sml_pred, sml_scales = self.ScaleMapLearner.forward(x.unsqueeze(0), d.unsqueeze(0))\n            sml_pred = (\n                torch.nn.functional.interpolate(\n                    sml_pred,\n                    size=(input_height, input_width),\n                    mode=\"bicubic\",\n                    align_corners=False,\n                )\n                .squeeze()\n                .cpu()\n                .numpy()\n            )\n\n        output = {\n            \"ga_depth\"  : int_depth,\n            \"sml_depth\" : sml_pred,\n        }\n        return output
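\n\n# Usage sketch (hypothetical checkpoint path; values mirror the CLI defaults in run.py):\n#\n#   method = VIDepth(\"dpt_hybrid\", 150, \"weights/sml.ckpt\",\n#                    min_pred=0.1, max_pred=8.0, min_depth=0.2, max_depth=5.0,\n#                    device=\"cpu\")\n#   output = method.run(rgb, sparse_depth, None, \"cpu\")\n#   # output[\"ga_depth\"] and output[\"sml_depth\"] are the two maps saved by run.py\n"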
  },
  {
    "path": "run.py",
    "content": "import os\nimport argparse\nimport glob\n\nimport torch\nimport numpy as np\n\nfrom PIL import Image\n\nimport modules.midas.utils as utils\n\nimport pipeline\n\n\ndef load_input_image(input_image_fp):\n    return utils.read_image(input_image_fp)\n\n\ndef load_sparse_depth(input_sparse_depth_fp):\n    input_sparse_depth = np.array(Image.open(input_sparse_depth_fp), dtype=np.float32) / 256.0\n    input_sparse_depth[input_sparse_depth <= 0] = 0.0\n    return input_sparse_depth\n\n\ndef run(depth_predictor, nsamples, sml_model_path, \n        min_pred, max_pred, min_depth, max_depth, \n        input_path, output_path, save_output):\n    \n    device = torch.device(\"cuda\" if torch.cuda.is_available() else \"cpu\")\n    print(\"device: %s\" % device)\n\n    # instantiate method\n    method = pipeline.VIDepth(\n        depth_predictor, nsamples, sml_model_path, \n        min_pred, max_pred, min_depth, max_depth, device\n    )\n\n    # get inputs\n    img_names = glob.glob(os.path.join(input_path, \"image\", \"*\"))\n    num_images = len(img_names)\n\n    # create output folders\n    if save_output:\n        os.makedirs(os.path.join(output_path, 'ga_depth'), exist_ok=True)\n        os.makedirs(os.path.join(output_path, 'sml_depth'), exist_ok=True)\n\n    for ind, input_image_fp in enumerate(img_names):\n        if os.path.isdir(input_image_fp):\n            continue\n\n        print(\"  processing {} ({}/{})\".format(input_image_fp, ind + 1, num_images))\n\n        input_image = load_input_image(input_image_fp)\n\n        input_sparse_depth_fp = input_image_fp.replace(\"image\", \"sparse_depth\")\n        input_sparse_depth = load_sparse_depth(input_sparse_depth_fp)\n\n        # values in the [min_depth, max_depth] range are considered valid;\n        # an additional validity map may be specified\n        validity_map = None\n\n        # run method\n        output = method.run(input_image, input_sparse_depth, validity_map, device)\n\n        if save_output:\n            basename = os.path.splitext(os.path.basename(input_image_fp))[0]\n\n            # saving depth map after global alignment\n            utils.write_depth(\n                os.path.join(output_path, 'ga_depth', basename), \n                output[\"ga_depth\"], bits=2\n            )\n\n            # saving depth map after local alignment with SML\n            utils.write_depth(\n                os.path.join(output_path, 'sml_depth', basename), \n                output[\"sml_depth\"], bits=2\n            )\n\nif __name__==\"__main__\":\n\n    parser = argparse.ArgumentParser()\n\n    # model parameters\n    parser.add_argument('-dp', '--depth-predictor', type=str, default='dpt_hybrid', \n                            help='Name of depth predictor to use in pipeline.')\n    parser.add_argument('-ns', '--nsamples', type=int, default=150, \n                            help='Number of sparse metric depth samples available.')\n    parser.add_argument('-sm', '--sml-model-path', type=str, default='', \n                            help='Path to trained SML model weights.')\n\n    # depth parameters\n    parser.add_argument('--min-pred', type=float, default=0.1, \n                            help='Min bound for predicted depth values.')\n    parser.add_argument('--max-pred', type=float, default=8.0, \n                            help='Max bound for predicted depth values.')\n    parser.add_argument('--min-depth', type=float, default=0.2, \n                            help='Min valid depth when evaluating.')\n    
parser.add_argument('--max-depth', type=float, default=5.0,\n                            help='Max valid depth when evaluating.')\n\n    # I/O paths\n    parser.add_argument('-i', '--input-path', type=str, default='./input',\n                            help='Path to inputs.')\n    parser.add_argument('-o', '--output-path', type=str, default='./output',\n                            help='Path to outputs.')\n    parser.add_argument('--save-output', dest='save_output', action='store_true',\n                            help='Save output depth map.')\n    parser.set_defaults(save_output=False)\n\n    args = parser.parse_args()\n    print(args)\n\n    run(\n        args.depth_predictor,\n        args.nsamples,\n        args.sml_model_path,\n        args.min_pred,\n        args.max_pred,\n        args.min_depth,\n        args.max_depth,\n        args.input_path,\n        args.output_path,\n        args.save_output\n    )
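\n\n# Example invocation (assumes a VOID-style layout under ./input/image and\n# ./input/sparse_depth, and SML weights downloaded to ./weights as in the README):\n#\n#   python run.py -dp dpt_large -ns 150 -sm weights/sml_model.dpredictor.dpt_large.nsamples.150.ckpt --save-output\n"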
  }
]