Repository: isl-org/VI-Depth
Branch: main
Commit: 2b4cf6eab369
Files: 16
Total size: 64.0 KB
Directory structure:
gitextract_9b323q_9/
├── .gitignore
├── LICENSE
├── README.md
├── environment.yaml
├── evaluate.py
├── metrics.py
├── modules/
│ ├── estimator.py
│ ├── interpolator.py
│ └── midas/
│ ├── base_model.py
│ ├── blocks.py
│ ├── midas_net_custom.py
│ ├── normalization.py
│ ├── transforms.py
│ └── utils.py
├── pipeline.py
└── run.py
================================================
FILE CONTENTS
================================================
================================================
FILE: .gitignore
================================================
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
# C extensions
*.so
# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST
# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
# Translations
*.mo
*.pot
# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal
# Flask stuff:
instance/
.webassets-cache
# Scrapy stuff:
.scrapy
# Sphinx documentation
docs/_build/
# PyBuilder
target/
# Jupyter Notebook
.ipynb_checkpoints
# IPython
profile_default/
ipython_config.py
# pyenv
.python-version
# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock
# PEP 582; used by e.g. github.com/David-OConnor/pyflow
__pypackages__/
# Celery stuff
celerybeat-schedule
celerybeat.pid
# SageMath parsed files
*.sage.py
# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/
# Spyder project settings
.spyderproject
.spyproject
# Rope project settings
.ropeproject
# mkdocs documentation
/site
# mypy
.mypy_cache/
.dmypy.json
dmypy.json
# Pyre type checker
.pyre/
# I/O
input/
output/
weights/
================================================
FILE: LICENSE
================================================
MIT License
Copyright (c) 2023 Intelligent Systems Lab Org
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
================================================
FILE: README.md
================================================
# PROJECT NOT UNDER ACTIVE MANAGEMENT #
This project will no longer be maintained by Intel.
Intel will not provide or guarantee development of or support for this project, including but not limited to, maintenance, bug fixes, new releases or updates.
Patches to this project are no longer accepted by Intel.
If you have an ongoing need to use this project, are interested in independently developing it, or would like to maintain patches for the community, please create your own fork of the project.
# Monocular Visual-Inertial Depth Estimation
This repository contains code and models for our paper:
> [Monocular Visual-Inertial Depth Estimation](https://arxiv.org/abs/2303.12134)
> Diana Wofk, René Ranftl, Matthias Müller, Vladlen Koltun
For a quick overview of the work, you can watch the [short talk](https://youtu.be/Ja4Nic3YYCg) and [teaser](https://youtu.be/IMwiKwSpshQ) on YouTube.
## Introduction

We present a visual-inertial depth estimation pipeline that integrates monocular depth estimation and visual-inertial odometry (VIO) to produce dense depth estimates with metric scale. Our approach consists of three stages: (1) input processing, where RGB and IMU data feed into monocular depth estimation alongside visual-inertial odometry; (2) global scale and shift alignment, where monocular depth estimates are fitted to the sparse depth from VIO in a least-squares manner; and (3) learning-based dense scale alignment, where the globally-aligned depth is locally realigned using a dense scale map regressed by the ScaleMapLearner (SML). The images at the bottom of the pipeline diagram illustrate a VOID sample being processed through our pipeline; from left to right: input RGB, ground-truth depth, sparse depth from VIO, globally-aligned depth, scale map scaffolding, dense scale map regressed by SML, and the final depth output.
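In code, all three stages are wrapped by the `VIDepth` class in `pipeline.py`. The minimal sketch below mirrors how `run.py` and `evaluate.py` invoke it; the random image and sparse points are placeholders for illustration, and the checkpoint path assumes the weights from the Setup section have been downloaded.
```python
import numpy as np
import torch

import pipeline

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# The three stages are hidden inside VIDepth: depth prediction + VIO sparse depth (1),
# least-squares global alignment (2), and SML-based dense scale alignment (3).
method = pipeline.VIDepth(
    depth_predictor="dpt_hybrid",   # any predictor from the table in Setup
    nsamples=150,                   # VOID density: 150, 500, or 1500
    sml_model_path="weights/sml_model.dpredictor.dpt_hybrid.nsamples.150.ckpt",
    min_pred=0.1, max_pred=8.0,     # prediction clamp range (defaults in run.py)
    min_depth=0.2, max_depth=5.0,   # valid metric depth range for VOID
    device=device,
)

# Placeholder inputs: HxWx3 RGB in [0, 1] and an HxW metric sparse depth map in meters
# (zeros where no VIO feature was triangulated).
image = np.random.rand(480, 640, 3).astype(np.float32)
sparse_depth = np.zeros((480, 640), dtype=np.float32)
sparse_depth[np.random.rand(480, 640) < 0.001] = 2.0

output = method.run(image, sparse_depth, validity_map=None, device=device)
ga_depth = output["ga_depth"]    # after global alignment (stage 2)
sml_depth = output["sml_depth"]  # after SML refinement (stage 3)
# Note: as in evaluate.py, both outputs are in inverse-depth space.
```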

## Setup
1) Setup dependencies:
```shell
conda env create -f environment.yaml
conda activate vi-depth
```
2) Pick one or more ScaleMapLearner (SML) models and download the corresponding weights to the `weights` folder.
| Depth Predictor | SML on VOID 150 | SML on VOID 500 | SML on VOID 1500 |
| :--- | :----: | :----: | :----: |
| DPT-BEiT-Large | [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.dpt_beit_large_512.nsamples.150.ckpt) | [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.dpt_beit_large_512.nsamples.500.ckpt) | [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.dpt_beit_large_512.nsamples.1500.ckpt) |
| DPT-SwinV2-Large | [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.dpt_swin2_large_384.nsamples.150.ckpt) | [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.dpt_swin2_large_384.nsamples.500.ckpt) | [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.dpt_swin2_large_384.nsamples.1500.ckpt) |
| DPT-Large | [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.dpt_large.nsamples.150.ckpt) | [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.dpt_large.nsamples.500.ckpt) | [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.dpt_large.nsamples.1500.ckpt) |
| DPT-Hybrid | [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.dpt_hybrid.nsamples.150.ckpt)* | [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.dpt_hybrid.nsamples.500.ckpt) | [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.dpt_hybrid.nsamples.1500.ckpt) |
| DPT-SwinV2-Tiny | [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.dpt_swin2_tiny_256.nsamples.150.ckpt) | [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.dpt_swin2_tiny_256.nsamples.500.ckpt) | [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.dpt_swin2_tiny_256.nsamples.1500.ckpt) |
| DPT-LeViT | [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.dpt_levit_224.nsamples.150.ckpt) | [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.dpt_levit_224.nsamples.500.ckpt) | [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.dpt_levit_224.nsamples.1500.ckpt) |
| MiDaS-small | [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.midas_small.nsamples.150.ckpt) | [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.midas_small.nsamples.500.ckpt) | [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.midas_small.nsamples.1500.ckpt) |
*Also available with pretraining on TartanAir: [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.dpt_hybrid.nsamples.150.pretrained.ckpt)
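The release URLs follow a fixed naming pattern, so the checkpoints can also be fetched programmatically; a small convenience sketch (not part of the repo scripts) using `torch.hub.download_url_to_file`:
```python
import os
import torch

# Fetch one SML checkpoint into the weights/ folder expected by run.py and evaluate.py.
depth_predictor = "dpt_beit_large_512"   # any predictor name from the table above
nsamples = 150                           # 150, 500, or 1500
ckpt = f"sml_model.dpredictor.{depth_predictor}.nsamples.{nsamples}.ckpt"
url = f"https://github.com/isl-org/VI-Depth/releases/download/v1/{ckpt}"

os.makedirs("weights", exist_ok=True)
torch.hub.download_url_to_file(url, os.path.join("weights", ckpt))
```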
## Inference
1) Place inputs into the `input` folder. An input image and corresponding sparse metric depth map are expected:
```bash
input
├── image # RGB image
│ ├── <timestamp>.png
│ └── ...
└── sparse_depth # sparse metric depth map
├── <timestamp>.png # as 16b PNG
└── ...
```
The `load_sparse_depth` function in `run.py` may need to be modified depending on the format in which sparse depth is stored. By default, the depth storage method [used in the VOID dataset](https://github.com/alexklwong/void-dataset/blob/master/src/data_utils.py) is assumed.
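For reference, the sketch below shows the default VOID decoding used by `load_sparse_depth` (a 16-bit PNG holding depth in meters scaled by 256), together with one possible way to write sparse depth into the same format; a replacement loader only needs to return an HxW float array of metric depth with zeros where no sample exists.
```python
import numpy as np
from PIL import Image

def load_sparse_depth_void(path):
    # VOID convention: 16-bit PNG storing metric depth (meters) scaled by 256.
    depth = np.array(Image.open(path), dtype=np.float32) / 256.0
    depth[depth <= 0] = 0.0  # zero marks pixels without a sparse sample
    return depth

def save_sparse_depth_void(path, depth_m):
    # One way to convert your own metric sparse depth into the expected format.
    Image.fromarray((depth_m * 256.0).astype(np.uint16)).save(path)
```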
2) Run the `run.py` script as follows:
```bash
DEPTH_PREDICTOR="dpt_beit_large_512"
NSAMPLES=150
SML_MODEL_PATH="weights/sml_model.dpredictor.${DEPTH_PREDICTOR}.nsamples.${NSAMPLES}.ckpt"
python run.py -dp $DEPTH_PREDICTOR -ns $NSAMPLES -sm $SML_MODEL_PATH --save-output
```
3) The `--save-output` flag enables saving outputs to the `output` folder. By default, the following outputs will be saved per sample:
```bash
output
├── ga_depth # metric depth map after global alignment
│ ├── <timestamp>.pfm # as PFM
│ ├── <timestamp>.png # as 16b PNG
│ └── ...
└── sml_depth # metric depth map output by SML
├── <timestamp>.pfm # as PFM
├── <timestamp>.png # as 16b PNG
└── ...
```
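Note that `write_depth` in `modules/midas/utils.py` stores the raw floating-point values in the PFM files, while the 16-bit PNGs are min-max normalized and therefore mainly useful for visualization. To consume the results numerically, read the PFMs; as in `evaluate.py`, the stored values are in inverse-depth space:
```python
import numpy as np
from modules.midas.utils import read_pfm

# Replace <timestamp> with an actual filename produced by run.py.
ga, _scale = read_pfm("output/ga_depth/<timestamp>.pfm")
ga_depth_m = 1.0 / np.maximum(ga, 1e-8)  # reciprocal recovers depth in meters
```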
## Evaluation
Models provided in this repo were trained on the VOID dataset.
1) Download the VOID dataset following [the instructions in the VOID dataset repo](https://github.com/alexklwong/void-dataset#downloading-void).
2) To evaluate on VOID test sets, run the `evaluate.py` script as follows:
```bash
DATASET_PATH="/path/to/void_release/"
DEPTH_PREDICTOR="dpt_beit_large_512"
NSAMPLES=150
SML_MODEL_PATH="weights/sml_model.dpredictor.${DEPTH_PREDICTOR}.nsamples.${NSAMPLES}.ckpt"
python evaluate.py -ds $DATASET_PATH -dp $DEPTH_PREDICTOR -ns $NSAMPLES -sm $SML_MODEL_PATH
```
Results for the example shown above (RMSE/MAE in mm, iRMSE/iMAE in 1/km):
```
Averaging metrics for globally-aligned depth over 800 samples
Averaging metrics for SML-aligned depth over 800 samples
+---------+----------+----------+
| metric | GA Only | GA+SML |
+---------+----------+----------+
| RMSE | 191.36 | 142.85 |
| MAE | 115.84 | 76.95 |
| AbsRel | 0.069 | 0.046 |
| iRMSE | 72.70 | 57.13 |
| iMAE | 49.32 | 34.25 |
| iAbsRel | 0.071 | 0.048 |
+---------+----------+----------+
```
To evaluate on VOID test sets at different densities (void_150, void_500, void_1500), change the `NSAMPLES` argument above accordingly.
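The error metrics themselves live in `metrics.py` and can be used outside of `evaluate.py`; the sketch below, with synthetic inputs, mirrors how `evaluate.py` calls them (inputs are inverse depth, depth errors come out in mm and inverse-depth errors in 1/km).
```python
import numpy as np
import metrics

# Synthetic example: ground-truth depth in meters and an estimate that is 5% off.
gt = np.random.uniform(0.5, 4.0, (480, 640)).astype(np.float32)
est = gt * 1.05
valid = np.ones_like(gt, dtype=bool)

em = metrics.ErrorMetrics()
em.compute(estimate=1.0 / est, target=1.0 / gt, valid=valid)  # inverse-depth inputs
print(f"RMSE {em.rmse:.2f} mm,  MAE {em.mae:.2f} mm,  AbsRel {em.absrel:.3f}")
print(f"iRMSE {em.inv_rmse:.2f} 1/km,  iMAE {em.inv_mae:.2f} 1/km")
```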
## Citation
If you reference our work, please consider citing the following:
```bib
@inproceedings{wofk2023videpth,
author = {{Wofk, Diana and Ranftl, Ren\'{e} and M{\"u}ller, Matthias and Koltun, Vladlen}},
title = {{Monocular Visual-Inertial Depth Estimation}},
booktitle = {{IEEE International Conference on Robotics and Automation (ICRA)}},
year = {{2023}}
}
```
## Acknowledgements
Our work builds on and uses code from [MiDaS](https://github.com/isl-org/MiDaS), [timm](https://github.com/rwightman/pytorch-image-models), and [PyTorch Lightning](https://lightning.ai/docs/pytorch/stable/). We'd like to thank the authors for making these libraries and frameworks available.
_Last revisited: August 2024_
================================================
FILE: environment.yaml
================================================
name: vi-depth
channels:
  - pytorch
  - defaults
dependencies:
  - nvidia::cudatoolkit=11.7
  - python=3.10.8
  - pytorch::pytorch=1.13.0
  - torchvision=0.14.0
  - pip=22.3.1
  - numpy=1.23.4
  - pip:
    - opencv-python==4.6.0.66
    - scipy==1.10.1
    - timm==0.6.12
    - pytorch-lightning==1.9.0
    - imageio==2.25.0
    - prettytable==3.6.0
================================================
FILE: evaluate.py
================================================
import os
import argparse
import torch
import imageio
import numpy as np
from tqdm import tqdm
from PIL import Image
import modules.midas.utils as utils
import pipeline
import metrics
def evaluate(dataset_path, depth_predictor, nsamples, sml_model_path):
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("device: %s" % device)
# ranges for VOID
min_depth, max_depth = 0.2, 5.0
min_pred, max_pred = 0.1, 8.0
# instantiate method
method = pipeline.VIDepth(
depth_predictor, nsamples, sml_model_path,
min_pred, max_pred, min_depth, max_depth, device
)
# get inputs
with open(f"{dataset_path}/void_{nsamples}/test_image.txt") as f:
test_image_list = [line.rstrip() for line in f]
# initialize error aggregators
avg_error_w_int_depth = metrics.ErrorMetricsAverager()
avg_error_w_pred = metrics.ErrorMetricsAverager()
# iterate through inputs list
for i in tqdm(range(len(test_image_list))):
# image
input_image_fp = os.path.join(dataset_path, test_image_list[i])
input_image = utils.read_image(input_image_fp)
# sparse depth
input_sparse_depth_fp = input_image_fp.replace("image", "sparse_depth")
input_sparse_depth = np.array(Image.open(input_sparse_depth_fp), dtype=np.float32) / 256.0
input_sparse_depth[input_sparse_depth <= 0] = 0.0
# sparse depth validity map
validity_map_fp = input_image_fp.replace("image", "validity_map")
validity_map = np.array(Image.open(validity_map_fp), dtype=np.float32)
assert(np.all(np.unique(validity_map) == [0, 256]))
validity_map[validity_map > 0] = 1
# target (ground truth) depth
target_depth_fp = input_image_fp.replace("image", "ground_truth")
target_depth = np.array(Image.open(target_depth_fp), dtype=np.float32) / 256.0
target_depth[target_depth <= 0] = 0.0
# target depth valid/mask
mask = (target_depth < max_depth)
if min_depth is not None:
mask *= (target_depth > min_depth)
target_depth[~mask] = np.inf # set invalid depth
target_depth = 1.0 / target_depth
# run pipeline
output = method.run(input_image, input_sparse_depth, validity_map, device)
# compute error metrics using intermediate (globally aligned) depth
error_w_int_depth = metrics.ErrorMetrics()
error_w_int_depth.compute(
estimate = output["ga_depth"],
target = target_depth,
            valid = mask.astype(bool),
)
# compute error metrics using SML output depth
error_w_pred = metrics.ErrorMetrics()
error_w_pred.compute(
estimate = output["sml_depth"],
target = target_depth,
            valid = mask.astype(bool),
)
# accumulate error metrics
avg_error_w_int_depth.accumulate(error_w_int_depth)
avg_error_w_pred.accumulate(error_w_pred)
# compute average error metrics
print("Averaging metrics for globally-aligned depth over {} samples".format(
avg_error_w_int_depth.total_count
))
avg_error_w_int_depth.average()
print("Averaging metrics for SML-aligned depth over {} samples".format(
avg_error_w_pred.total_count
))
avg_error_w_pred.average()
from prettytable import PrettyTable
summary_tb = PrettyTable()
summary_tb.field_names = ["metric", "GA Only", "GA+SML"]
summary_tb.add_row(["RMSE", f"{avg_error_w_int_depth.rmse_avg:7.2f}", f"{avg_error_w_pred.rmse_avg:7.2f}"])
summary_tb.add_row(["MAE", f"{avg_error_w_int_depth.mae_avg:7.2f}", f"{avg_error_w_pred.mae_avg:7.2f}"])
summary_tb.add_row(["AbsRel", f"{avg_error_w_int_depth.absrel_avg:8.3f}", f"{avg_error_w_pred.absrel_avg:8.3f}"])
summary_tb.add_row(["iRMSE", f"{avg_error_w_int_depth.inv_rmse_avg:7.2f}", f"{avg_error_w_pred.inv_rmse_avg:7.2f}"])
summary_tb.add_row(["iMAE", f"{avg_error_w_int_depth.inv_mae_avg:7.2f}", f"{avg_error_w_pred.inv_mae_avg:7.2f}"])
summary_tb.add_row(["iAbsRel", f"{avg_error_w_int_depth.inv_absrel_avg:8.3f}", f"{avg_error_w_pred.inv_absrel_avg:8.3f}"])
print(summary_tb)
if __name__=="__main__":
parser = argparse.ArgumentParser()
parser.add_argument('-ds', '--dataset-path', type=str, default='/path/to/void_release/',
help='Path to VOID release dataset.')
parser.add_argument('-dp', '--depth-predictor', type=str, default='midas_small',
help='Name of depth predictor to use in pipeline.')
parser.add_argument('-ns', '--nsamples', type=int, default=150,
help='Number of sparse metric depth samples available.')
parser.add_argument('-sm', '--sml-model-path', type=str, default='',
help='Path to trained SML model weights.')
args = parser.parse_args()
print(args)
evaluate(
args.dataset_path,
args.depth_predictor,
args.nsamples,
args.sml_model_path,
)
================================================
FILE: metrics.py
================================================
import numpy as np
import torch
def rmse(estimate, target):
return np.sqrt(np.mean((estimate - target) ** 2))
def mae(estimate, target):
return np.mean(np.abs(estimate - target))
def absrel(estimate, target):
return np.mean(np.abs(estimate - target) / target)
def inv_rmse(estimate, target):
return np.sqrt(np.mean((1.0/estimate - 1.0/target) ** 2))
def inv_mae(estimate, target):
return np.mean(np.abs(1.0/estimate - 1.0/target))
def inv_absrel(estimate, target):
return np.mean((np.abs(1.0/estimate - 1.0/target)) / (1.0/target))
class ErrorMetrics(object):
def __init__(self):
# initialize by setting to worst values
self.rmse, self.mae, self.absrel = np.inf, np.inf, np.inf
self.inv_rmse, self.inv_mae, self.inv_absrel = np.inf, np.inf, np.inf
def compute(self, estimate, target, valid):
# apply valid masks
estimate = estimate[valid]
target = target[valid]
# estimate and target will be in inverse space, convert to regular
estimate = 1.0/estimate
target = 1.0/target
# depth error, estimate in meters, convert units to mm
self.rmse = rmse(1000.0*estimate, 1000.0*target)
self.mae = mae(1000.0*estimate, 1000.0*target)
self.absrel = absrel(1000.0*estimate, 1000.0*target)
# inverse depth error, estimate in meters, convert units to 1/km
self.inv_rmse = inv_rmse(0.001*estimate, 0.001*target)
self.inv_mae = inv_mae(0.001*estimate, 0.001*target)
self.inv_absrel = inv_absrel(0.001*estimate, 0.001*target)
class ErrorMetricsAverager(object):
def __init__(self):
# initialize avg accumulators to zero
self.rmse_avg, self.mae_avg, self.absrel_avg = 0, 0, 0
self.inv_rmse_avg, self.inv_mae_avg, self.inv_absrel_avg = 0, 0, 0
self.total_count = 0
def accumulate(self, error_metrics):
# adds to accumulators from ErrorMetrics object
assert isinstance(error_metrics, ErrorMetrics)
self.rmse_avg += error_metrics.rmse
self.mae_avg += error_metrics.mae
self.absrel_avg += error_metrics.absrel
self.inv_rmse_avg += error_metrics.inv_rmse
self.inv_mae_avg += error_metrics.inv_mae
self.inv_absrel_avg += error_metrics.inv_absrel
self.total_count += 1
def average(self):
# print(f"Averaging depth metrics over {self.total_count} samples")
self.rmse_avg = self.rmse_avg / self.total_count
self.mae_avg = self.mae_avg / self.total_count
self.absrel_avg = self.absrel_avg / self.total_count
# print(f"Averaging inv depth metrics over {self.total_count} samples")
self.inv_rmse_avg = self.inv_rmse_avg / self.total_count
self.inv_mae_avg = self.inv_mae_avg / self.total_count
self.inv_absrel_avg = self.inv_absrel_avg / self.total_count
================================================
FILE: modules/estimator.py
================================================
import numpy as np
def compute_scale_and_shift_ls(prediction, target, mask):
    # tuple specifying which axes to sum over
sum_axes = (0, 1)
# system matrix: A = [[a_00, a_01], [a_10, a_11]]
a_00 = np.sum(mask * prediction * prediction, sum_axes)
a_01 = np.sum(mask * prediction, sum_axes)
a_11 = np.sum(mask, sum_axes)
# right hand side: b = [b_0, b_1]
b_0 = np.sum(mask * prediction * target, sum_axes)
b_1 = np.sum(mask * target, sum_axes)
# solution: x = A^-1 . b = [[a_11, -a_01], [-a_10, a_00]] / (a_00 * a_11 - a_01 * a_10) . b
x_0 = np.zeros_like(b_0)
x_1 = np.zeros_like(b_1)
det = a_00 * a_11 - a_01 * a_01
# A needs to be a positive definite matrix.
valid = det > 0
x_0[valid] = (a_11[valid] * b_0[valid] - a_01[valid] * b_1[valid]) / det[valid]
x_1[valid] = (-a_01[valid] * b_0[valid] + a_00[valid] * b_1[valid]) / det[valid]
return x_0, x_1
class LeastSquaresEstimator(object):
def __init__(self, estimate, target, valid):
self.estimate = estimate
self.target = target
self.valid = valid
# to be computed
self.scale = 1.0
self.shift = 0.0
self.output = None
def compute_scale_and_shift(self):
self.scale, self.shift = compute_scale_and_shift_ls(self.estimate, self.target, self.valid)
def apply_scale_and_shift(self):
self.output = self.estimate * self.scale + self.shift
def clamp_min_max(self, clamp_min=None, clamp_max=None):
if clamp_min is not None:
if clamp_min > 0:
clamp_min_inv = 1.0/clamp_min
self.output[self.output > clamp_min_inv] = clamp_min_inv
assert np.max(self.output) <= clamp_min_inv
else: # divide by zero, so skip
pass
if clamp_max is not None:
clamp_max_inv = 1.0/clamp_max
self.output[self.output < clamp_max_inv] = clamp_max_inv
# print(np.min(self.output), clamp_max_inv)
assert np.min(self.output) >= clamp_max_inv
# check for nonzero range
# assert np.min(self.output) != np.max(self.output)
================================================
FILE: modules/interpolator.py
================================================
import numpy as np
np.set_printoptions(suppress=True)
from scipy.interpolate import griddata
def interpolate_knots(map_size, knot_coords, knot_values, interpolate, fill_corners):
grid_x, grid_y = np.mgrid[0:map_size[0], 0:map_size[1]]
interpolated_map = griddata(
points=knot_coords.T,
values=knot_values,
xi=(grid_y, grid_x),
method=interpolate,
fill_value=1.0)
return interpolated_map
class Interpolator2D(object):
def __init__(self, pred_inv, sparse_depth_inv, valid):
self.pred_inv = pred_inv
self.sparse_depth_inv = sparse_depth_inv
self.valid = valid
self.map_size = np.shape(pred_inv)
self.num_knots = np.sum(valid)
nonzero_y_loc = np.nonzero(valid)[0]
nonzero_x_loc = np.nonzero(valid)[1]
self.knot_coords = np.stack((nonzero_x_loc, nonzero_y_loc))
self.knot_scales = sparse_depth_inv[valid] / pred_inv[valid]
self.knot_shifts = sparse_depth_inv[valid] - pred_inv[valid]
self.knot_list = []
for i in range(self.num_knots):
self.knot_list.append((int(self.knot_coords[0,i]), int(self.knot_coords[1,i])))
# to be computed
        self.interpolated_scale_map = None
self.confidence_map = None
self.output = None
def generate_interpolated_scale_map(self, interpolate_method, fill_corners=False):
self.interpolated_scale_map = interpolate_knots(
map_size=self.map_size,
knot_coords=self.knot_coords,
knot_values=self.knot_scales,
interpolate=interpolate_method,
fill_corners=fill_corners
).astype(np.float32)
================================================
FILE: modules/midas/base_model.py
================================================
import torch
class BaseModel(torch.nn.Module):
def load(self, path):
"""Load model from file.
Args:
path (str): file path
"""
parameters = torch.load(path, map_location=torch.device('cpu'))
if "optimizer" in parameters:
parameters = parameters["model"]
if "state_dict" in parameters:
state_dict = parameters["state_dict"]
new_state_dict = {}
for key in state_dict.keys():
if key[0:6] == "model.":
new_state_dict[key[6:]] = state_dict[key]
self.load_state_dict(new_state_dict)
else:
self.load_state_dict(parameters)
================================================
FILE: modules/midas/blocks.py
================================================
import torch
import torch.nn as nn
def _make_encoder(backbone, features, use_pretrained, groups=1, expand=False, exportable=True):
if backbone == "efficientnet_lite3":
pretrained = _make_pretrained_efficientnet_lite3(use_pretrained, exportable=exportable)
scratch = _make_scratch([32, 48, 136, 384], features, groups=groups, expand=expand) # efficientnet_lite3
else:
print(f"Backbone '{backbone}' not implemented")
assert False
return pretrained, scratch
def _make_scratch(in_shape, out_shape, groups=1, expand=False):
scratch = nn.Module()
out_shape1 = out_shape
out_shape2 = out_shape
out_shape3 = out_shape
out_shape4 = out_shape
if expand==True:
out_shape1 = out_shape
out_shape2 = out_shape*2
out_shape3 = out_shape*4
out_shape4 = out_shape*8
scratch.layer1_rn = nn.Conv2d(
in_shape[0], out_shape1, kernel_size=3, stride=1, padding=1, bias=False, groups=groups
)
scratch.layer2_rn = nn.Conv2d(
in_shape[1], out_shape2, kernel_size=3, stride=1, padding=1, bias=False, groups=groups
)
scratch.layer3_rn = nn.Conv2d(
in_shape[2], out_shape3, kernel_size=3, stride=1, padding=1, bias=False, groups=groups
)
scratch.layer4_rn = nn.Conv2d(
in_shape[3], out_shape4, kernel_size=3, stride=1, padding=1, bias=False, groups=groups
)
return scratch
def _make_pretrained_efficientnet_lite3(use_pretrained, exportable=False):
efficientnet = torch.hub.load(
"rwightman/gen-efficientnet-pytorch",
"tf_efficientnet_lite3",
pretrained=use_pretrained,
exportable=exportable
)
return _make_efficientnet_backbone(efficientnet)
def _make_efficientnet_backbone(effnet):
pretrained = nn.Module()
pretrained.layer1 = nn.Sequential(
effnet.conv_stem, effnet.bn1, effnet.act1, *effnet.blocks[0:2]
)
pretrained.layer2 = nn.Sequential(*effnet.blocks[2:3])
pretrained.layer3 = nn.Sequential(*effnet.blocks[3:5])
pretrained.layer4 = nn.Sequential(*effnet.blocks[5:9])
return pretrained
class ResidualConvUnit_custom(nn.Module):
"""Residual convolution module.
"""
def __init__(self, features, activation, bn):
"""Init.
Args:
features (int): number of features
"""
super().__init__()
self.bn = bn
self.groups=1
self.conv1 = nn.Conv2d(
features, features, kernel_size=3, stride=1, padding=1, bias=True, groups=self.groups
)
self.conv2 = nn.Conv2d(
features, features, kernel_size=3, stride=1, padding=1, bias=True, groups=self.groups
)
if self.bn==True:
self.bn1 = nn.BatchNorm2d(features)
self.bn2 = nn.BatchNorm2d(features)
self.activation = activation
self.skip_add = nn.quantized.FloatFunctional()
def forward(self, x):
"""Forward pass.
Args:
x (tensor): input
Returns:
tensor: output
"""
out = self.activation(x)
out = self.conv1(out)
if self.bn==True:
out = self.bn1(out)
out = self.activation(out)
out = self.conv2(out)
if self.bn==True:
out = self.bn2(out)
if self.groups > 1:
out = self.conv_merge(out)
return self.skip_add.add(out, x)
class FeatureFusionBlock_custom(nn.Module):
"""Feature fusion block.
"""
def __init__(self, features, activation, deconv=False, bn=False, expand=False, align_corners=True):
"""Init.
Args:
features (int): number of features
"""
super(FeatureFusionBlock_custom, self).__init__()
self.deconv = deconv
self.align_corners = align_corners
self.groups=1
self.expand = expand
out_features = features
if self.expand==True:
out_features = features//2
self.out_conv = nn.Conv2d(features, out_features, kernel_size=1, stride=1, padding=0, bias=True, groups=1)
self.resConfUnit1 = ResidualConvUnit_custom(features, activation, bn)
self.resConfUnit2 = ResidualConvUnit_custom(features, activation, bn)
self.skip_add = nn.quantized.FloatFunctional()
def forward(self, *xs):
"""Forward pass.
Returns:
tensor: output
"""
output = xs[0]
if len(xs) == 2:
res = self.resConfUnit1(xs[1])
output = self.skip_add.add(output, res)
output = self.resConfUnit2(output)
output = nn.functional.interpolate(
output, scale_factor=2, mode="bilinear", align_corners=self.align_corners
)
output = self.out_conv(output)
return output
class OutputConv(nn.Module):
"""Output conv block.
"""
def __init__(self, features, groups, activation, non_negative):
super(OutputConv, self).__init__()
self.output_conv = nn.Sequential(
nn.Conv2d(features, features//2, kernel_size=3, stride=1, padding=1, groups=groups),
nn.Upsample(scale_factor=2, mode="bilinear"),
nn.Conv2d(features//2, 32, kernel_size=3, stride=1, padding=1),
activation,
nn.Conv2d(32, 1, kernel_size=1, stride=1, padding=0),
nn.ReLU(True) if non_negative else nn.Identity(),
nn.Identity(),
)
def forward(self, x):
return self.output_conv(x)
================================================
FILE: modules/midas/midas_net_custom.py
================================================
"""MidashNet: Network for monocular depth estimation trained by mixing several datasets.
This file contains code that is adapted from
https://github.com/thomasjpfan/pytorch_refinenet/blob/master/pytorch_refinenet/refinenet/refinenet_4cascade.py
"""
import torch
import torch.nn as nn
from torch.nn import functional as F
from .base_model import BaseModel
from .blocks import FeatureFusionBlock_custom, _make_encoder, OutputConv
def weights_init(m):
import math
# initialize from normal (Gaussian) distribution
if isinstance(m, nn.Conv2d):
n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
m.weight.data.normal_(0, math.sqrt(2.0 / n))
if m.bias is not None:
m.bias.data.zero_()
elif isinstance(m, nn.BatchNorm2d):
m.weight.data.fill_(1)
m.bias.data.zero_()
class MidasNet_small_videpth(BaseModel):
"""Network for monocular depth estimation.
"""
def __init__(self, path=None, features=64, backbone="efficientnet_lite3", non_negative=False, exportable=True, channels_last=False, align_corners=True,
blocks={'expand': True}, in_channels=2, regress='r', min_pred=None, max_pred=None):
"""Init.
Args:
path (str, optional): Path to saved model. Defaults to None.
features (int, optional): Number of features. Defaults to 64.
backbone (str, optional): Backbone network for encoder. Defaults to efficientnet_lite3.
"""
print("Loading weights: ", path)
super(MidasNet_small_videpth, self).__init__()
use_pretrained = False if path else True
self.channels_last = channels_last
self.blocks = blocks
self.backbone = backbone
self.groups = 1
# for model output
self.regress = regress
self.min_pred = min_pred
self.max_pred = max_pred
features1=features
features2=features
features3=features
features4=features
self.expand = False
if "expand" in self.blocks and self.blocks['expand'] == True:
self.expand = True
features1=features
features2=features*2
features3=features*4
features4=features*8
self.first = nn.Sequential(
nn.Conv2d(in_channels, 3, kernel_size=3, stride=1, padding=1),
nn.BatchNorm2d(3),
nn.ReLU(inplace=True)
)
self.first.apply(weights_init)
self.pretrained, self.scratch = _make_encoder(self.backbone, features, use_pretrained, groups=self.groups, expand=self.expand, exportable=exportable)
self.scratch.activation = nn.ReLU(False)
self.scratch.refinenet4 = FeatureFusionBlock_custom(features4, self.scratch.activation, deconv=False, bn=False, expand=self.expand, align_corners=align_corners)
self.scratch.refinenet3 = FeatureFusionBlock_custom(features3, self.scratch.activation, deconv=False, bn=False, expand=self.expand, align_corners=align_corners)
self.scratch.refinenet2 = FeatureFusionBlock_custom(features2, self.scratch.activation, deconv=False, bn=False, expand=self.expand, align_corners=align_corners)
self.scratch.refinenet1 = FeatureFusionBlock_custom(features1, self.scratch.activation, deconv=False, bn=False, align_corners=align_corners)
self.scratch.output_conv = OutputConv(features, self.groups, self.scratch.activation, non_negative)
if path:
self.load(path)
def forward(self, x, d):
"""Forward pass.
Args:
x (tensor): input data (image)
            d (tensor): unaltered input depth
Returns:
tensor: depth
"""
if self.channels_last==True:
print("self.channels_last = ", self.channels_last)
x.contiguous(memory_format=torch.channels_last)
layer_0 = self.first(x)
layer_1 = self.pretrained.layer1(layer_0)
layer_2 = self.pretrained.layer2(layer_1)
layer_3 = self.pretrained.layer3(layer_2)
layer_4 = self.pretrained.layer4(layer_3)
layer_1_rn = self.scratch.layer1_rn(layer_1)
layer_2_rn = self.scratch.layer2_rn(layer_2)
layer_3_rn = self.scratch.layer3_rn(layer_3)
layer_4_rn = self.scratch.layer4_rn(layer_4)
path_4 = self.scratch.refinenet4(layer_4_rn)
path_3 = self.scratch.refinenet3(path_4, layer_3_rn)
path_2 = self.scratch.refinenet2(path_3, layer_2_rn)
path_1 = self.scratch.refinenet1(path_2, layer_1_rn)
out = self.scratch.output_conv(path_1)
scales = F.relu(1.0 + out)
pred = d * scales
# clamp pred to min and max
if self.min_pred is not None:
min_pred_inv = 1.0/self.min_pred
pred[pred > min_pred_inv] = min_pred_inv
if self.max_pred is not None:
max_pred_inv = 1.0/self.max_pred
pred[pred < max_pred_inv] = max_pred_inv
# also return scales
return (pred, scales)
================================================
FILE: modules/midas/normalization.py
================================================
VOID_INTERMEDIATE = {
"dpt_beit_large_512" : {
"void_150" : {
"mean" : {"int_depth" : 0.730, "int_scales" : 0.380},
"std" : {"int_depth" : 0.226, "int_scales" : 0.102},
},
"void_500" : {
"mean" : {"int_depth" : 0.736, "int_scales" : 0.366},
"std" : {"int_depth" : 0.232, "int_scales" : 0.099},
},
"void_1500" : {
"mean" : {"int_depth" : 0.730, "int_scales" : 0.355},
"std" : {"int_depth" : 0.232, "int_scales" : 0.096},
},
},
"dpt_swin2_large_384" : {
"void_150" : {
"mean" : {"int_depth" : 0.730, "int_scales" : 0.402},
"std" : {"int_depth" : 0.219, "int_scales" : 0.107},
},
"void_500" : {
"mean" : {"int_depth" : 0.736, "int_scales" : 0.389},
"std" : {"int_depth" : 0.224, "int_scales" : 0.106},
},
"void_1500" : {
"mean" : {"int_depth" : 0.730, "int_scales" : 0.377},
"std" : {"int_depth" : 0.226, "int_scales" : 0.103},
},
},
"dpt_large" : {
"void_150" : {
"mean" : {"int_depth" : 0.729, "int_scales" : 0.403},
"std" : {"int_depth" : 0.213, "int_scales" : 0.116},
},
"void_500" : {
"mean" : {"int_depth" : 0.735, "int_scales" : 0.390},
"std" : {"int_depth" : 0.219, "int_scales" : 0.116},
},
"void_1500" : {
"mean" : {"int_depth" : 0.730, "int_scales" : 0.380},
"std" : {"int_depth" : 0.221, "int_scales" : 0.116},
},
},
"dpt_hybrid": {
"void_150" : {
"mean" : {"int_depth" : 0.729, "int_scales" : 0.404},
"std" : {"int_depth" : 0.210, "int_scales" : 0.117},
},
"void_500" : {
"mean" : {"int_depth" : 0.735, "int_scales" : 0.392},
"std" : {"int_depth" : 0.215, "int_scales" : 0.118},
},
"void_1500" : {
"mean" : {"int_depth" : 0.730, "int_scales" : 0.381},
"std" : {"int_depth" : 0.218, "int_scales" : 0.117},
},
},
"dpt_swin2_tiny_256" : {
"void_150" : {
"mean" : {"int_depth" : 0.735, "int_scales" : 0.419},
"std" : {"int_depth" : 0.207, "int_scales" : 0.122},
},
"void_500" : {
"mean" : {"int_depth" : 0.741, "int_scales" : 0.406},
"std" : {"int_depth" : 0.212, "int_scales" : 0.124},
},
"void_1500" : {
"mean" : {"int_depth" : 0.733, "int_scales" : 0.396},
"std" : {"int_depth" : 0.213, "int_scales" : 0.125},
},
},
"dpt_levit_224" : {
"void_150" : {
"mean" : {"int_depth" : 0.734, "int_scales" : 0.421},
"std" : {"int_depth" : 0.198, "int_scales" : 0.129},
},
"void_500" : {
"mean" : {"int_depth" : 0.740, "int_scales" : 0.410},
"std" : {"int_depth" : 0.202, "int_scales" : 0.134},
},
"void_1500" : {
"mean" : {"int_depth" : 0.734, "int_scales" : 0.400},
"std" : {"int_depth" : 0.204, "int_scales" : 0.137},
},
},
"midas_small" : {
"void_150" : {
"mean" : {"int_depth" : 0.723, "int_scales" : 0.402},
"std" : {"int_depth" : 0.190, "int_scales" : 0.132},
},
"void_500" : {
"mean" : {"int_depth" : 0.731, "int_scales" : 0.393},
"std" : {"int_depth" : 0.196, "int_scales" : 0.136},
},
"void_1500" : {
"mean" : {"int_depth" : 0.728, "int_scales" : 0.385},
"std" : {"int_depth" : 0.199, "int_scales" : 0.140},
},
},
}
================================================
FILE: modules/midas/transforms.py
================================================
import numpy as np
import cv2
import math
import torch
import torchvision.transforms as transforms
from modules.midas.utils import normalize_unit_range
import modules.midas.normalization as normalization
class Resize(object):
"""Resize sample to given size (width, height).
"""
def __init__(
self,
width,
height,
resize_target=True,
keep_aspect_ratio=False,
ensure_multiple_of=1,
resize_method="lower_bound",
image_interpolation_method=cv2.INTER_AREA,
):
"""Init.
Args:
width (int): desired output width
height (int): desired output height
resize_target (bool, optional):
True: Resize the full sample (image, mask, target).
False: Resize image only.
Defaults to True.
keep_aspect_ratio (bool, optional):
True: Keep the aspect ratio of the input sample.
Output sample might not have the given width and height, and
resize behaviour depends on the parameter 'resize_method'.
Defaults to False.
ensure_multiple_of (int, optional):
Output width and height is constrained to be multiple of this parameter.
Defaults to 1.
resize_method (str, optional):
"lower_bound": Output will be at least as large as the given size.
"upper_bound": Output will be at max as large as the given size. (Output size might be smaller than given size.)
"minimal": Scale as least as possible. (Output size might be smaller than given size.)
Defaults to "lower_bound".
"""
self.__width = width
self.__height = height
self.__resize_target = resize_target
self.__keep_aspect_ratio = keep_aspect_ratio
self.__multiple_of = ensure_multiple_of
self.__resize_method = resize_method
self.__image_interpolation_method = image_interpolation_method
def constrain_to_multiple_of(self, x, min_val=0, max_val=None):
y = (np.round(x / self.__multiple_of) * self.__multiple_of).astype(int)
if max_val is not None and y > max_val:
y = (np.floor(x / self.__multiple_of) * self.__multiple_of).astype(int)
if y < min_val:
y = (np.ceil(x / self.__multiple_of) * self.__multiple_of).astype(int)
return y
def get_size(self, width, height):
# determine new height and width
scale_height = self.__height / height
scale_width = self.__width / width
if self.__keep_aspect_ratio:
if self.__resize_method == "lower_bound":
# scale such that output size is lower bound
if scale_width > scale_height:
# fit width
scale_height = scale_width
else:
# fit height
scale_width = scale_height
elif self.__resize_method == "upper_bound":
# scale such that output size is upper bound
if scale_width < scale_height:
# fit width
scale_height = scale_width
else:
# fit height
scale_width = scale_height
elif self.__resize_method == "minimal":
                # scale as little as possible
if abs(1 - scale_width) < abs(1 - scale_height):
# fit width
scale_height = scale_width
else:
# fit height
scale_width = scale_height
else:
raise ValueError(
f"resize_method {self.__resize_method} not implemented"
)
if self.__resize_method == "lower_bound":
new_height = self.constrain_to_multiple_of(
scale_height * height, min_val=self.__height
)
new_width = self.constrain_to_multiple_of(
scale_width * width, min_val=self.__width
)
elif self.__resize_method == "upper_bound":
new_height = self.constrain_to_multiple_of(
scale_height * height, max_val=self.__height
)
new_width = self.constrain_to_multiple_of(
scale_width * width, max_val=self.__width
)
elif self.__resize_method == "minimal":
new_height = self.constrain_to_multiple_of(scale_height * height)
new_width = self.constrain_to_multiple_of(scale_width * width)
else:
raise ValueError(f"resize_method {self.__resize_method} not implemented")
return (new_width, new_height)
def __call__(self, sample):
width, height = self.get_size(
sample["image"].shape[1], sample["image"].shape[0]
)
# resize sample
for item in sample.keys():
interpolation_method = self.__image_interpolation_method
sample[item] = cv2.resize(
sample[item],
(width, height),
interpolation=interpolation_method,
)
if self.__resize_target:
if "depth" in sample:
sample["depth"] = cv2.resize(
sample["depth"],
(width, height),
interpolation=cv2.INTER_NEAREST
)
if "mask" in sample:
sample["mask"] = cv2.resize(
sample["mask"].astype(np.float32),
(width, height),
interpolation=cv2.INTER_NEAREST,
)
sample["mask"] = sample["mask"].astype(bool)
return sample
class NormalizeImage(object):
"""Normalize image by given mean and std.
"""
def __init__(self, mean, std):
self.__mean = mean
self.__std = std
def __call__(self, sample):
sample["image"] = (sample["image"] - self.__mean) / self.__std
return sample
class NormalizeIntermediate(object):
"""Normalize intermediate data by given mean and std.
"""
def __init__(self, mean, std):
self.__int_depth_mean = mean["int_depth"]
self.__int_depth_std = std["int_depth"]
self.__int_scales_mean = mean["int_scales"]
self.__int_scales_std = std["int_scales"]
def __call__(self, sample):
if "int_depth" in sample and sample["int_depth"] is not None:
sample["int_depth"] = (sample["int_depth"] - self.__int_depth_mean) / self.__int_depth_std
if "int_scales" in sample and sample["int_scales"] is not None:
sample["int_scales"] = (sample["int_scales"] - self.__int_scales_mean) / self.__int_scales_std
return sample
class PrepareForNet(object):
"""Prepare sample for usage as network input.
"""
def __init__(self):
pass
def __call__(self, sample):
for item in sample.keys():
if sample[item] is None:
pass
elif item == "image":
image = np.transpose(sample["image"], (2, 0, 1))
sample["image"] = np.ascontiguousarray(image).astype(np.float32)
else:
array = sample[item].astype(np.float32)
array = np.expand_dims(array, axis=0) # add channel dim
sample[item] = np.ascontiguousarray(array)
return sample
class Tensorize(object):
"""Convert sample to tensor.
"""
def __init__(self):
pass
def __call__(self, sample):
for item in sample.keys():
if sample[item] is None:
pass
else:
# before tensorizing, verify that data is clean
assert not np.any(np.isnan(sample[item]))
sample[item] = torch.Tensor(sample[item])
return sample
def get_transforms(depth_predictor, sparsifier, nsamples):
image_mean_dict = {
"dpt_beit_large_512" : [0.5, 0.5, 0.5],
"dpt_swin2_large_384" : [0.5, 0.5, 0.5],
"dpt_large" : [0.5, 0.5, 0.5],
"dpt_hybrid" : [0.5, 0.5, 0.5],
"dpt_swin2_tiny_256" : [0.5, 0.5, 0.5],
"dpt_levit_224" : [0.5, 0.5, 0.5],
"midas_small" : [0.485, 0.456, 0.406],
}
image_std_dict = {
"dpt_beit_large_512" : [0.5, 0.5, 0.5],
"dpt_swin2_large_384" : [0.5, 0.5, 0.5],
"dpt_large" : [0.5, 0.5, 0.5],
"dpt_hybrid" : [0.5, 0.5, 0.5],
"dpt_swin2_tiny_256" : [0.5, 0.5, 0.5],
"dpt_levit_224" : [0.5, 0.5, 0.5],
"midas_small" : [0.229, 0.224, 0.225],
}
resize_method_dict = {
"dpt_beit_large_512" : "minimal",
"dpt_swin2_large_384" : "minimal",
"dpt_large" : "minimal",
"dpt_hybrid" : "minimal",
"dpt_swin2_tiny_256" : "minimal",
"dpt_levit_224" : "minimal",
"midas_small" : "upper_bound",
}
resize_dict = {
"dpt_beit_large_512" : 384,
"dpt_swin2_large_384" : 384,
"dpt_large" : 384,
"dpt_hybrid" : 384,
"dpt_swin2_tiny_256" : 256,
"dpt_levit_224" : 224,
"midas_small" : 384,
}
keep_aspect_ratio = True
if "swin2" in depth_predictor or "levit" in depth_predictor:
keep_aspect_ratio = False
depth_model_transform_steps = [
Resize(
width=resize_dict[depth_predictor],
height=resize_dict[depth_predictor],
resize_target=False,
keep_aspect_ratio=keep_aspect_ratio,
ensure_multiple_of=32,
resize_method=resize_method_dict[depth_predictor],
image_interpolation_method=cv2.INTER_CUBIC,
),
NormalizeImage(
mean=image_mean_dict[depth_predictor],
std=image_std_dict[depth_predictor]
),
PrepareForNet(),
Tensorize(),
]
sml_model_transform_steps = [
Resize(
width=384,
height=384,
resize_target=False,
keep_aspect_ratio=True,
ensure_multiple_of=32,
resize_method=resize_method_dict["midas_small"],
image_interpolation_method=cv2.INTER_CUBIC,
),
NormalizeIntermediate(
mean=normalization.VOID_INTERMEDIATE[depth_predictor][f"{sparsifier}_{nsamples}"]["mean"],
std=normalization.VOID_INTERMEDIATE[depth_predictor][f"{sparsifier}_{nsamples}"]["std"],
),
PrepareForNet(),
Tensorize(),
]
return {
"depth_model" : transforms.Compose(depth_model_transform_steps),
"sml_model" : transforms.Compose(sml_model_transform_steps),
}
================================================
FILE: modules/midas/utils.py
================================================
"""Utils for monoDepth.
"""
import sys
import re
import numpy as np
import cv2
import torch
def read_pfm(path):
"""Read pfm file.
Args:
path (str): path to file
Returns:
tuple: (data, scale)
"""
with open(path, "rb") as file:
color = None
width = None
height = None
scale = None
endian = None
header = file.readline().rstrip()
if header.decode("ascii") == "PF":
color = True
elif header.decode("ascii") == "Pf":
color = False
else:
raise Exception("Not a PFM file: " + path)
dim_match = re.match(r"^(\d+)\s(\d+)\s$", file.readline().decode("ascii"))
if dim_match:
width, height = list(map(int, dim_match.groups()))
else:
raise Exception("Malformed PFM header.")
scale = float(file.readline().decode("ascii").rstrip())
if scale < 0:
# little-endian
endian = "<"
scale = -scale
else:
# big-endian
endian = ">"
data = np.fromfile(file, endian + "f")
shape = (height, width, 3) if color else (height, width)
data = np.reshape(data, shape)
data = np.flipud(data)
return data, scale
def write_pfm(path, image, scale=1):
"""Write pfm file.
Args:
        path (str): path to file
image (array): data
scale (int, optional): Scale. Defaults to 1.
"""
with open(path, "wb") as file:
color = None
if image.dtype.name != "float32":
raise Exception("Image dtype must be float32.")
image = np.flipud(image)
if len(image.shape) == 3 and image.shape[2] == 3: # color image
color = True
elif (
len(image.shape) == 2 or len(image.shape) == 3 and image.shape[2] == 1
): # greyscale
color = False
else:
raise Exception("Image must have H x W x 3, H x W x 1 or H x W dimensions.")
file.write("PF\n" if color else "Pf\n".encode())
file.write("%d %d\n".encode() % (image.shape[1], image.shape[0]))
endian = image.dtype.byteorder
if endian == "<" or endian == "=" and sys.byteorder == "little":
scale = -scale
file.write("%f\n".encode() % scale)
image.tofile(file)
def read_image(path):
"""Read image and output RGB image (0-1).
Args:
path (str): path to file
Returns:
array: RGB image (0-1)
"""
img = cv2.imread(path)
if img.ndim == 2:
img = cv2.cvtColor(img, cv2.COLOR_GRAY2BGR)
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB) / 255.0
return img
def resize_image(img):
"""Resize image and make it fit for network.
Args:
img (array): image
Returns:
tensor: data ready for network
"""
height_orig = img.shape[0]
width_orig = img.shape[1]
if width_orig > height_orig:
scale = width_orig / 384
else:
scale = height_orig / 384
height = (np.ceil(height_orig / scale / 32) * 32).astype(int)
width = (np.ceil(width_orig / scale / 32) * 32).astype(int)
img_resized = cv2.resize(img, (width, height), interpolation=cv2.INTER_AREA)
img_resized = (
torch.from_numpy(np.transpose(img_resized, (2, 0, 1))).contiguous().float()
)
img_resized = img_resized.unsqueeze(0)
return img_resized
def resize_depth(depth, width, height):
"""Resize depth map and bring to CPU (numpy).
Args:
depth (tensor): depth
width (int): image width
height (int): image height
Returns:
array: processed depth
"""
depth = torch.squeeze(depth[0, :, :, :]).to("cpu")
depth_resized = cv2.resize(
depth.numpy(), (width, height), interpolation=cv2.INTER_CUBIC
)
return depth_resized
def write_depth(path, depth, bits=1):
"""Write depth map to pfm and png file.
Args:
path (str): filepath without extension
depth (array): depth
"""
write_pfm(path + ".pfm", depth.astype(np.float32))
depth_min = depth.min()
depth_max = depth.max()
max_val = (2**(8*bits))-1
if depth_max - depth_min > np.finfo("float").eps:
out = max_val * (depth - depth_min) / (depth_max - depth_min)
else:
        out = np.zeros(depth.shape, dtype=depth.dtype)
if bits == 1:
cv2.imwrite(path + ".png", out.astype("uint8"))
elif bits == 2:
cv2.imwrite(path + ".png", out.astype("uint16"))
return
def write_png(path, array, bits=2):
"""Write array to png file.
Args:
path (str): filepath without extension
array (array): array to be saved
"""
array_min = np.min(array)
array_max = np.max(array)
max_val = (2**(8*bits))-1
if array_max - array_min > np.finfo("float").eps:
out = max_val * (array - array_min) / (array_max - array_min)
else:
print(f"zero array not being saved at {path}")
return
if bits == 1:
cv2.imwrite(path + ".png", out.astype("uint8"))
elif bits == 2:
cv2.imwrite(path + ".png", out.astype("uint16"))
return
def normalize_unit_range(data):
"""Normalize data array to [0, 1] range.
Args:
data (array): input array
Returns:
array: normalized array
"""
if np.max(data) - np.min(data) > np.finfo("float").eps:
normalized = (data - np.min(data)) / (np.max(data) - np.min(data))
else:
raise ValueError("cannot normalize array, max-min range is 0")
return normalized
================================================
FILE: pipeline.py
================================================
import torch
import numpy as np
from modules.midas.midas_net_custom import MidasNet_small_videpth
from modules.estimator import LeastSquaresEstimator
from modules.interpolator import Interpolator2D
import modules.midas.transforms as transforms
import modules.midas.utils as utils
class VIDepth(object):
def __init__(self, depth_predictor, nsamples, sml_model_path,
min_pred, max_pred, min_depth, max_depth, device):
# get transforms
model_transforms = transforms.get_transforms(depth_predictor, "void", str(nsamples))
self.depth_model_transform = model_transforms["depth_model"]
self.ScaleMapLearner_transform = model_transforms["sml_model"]
# define depth model
if depth_predictor == "dpt_beit_large_512":
self.DepthModel = torch.hub.load("intel-isl/MiDaS", "DPT_BEiT_L_512")
elif depth_predictor == "dpt_swin2_large_384":
self.DepthModel = torch.hub.load("intel-isl/MiDaS", "DPT_SwinV2_L_384")
elif depth_predictor == "dpt_large":
self.DepthModel = torch.hub.load("intel-isl/MiDaS", "DPT_Large")
elif depth_predictor == "dpt_hybrid":
self.DepthModel = torch.hub.load("intel-isl/MiDaS", "DPT_Hybrid")
elif depth_predictor == "dpt_swin2_tiny_256":
self.DepthModel = torch.hub.load("intel-isl/MiDaS", "DPT_SwinV2_T_256")
elif depth_predictor == "dpt_levit_224":
self.DepthModel = torch.hub.load("intel-isl/MiDaS", "DPT_LeViT_224")
elif depth_predictor == "midas_small":
self.DepthModel = torch.hub.load("intel-isl/MiDaS", "MiDaS_small")
else:
self.DepthModel = None
# define SML model
self.ScaleMapLearner = MidasNet_small_videpth(
path=sml_model_path,
min_pred=min_pred,
max_pred=max_pred,
)
# depth prediction ranges
self.min_pred, self.max_pred = min_pred, max_pred
# depth evaluation ranges
self.min_depth, self.max_depth = min_depth, max_depth
# eval mode
self.DepthModel.eval()
self.DepthModel.to(device)
# eval mode
self.ScaleMapLearner.eval()
self.ScaleMapLearner.to(device)
def run(self, input_image, input_sparse_depth, validity_map, device):
input_height, input_width = np.shape(input_image)[0], np.shape(input_image)[1]
sample = {"image" : input_image}
sample = self.depth_model_transform(sample)
im = sample["image"].to(device)
input_sparse_depth_valid = (input_sparse_depth < self.max_depth) * (input_sparse_depth > self.min_depth)
if validity_map is not None:
            input_sparse_depth_valid *= validity_map.astype(bool)
input_sparse_depth_valid = input_sparse_depth_valid.astype(bool)
input_sparse_depth[~input_sparse_depth_valid] = np.inf # set invalid depth
input_sparse_depth = 1.0 / input_sparse_depth
# run depth model
with torch.no_grad():
depth_pred = self.DepthModel.forward(im.unsqueeze(0))
depth_pred = (
torch.nn.functional.interpolate(
depth_pred.unsqueeze(1),
size=(input_height, input_width),
mode="bicubic",
align_corners=False,
)
.squeeze()
.cpu()
.numpy()
)
# global scale and shift alignment
GlobalAlignment = LeastSquaresEstimator(
estimate=depth_pred,
target=input_sparse_depth,
valid=input_sparse_depth_valid
)
GlobalAlignment.compute_scale_and_shift()
GlobalAlignment.apply_scale_and_shift()
GlobalAlignment.clamp_min_max(clamp_min=self.min_pred, clamp_max=self.max_pred)
int_depth = GlobalAlignment.output.astype(np.float32)
# interpolation of scale map
assert (np.sum(input_sparse_depth_valid) >= 3), "not enough valid sparse points"
ScaleMapInterpolator = Interpolator2D(
pred_inv = int_depth,
sparse_depth_inv = input_sparse_depth,
valid = input_sparse_depth_valid,
)
ScaleMapInterpolator.generate_interpolated_scale_map(
interpolate_method='linear',
fill_corners=False
)
int_scales = ScaleMapInterpolator.interpolated_scale_map.astype(np.float32)
int_scales = utils.normalize_unit_range(int_scales)
sample = {"image" : input_image, "int_depth" : int_depth, "int_scales" : int_scales, "int_depth_no_tf" : int_depth}
sample = self.ScaleMapLearner_transform(sample)
x = torch.cat([sample["int_depth"], sample["int_scales"]], 0)
x = x.to(device)
d = sample["int_depth_no_tf"].to(device)
# run SML model
with torch.no_grad():
sml_pred, sml_scales = self.ScaleMapLearner.forward(x.unsqueeze(0), d.unsqueeze(0))
sml_pred = (
torch.nn.functional.interpolate(
sml_pred,
size=(input_height, input_width),
mode="bicubic",
align_corners=False,
)
.squeeze()
.cpu()
.numpy()
)
output = {
"ga_depth" : int_depth,
"sml_depth" : sml_pred,
}
return output
================================================
FILE: run.py
================================================
import os
import argparse
import glob
import torch
import numpy as np
from PIL import Image
import modules.midas.utils as utils
import pipeline
def load_input_image(input_image_fp):
return utils.read_image(input_image_fp)
def load_sparse_depth(input_sparse_depth_fp):
input_sparse_depth = np.array(Image.open(input_sparse_depth_fp), dtype=np.float32) / 256.0
input_sparse_depth[input_sparse_depth <= 0] = 0.0
return input_sparse_depth
def run(depth_predictor, nsamples, sml_model_path,
min_pred, max_pred, min_depth, max_depth,
input_path, output_path, save_output):
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("device: %s" % device)
# instantiate method
method = pipeline.VIDepth(
depth_predictor, nsamples, sml_model_path,
min_pred, max_pred, min_depth, max_depth, device
)
# get inputs
img_names = glob.glob(os.path.join(input_path, "image", "*"))
num_images = len(img_names)
# create output folders
if save_output:
os.makedirs(os.path.join(output_path, 'ga_depth'), exist_ok=True)
os.makedirs(os.path.join(output_path, 'sml_depth'), exist_ok=True)
for ind, input_image_fp in enumerate(img_names):
if os.path.isdir(input_image_fp):
continue
print(" processing {} ({}/{})".format(input_image_fp, ind + 1, num_images))
input_image = load_input_image(input_image_fp)
input_sparse_depth_fp = input_image_fp.replace("image", "sparse_depth")
input_sparse_depth = load_sparse_depth(input_sparse_depth_fp)
# values in the [min_depth, max_depth] range are considered valid;
# an additional validity map may be specified
validity_map = None
# run method
output = method.run(input_image, input_sparse_depth, validity_map, device)
if save_output:
basename = os.path.splitext(os.path.basename(input_image_fp))[0]
# saving depth map after global alignment
utils.write_depth(
os.path.join(output_path, 'ga_depth', basename),
output["ga_depth"], bits=2
)
# saving depth map after local alignment with SML
utils.write_depth(
os.path.join(output_path, 'sml_depth', basename),
output["sml_depth"], bits=2
)
if __name__=="__main__":
parser = argparse.ArgumentParser()
# model parameters
parser.add_argument('-dp', '--depth-predictor', type=str, default='dpt_hybrid',
help='Name of depth predictor to use in pipeline.')
parser.add_argument('-ns', '--nsamples', type=int, default=150,
help='Number of sparse metric depth samples available.')
parser.add_argument('-sm', '--sml-model-path', type=str, default='',
help='Path to trained SML model weights.')
# depth parameters
parser.add_argument('--min-pred', type=float, default=0.1,
help='Min bound for predicted depth values.')
parser.add_argument('--max-pred', type=float, default=8.0,
help='Max bound for predicted depth values.')
parser.add_argument('--min-depth', type=float, default=0.2,
help='Min valid depth when evaluating.')
parser.add_argument('--max-depth', type=float, default=5.0,
help='Max valid depth when evaluating.')
# I/O paths
parser.add_argument('-i', '--input-path', type=str, default='./input',
help='Path to inputs.')
parser.add_argument('-o', '--output-path', type=str, default='./output',
help='Path to outputs.')
parser.add_argument('--save-output', dest='save_output', action='store_true',
help='Save output depth map.')
parser.set_defaults(save_output=False)
args = parser.parse_args()
print(args)
run(
args.depth_predictor,
args.nsamples,
args.sml_model_path,
args.min_pred,
args.max_pred,
args.min_depth,
args.max_depth,
args.input_path,
args.output_path,
args.save_output
)
class NormalizeImage (line 158) | class NormalizeImage(object):
method __init__ (line 162) | def __init__(self, mean, std):
method __call__ (line 166) | def __call__(self, sample):
class NormalizeIntermediate (line 171) | class NormalizeIntermediate(object):
method __init__ (line 175) | def __init__(self, mean, std):
method __call__ (line 183) | def __call__(self, sample):
class PrepareForNet (line 193) | class PrepareForNet(object):
method __init__ (line 197) | def __init__(self):
method __call__ (line 200) | def __call__(self, sample):
class Tensorize (line 217) | class Tensorize(object):
method __init__ (line 221) | def __init__(self):
method __call__ (line 224) | def __call__(self, sample):
function get_transforms (line 238) | def get_transforms(depth_predictor, sparsifier, nsamples):
FILE: modules/midas/utils.py
function read_pfm (line 10) | def read_pfm(path):
function write_pfm (line 59) | def write_pfm(path, image, scale=1):
function read_image (line 98) | def read_image(path):
function resize_image (line 117) | def resize_image(img):
function resize_depth (line 147) | def resize_depth(depth, width, height):
function write_depth (line 167) | def write_depth(path, depth, bits=1):
function write_png (line 194) | def write_png(path, array, bits=2):
function normalize_unit_range (line 221) | def normalize_unit_range(data):
FILE: pipeline.py
class VIDepth (line 11) | class VIDepth(object):
method __init__ (line 12) | def __init__(self, depth_predictor, nsamples, sml_model_path,
method run (line 60) | def run(self, input_image, input_sparse_depth, validity_map, device):
FILE: run.py
function load_input_image (line 15) | def load_input_image(input_image_fp):
function load_sparse_depth (line 19) | def load_sparse_depth(input_sparse_depth_fp):
function run (line 25) | def run(depth_predictor, nsamples, sml_model_path,
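The global-alignment ("ga_depth") stage relies on compute_scale_and_shift_ls and LeastSquaresEstimator in modules/estimator.py, listed above. Their exact implementation lives in that file; the following is only an illustrative sketch of the standard closed-form least-squares fit of a scale and shift, assuming (as the pred_inv / sparse_depth_inv names in modules/interpolator.py suggest) that the alignment operates on inverse depth. The helper name and the degenerate-mask fallback below are illustrative, not the repository's.

import numpy as np

def scale_shift_ls_sketch(prediction, target, mask):
    # Solve argmin over (s, t) of sum_i mask_i * (s * prediction_i + t - target_i)^2
    # via the 2x2 normal equations (illustrative sketch, not the repository's code).
    a00 = np.sum(mask * prediction * prediction)
    a01 = np.sum(mask * prediction)
    a11 = np.sum(mask)
    b0 = np.sum(mask * prediction * target)
    b1 = np.sum(mask * target)
    det = a00 * a11 - a01 * a01
    if det <= 0:
        return 1.0, 0.0  # degenerate mask: fall back to an identity alignment
    scale = (a11 * b0 - a01 * b1) / det
    shift = (-a01 * b0 + a00 * b1) / det
    return scale, shift

# Hypothetical usage: align an inverse-depth prediction to sparse inverse depth,
# then clamp to the inverse of the default [min_pred, max_pred] = [0.1, 8.0] range,
# mirroring what clamp_min_max in LeastSquaresEstimator appears to provide.
# pred_inv, sparse_inv, valid = ...          # H x W arrays; valid is a boolean mask
# s, t = scale_shift_ls_sketch(pred_inv, sparse_inv, valid.astype(np.float64))
# ga_inv = np.clip(s * pred_inv + t, 1.0 / 8.0, 1.0 / 0.1)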