Repository: isl-org/VI-Depth Branch: main Commit: 2b4cf6eab369 Files: 16 Total size: 64.0 KB Directory structure: gitextract_9b323q_9/ ├── .gitignore ├── LICENSE ├── README.md ├── environment.yaml ├── evaluate.py ├── metrics.py ├── modules/ │ ├── estimator.py │ ├── interpolator.py │ └── midas/ │ ├── base_model.py │ ├── blocks.py │ ├── midas_net_custom.py │ ├── normalization.py │ ├── transforms.py │ └── utils.py ├── pipeline.py └── run.py ================================================ FILE CONTENTS ================================================ ================================================ FILE: .gitignore ================================================ # Byte-compiled / optimized / DLL files __pycache__/ *.py[cod] *$py.class # C extensions *.so # Distribution / packaging .Python build/ develop-eggs/ dist/ downloads/ eggs/ .eggs/ lib/ lib64/ parts/ sdist/ var/ wheels/ pip-wheel-metadata/ share/python-wheels/ *.egg-info/ .installed.cfg *.egg MANIFEST # PyInstaller # Usually these files are written by a python script from a template # before PyInstaller builds the exe, so as to inject date/other infos into it. *.manifest *.spec # Installer logs pip-log.txt pip-delete-this-directory.txt # Unit test / coverage reports htmlcov/ .tox/ .nox/ .coverage .coverage.* .cache nosetests.xml coverage.xml *.cover *.py,cover .hypothesis/ .pytest_cache/ # Translations *.mo *.pot # Django stuff: *.log local_settings.py db.sqlite3 db.sqlite3-journal # Flask stuff: instance/ .webassets-cache # Scrapy stuff: .scrapy # Sphinx documentation docs/_build/ # PyBuilder target/ # Jupyter Notebook .ipynb_checkpoints # IPython profile_default/ ipython_config.py # pyenv .python-version # pipenv # According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control. # However, in case of collaboration, if having platform-specific dependencies or dependencies # having no cross-platform support, pipenv may install dependencies that don't work, or not # install all needed dependencies. #Pipfile.lock # PEP 582; used by e.g. github.com/David-OConnor/pyflow __pypackages__/ # Celery stuff celerybeat-schedule celerybeat.pid # SageMath parsed files *.sage.py # Environments .env .venv env/ venv/ ENV/ env.bak/ venv.bak/ # Spyder project settings .spyderproject .spyproject # Rope project settings .ropeproject # mkdocs documentation /site # mypy .mypy_cache/ .dmypy.json dmypy.json # Pyre type checker .pyre/ # I/O input/ output/ weights/ ================================================ FILE: LICENSE ================================================ MIT License Copyright (c) 2023 Intelligent Systems Lab Org Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. ================================================ FILE: README.md ================================================ # PROJECT NOT UNDER ACTIVE MANAGEMENT # This project will no longer be maintained by Intel. Intel will not provide or guarantee development of or support for this project, including but not limited to, maintenance, bug fixes, new releases or updates. Patches to this project are no longer accepted by Intel. If you have an ongoing need to use this project, are interested in independently developing it, or would like to maintain patches for the community, please create your own fork of the project. # Monocular Visual-Inertial Depth Estimation This repository contains code and models for our paper: > [Monocular Visual-Inertial Depth Estimation](https://arxiv.org/abs/2303.12134) > Diana Wofk, René Ranftl, Matthias Müller, Vladlen Koltun For a quick overview of the work you can watch the [short talk](https://youtu.be/Ja4Nic3YYCg) and [teaser](https://youtu.be/IMwiKwSpshQ) on YouTube. ## Introduction ![Methodology Diagram](figures/methodology_diagram.png) We present a visual-inertial depth estimation pipeline that integrates monocular depth estimation and visual-inertial odometry to produce dense depth estimates with metric scale. Our approach consists of three stages: (1) input processing, where RGB and IMU data feed into monocular depth estimation alongside visual-inertial odometry, (2) global scale and shift alignment, where monocular depth estimates are fitted to sparse depth from VIO in a least-squares manner, and (3) learning-based dense scale alignment, where globally-aligned depth is locally realigned using a dense scale map regressed by the ScaleMapLearner (SML). The images at the bottom in the diagram above illustrate a VOID sample being processed through our pipeline; from left to right: the input RGB, ground truth depth, sparse depth from VIO, globally-aligned depth, scale map scaffolding, dense scale map regressed by SML, final depth output. ![Teaser Figure](figures/teaser_figure.png) ## Setup 1) Setup dependencies: ```shell conda env create -f environment.yaml conda activate vi-depth ``` 2) Pick one or more ScaleMapLearner (SML) models and download the corresponding weights to the `weights` folder. 
| Depth Predictor | SML on VOID 150 | SML on VOID 500 | SML on VOID 1500 | | :--- | :----: | :----: | :----: | | DPT-BEiT-Large | [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.dpt_beit_large_512.nsamples.150.ckpt) | [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.dpt_beit_large_512.nsamples.500.ckpt) | [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.dpt_beit_large_512.nsamples.1500.ckpt) | | DPT-SwinV2-Large | [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.dpt_swin2_large_384.nsamples.150.ckpt) | [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.dpt_swin2_large_384.nsamples.500.ckpt) | [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.dpt_swin2_large_384.nsamples.1500.ckpt) | | DPT-Large | [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.dpt_large.nsamples.150.ckpt) | [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.dpt_large.nsamples.500.ckpt) | [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.dpt_large.nsamples.1500.ckpt) | | DPT-Hybrid | [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.dpt_hybrid.nsamples.150.ckpt)* | [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.dpt_hybrid.nsamples.500.ckpt) | [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.dpt_hybrid.nsamples.1500.ckpt) | | DPT-SwinV2-Tiny | [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.dpt_swin2_tiny_256.nsamples.150.ckpt) | [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.dpt_swin2_tiny_256.nsamples.500.ckpt) | [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.dpt_swin2_tiny_256.nsamples.1500.ckpt) | | DPT-LeViT | [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.dpt_levit_224.nsamples.150.ckpt) | [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.dpt_levit_224.nsamples.500.ckpt) | [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.dpt_levit_224.nsamples.1500.ckpt) | | MiDaS-small | [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.midas_small.nsamples.150.ckpt) | [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.midas_small.nsamples.500.ckpt) | [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.midas_small.nsamples.1500.ckpt) | *Also available with pretraining on TartanAir: [model](https://github.com/isl-org/VI-Depth/releases/download/v1/sml_model.dpredictor.dpt_hybrid.nsamples.150.pretrained.ckpt) ## Inference 1) Place inputs into the `input` folder. An input image and corresponding sparse metric depth map are expected: ```bash input ├── image # RGB image │ ├── .png │ └── ... └── sparse_depth # sparse metric depth map ├── .png # as 16b PNG └── ... ``` The `load_sparse_depth` function in `run.py` may need to be modified depending on the format in which sparse depth is stored. By default, the depth storage method [used in the VOID dataset](https://github.com/alexklwong/void-dataset/blob/master/src/data_utils.py) is assumed. 
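For example, here is a minimal sketch of an adapted loader, assuming sparse depth were instead stored as a 16-bit PNG holding depth in millimeters (this convention is only an assumption for illustration; adjust the scale factor to match your own data):

```python
import numpy as np
from PIL import Image

def load_sparse_depth(input_sparse_depth_fp):
    # Assumed convention: 16-bit PNG storing depth in millimeters, 0 = no measurement.
    # The VOID default in run.py instead divides the raw 16-bit values by 256.0.
    depth_mm = np.array(Image.open(input_sparse_depth_fp), dtype=np.float32)
    input_sparse_depth = depth_mm / 1000.0             # convert to meters
    input_sparse_depth[input_sparse_depth <= 0] = 0.0  # keep missing samples at zero
    return input_sparse_depth
```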
2) Run the `run.py` script as follows: ```bash DEPTH_PREDICTOR="dpt_beit_large_512" NSAMPLES=150 SML_MODEL_PATH="weights/sml_model.dpredictor.${DEPTH_PREDICTOR}.nsamples.${NSAMPLES}.ckpt" python run.py -dp $DEPTH_PREDICTOR -ns $NSAMPLES -sm $SML_MODEL_PATH --save-output ``` 3) The `--save-output` flag enables saving outputs to the `output` folder. By default, the following outputs will be saved per sample: ```bash output ├── ga_depth # metric depth map after global alignment │ ├── .pfm # as PFM │ ├── .png # as 16b PNG │ └── ... └── sml_depth # metric depth map output by SML ├── .pfm # as PFM ├── .png # as 16b PNG └── ... ``` ## Evaluation Models provided in this repo were trained on the VOID dataset. 1) Download the VOID dataset following [the instructions in the VOID dataset repo](https://github.com/alexklwong/void-dataset#downloading-void). 2) To evaluate on VOID test sets, run the `evaluate.py` script as follows: ```bash DATASET_PATH="/path/to/void_release/" DEPTH_PREDICTOR="dpt_beit_large_512" NSAMPLES=150 SML_MODEL_PATH="weights/sml_model.dpredictor.${DEPTH_PREDICTOR}.nsamples.${NSAMPLES}.ckpt" python evaluate.py -ds $DATASET_PATH -dp $DEPTH_PREDICTOR -ns $NSAMPLES -sm $SML_MODEL_PATH ``` Results for the example shown above: ``` Averaging metrics for globally-aligned depth over 800 samples Averaging metrics for SML-aligned depth over 800 samples +---------+----------+----------+ | metric | GA Only | GA+SML | +---------+----------+----------+ | RMSE | 191.36 | 142.85 | | MAE | 115.84 | 76.95 | | AbsRel | 0.069 | 0.046 | | iRMSE | 72.70 | 57.13 | | iMAE | 49.32 | 34.25 | | iAbsRel | 0.071 | 0.048 | +---------+----------+----------+ ``` To evaluate on VOID test sets at different densities (void_150, void_500, void_1500), change the `NSAMPLES` argument above accordingly. ## Citation If you reference our work, please consider citing the following: ```bib @inproceedings{wofk2023videpth, author = {{Wofk, Diana and Ranftl, Ren\'{e} and M{\"u}ller, Matthias and Koltun, Vladlen}}, title = {{Monocular Visual-Inertial Depth Estimation}}, booktitle = {{IEEE International Conference on Robotics and Automation (ICRA)}}, year = {{2023}} } ``` ## Acknowledgements Our work builds on and uses code from [MiDaS](https://github.com/isl-org/MiDaS), [timm](https://github.com/rwightman/pytorch-image-models), and [PyTorch Lightning](https://lightning.ai/docs/pytorch/stable/). We'd like to thank the authors for making these libraries and frameworks available. 
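A note on units for the evaluation table in the section above: as implemented in `metrics.py`, RMSE and MAE are computed on depth in millimeters, iRMSE and iMAE on inverse depth in 1/km, and AbsRel/iAbsRel are unitless ratios. A minimal sketch of that convention, using illustrative constant-depth arrays:

```python
import numpy as np
import metrics

# Inverse-depth maps in 1/m, as produced by the pipeline; contents are illustrative.
estimate = np.full((480, 640), 1.0 / 2.0, dtype=np.float32)  # 2.0 m everywhere
target   = np.full((480, 640), 1.0 / 2.1, dtype=np.float32)  # 2.1 m everywhere
valid    = np.ones((480, 640), dtype=bool)

err = metrics.ErrorMetrics()
err.compute(estimate, target, valid)
print(err.rmse, err.mae)          # depth errors in mm (here: 100 mm each)
print(err.inv_rmse, err.inv_mae)  # inverse-depth errors in 1/km
```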
_Last revisited: August 2024_ ================================================ FILE: environment.yaml ================================================ name: vi-depth channels: - pytorch - defaults dependencies: - nvidia::cudatoolkit=11.7 - python=3.10.8 - pytorch::pytorch=1.13.0 - torchvision=0.14.0 - pip=22.3.1 - numpy=1.23.4 - pip: - opencv-python==4.6.0.66 - scipy==1.10.1 - timm==0.6.12 - pytorch-lightning==1.9.0 - imageio==2.25.0 - prettytable==3.6.0 ================================================ FILE: evaluate.py ================================================ import os import argparse import torch import imageio import numpy as np from tqdm import tqdm from PIL import Image import modules.midas.utils as utils import pipeline import metrics def evaluate(dataset_path, depth_predictor, nsamples, sml_model_path): device = torch.device("cuda" if torch.cuda.is_available() else "cpu") print("device: %s" % device) # ranges for VOID min_depth, max_depth = 0.2, 5.0 min_pred, max_pred = 0.1, 8.0 # instantiate method method = pipeline.VIDepth( depth_predictor, nsamples, sml_model_path, min_pred, max_pred, min_depth, max_depth, device ) # get inputs with open(f"{dataset_path}/void_{nsamples}/test_image.txt") as f: test_image_list = [line.rstrip() for line in f] # initialize error aggregators avg_error_w_int_depth = metrics.ErrorMetricsAverager() avg_error_w_pred = metrics.ErrorMetricsAverager() # iterate through inputs list for i in tqdm(range(len(test_image_list))): # image input_image_fp = os.path.join(dataset_path, test_image_list[i]) input_image = utils.read_image(input_image_fp) # sparse depth input_sparse_depth_fp = input_image_fp.replace("image", "sparse_depth") input_sparse_depth = np.array(Image.open(input_sparse_depth_fp), dtype=np.float32) / 256.0 input_sparse_depth[input_sparse_depth <= 0] = 0.0 # sparse depth validity map validity_map_fp = input_image_fp.replace("image", "validity_map") validity_map = np.array(Image.open(validity_map_fp), dtype=np.float32) assert(np.all(np.unique(validity_map) == [0, 256])) validity_map[validity_map > 0] = 1 # target (ground truth) depth target_depth_fp = input_image_fp.replace("image", "ground_truth") target_depth = np.array(Image.open(target_depth_fp), dtype=np.float32) / 256.0 target_depth[target_depth <= 0] = 0.0 # target depth valid/mask mask = (target_depth < max_depth) if min_depth is not None: mask *= (target_depth > min_depth) target_depth[~mask] = np.inf # set invalid depth target_depth = 1.0 / target_depth # run pipeline output = method.run(input_image, input_sparse_depth, validity_map, device) # compute error metrics using intermediate (globally aligned) depth error_w_int_depth = metrics.ErrorMetrics() error_w_int_depth.compute( estimate = output["ga_depth"], target = target_depth, valid = mask.astype(np.bool), ) # compute error metrics using SML output depth error_w_pred = metrics.ErrorMetrics() error_w_pred.compute( estimate = output["sml_depth"], target = target_depth, valid = mask.astype(np.bool), ) # accumulate error metrics avg_error_w_int_depth.accumulate(error_w_int_depth) avg_error_w_pred.accumulate(error_w_pred) # compute average error metrics print("Averaging metrics for globally-aligned depth over {} samples".format( avg_error_w_int_depth.total_count )) avg_error_w_int_depth.average() print("Averaging metrics for SML-aligned depth over {} samples".format( avg_error_w_pred.total_count )) avg_error_w_pred.average() from prettytable import PrettyTable summary_tb = PrettyTable() summary_tb.field_names = ["metric", "GA Only", 
"GA+SML"] summary_tb.add_row(["RMSE", f"{avg_error_w_int_depth.rmse_avg:7.2f}", f"{avg_error_w_pred.rmse_avg:7.2f}"]) summary_tb.add_row(["MAE", f"{avg_error_w_int_depth.mae_avg:7.2f}", f"{avg_error_w_pred.mae_avg:7.2f}"]) summary_tb.add_row(["AbsRel", f"{avg_error_w_int_depth.absrel_avg:8.3f}", f"{avg_error_w_pred.absrel_avg:8.3f}"]) summary_tb.add_row(["iRMSE", f"{avg_error_w_int_depth.inv_rmse_avg:7.2f}", f"{avg_error_w_pred.inv_rmse_avg:7.2f}"]) summary_tb.add_row(["iMAE", f"{avg_error_w_int_depth.inv_mae_avg:7.2f}", f"{avg_error_w_pred.inv_mae_avg:7.2f}"]) summary_tb.add_row(["iAbsRel", f"{avg_error_w_int_depth.inv_absrel_avg:8.3f}", f"{avg_error_w_pred.inv_absrel_avg:8.3f}"]) print(summary_tb) if __name__=="__main__": parser = argparse.ArgumentParser() parser.add_argument('-ds', '--dataset-path', type=str, default='/path/to/void_release/', help='Path to VOID release dataset.') parser.add_argument('-dp', '--depth-predictor', type=str, default='midas_small', help='Name of depth predictor to use in pipeline.') parser.add_argument('-ns', '--nsamples', type=int, default=150, help='Number of sparse metric depth samples available.') parser.add_argument('-sm', '--sml-model-path', type=str, default='', help='Path to trained SML model weights.') args = parser.parse_args() print(args) evaluate( args.dataset_path, args.depth_predictor, args.nsamples, args.sml_model_path, ) ================================================ FILE: metrics.py ================================================ import numpy as np import torch def rmse(estimate, target): return np.sqrt(np.mean((estimate - target) ** 2)) def mae(estimate, target): return np.mean(np.abs(estimate - target)) def absrel(estimate, target): return np.mean(np.abs(estimate - target) / target) def inv_rmse(estimate, target): return np.sqrt(np.mean((1.0/estimate - 1.0/target) ** 2)) def inv_mae(estimate, target): return np.mean(np.abs(1.0/estimate - 1.0/target)) def inv_absrel(estimate, target): return np.mean((np.abs(1.0/estimate - 1.0/target)) / (1.0/target)) class ErrorMetrics(object): def __init__(self): # initialize by setting to worst values self.rmse, self.mae, self.absrel = np.inf, np.inf, np.inf self.inv_rmse, self.inv_mae, self.inv_absrel = np.inf, np.inf, np.inf def compute(self, estimate, target, valid): # apply valid masks estimate = estimate[valid] target = target[valid] # estimate and target will be in inverse space, convert to regular estimate = 1.0/estimate target = 1.0/target # depth error, estimate in meters, convert units to mm self.rmse = rmse(1000.0*estimate, 1000.0*target) self.mae = mae(1000.0*estimate, 1000.0*target) self.absrel = absrel(1000.0*estimate, 1000.0*target) # inverse depth error, estimate in meters, convert units to 1/km self.inv_rmse = inv_rmse(0.001*estimate, 0.001*target) self.inv_mae = inv_mae(0.001*estimate, 0.001*target) self.inv_absrel = inv_absrel(0.001*estimate, 0.001*target) class ErrorMetricsAverager(object): def __init__(self): # initialize avg accumulators to zero self.rmse_avg, self.mae_avg, self.absrel_avg = 0, 0, 0 self.inv_rmse_avg, self.inv_mae_avg, self.inv_absrel_avg = 0, 0, 0 self.total_count = 0 def accumulate(self, error_metrics): # adds to accumulators from ErrorMetrics object assert isinstance(error_metrics, ErrorMetrics) self.rmse_avg += error_metrics.rmse self.mae_avg += error_metrics.mae self.absrel_avg += error_metrics.absrel self.inv_rmse_avg += error_metrics.inv_rmse self.inv_mae_avg += error_metrics.inv_mae self.inv_absrel_avg += error_metrics.inv_absrel self.total_count += 1 def 
average(self): # print(f"Averaging depth metrics over {self.total_count} samples") self.rmse_avg = self.rmse_avg / self.total_count self.mae_avg = self.mae_avg / self.total_count self.absrel_avg = self.absrel_avg / self.total_count # print(f"Averaging inv depth metrics over {self.total_count} samples") self.inv_rmse_avg = self.inv_rmse_avg / self.total_count self.inv_mae_avg = self.inv_mae_avg / self.total_count self.inv_absrel_avg = self.inv_absrel_avg / self.total_count ================================================ FILE: modules/estimator.py ================================================ import numpy as np def compute_scale_and_shift_ls(prediction, target, mask): # tuple specifying with axes to sum sum_axes = (0, 1) # system matrix: A = [[a_00, a_01], [a_10, a_11]] a_00 = np.sum(mask * prediction * prediction, sum_axes) a_01 = np.sum(mask * prediction, sum_axes) a_11 = np.sum(mask, sum_axes) # right hand side: b = [b_0, b_1] b_0 = np.sum(mask * prediction * target, sum_axes) b_1 = np.sum(mask * target, sum_axes) # solution: x = A^-1 . b = [[a_11, -a_01], [-a_10, a_00]] / (a_00 * a_11 - a_01 * a_10) . b x_0 = np.zeros_like(b_0) x_1 = np.zeros_like(b_1) det = a_00 * a_11 - a_01 * a_01 # A needs to be a positive definite matrix. valid = det > 0 x_0[valid] = (a_11[valid] * b_0[valid] - a_01[valid] * b_1[valid]) / det[valid] x_1[valid] = (-a_01[valid] * b_0[valid] + a_00[valid] * b_1[valid]) / det[valid] return x_0, x_1 class LeastSquaresEstimator(object): def __init__(self, estimate, target, valid): self.estimate = estimate self.target = target self.valid = valid # to be computed self.scale = 1.0 self.shift = 0.0 self.output = None def compute_scale_and_shift(self): self.scale, self.shift = compute_scale_and_shift_ls(self.estimate, self.target, self.valid) def apply_scale_and_shift(self): self.output = self.estimate * self.scale + self.shift def clamp_min_max(self, clamp_min=None, clamp_max=None): if clamp_min is not None: if clamp_min > 0: clamp_min_inv = 1.0/clamp_min self.output[self.output > clamp_min_inv] = clamp_min_inv assert np.max(self.output) <= clamp_min_inv else: # divide by zero, so skip pass if clamp_max is not None: clamp_max_inv = 1.0/clamp_max self.output[self.output < clamp_max_inv] = clamp_max_inv # print(np.min(self.output), clamp_max_inv) assert np.min(self.output) >= clamp_max_inv # check for nonzero range # assert np.min(self.output) != np.max(self.output) ================================================ FILE: modules/interpolator.py ================================================ import numpy as np np.set_printoptions(suppress=True) from scipy.interpolate import griddata def interpolate_knots(map_size, knot_coords, knot_values, interpolate, fill_corners): grid_x, grid_y = np.mgrid[0:map_size[0], 0:map_size[1]] interpolated_map = griddata( points=knot_coords.T, values=knot_values, xi=(grid_y, grid_x), method=interpolate, fill_value=1.0) return interpolated_map class Interpolator2D(object): def __init__(self, pred_inv, sparse_depth_inv, valid): self.pred_inv = pred_inv self.sparse_depth_inv = sparse_depth_inv self.valid = valid self.map_size = np.shape(pred_inv) self.num_knots = np.sum(valid) nonzero_y_loc = np.nonzero(valid)[0] nonzero_x_loc = np.nonzero(valid)[1] self.knot_coords = np.stack((nonzero_x_loc, nonzero_y_loc)) self.knot_scales = sparse_depth_inv[valid] / pred_inv[valid] self.knot_shifts = sparse_depth_inv[valid] - pred_inv[valid] self.knot_list = [] for i in range(self.num_knots): self.knot_list.append((int(self.knot_coords[0,i]), 
int(self.knot_coords[1,i]))) # to be computed self.interpolated_map = None self.confidence_map = None self.output = None def generate_interpolated_scale_map(self, interpolate_method, fill_corners=False): self.interpolated_scale_map = interpolate_knots( map_size=self.map_size, knot_coords=self.knot_coords, knot_values=self.knot_scales, interpolate=interpolate_method, fill_corners=fill_corners ).astype(np.float32) ================================================ FILE: modules/midas/base_model.py ================================================ import torch class BaseModel(torch.nn.Module): def load(self, path): """Load model from file. Args: path (str): file path """ parameters = torch.load(path, map_location=torch.device('cpu')) if "optimizer" in parameters: parameters = parameters["model"] if "state_dict" in parameters: state_dict = parameters["state_dict"] new_state_dict = {} for key in state_dict.keys(): if key[0:6] == "model.": new_state_dict[key[6:]] = state_dict[key] self.load_state_dict(new_state_dict) else: self.load_state_dict(parameters) ================================================ FILE: modules/midas/blocks.py ================================================ import torch import torch.nn as nn def _make_encoder(backbone, features, use_pretrained, groups=1, expand=False, exportable=True): if backbone == "efficientnet_lite3": pretrained = _make_pretrained_efficientnet_lite3(use_pretrained, exportable=exportable) scratch = _make_scratch([32, 48, 136, 384], features, groups=groups, expand=expand) # efficientnet_lite3 else: print(f"Backbone '{backbone}' not implemented") assert False return pretrained, scratch def _make_scratch(in_shape, out_shape, groups=1, expand=False): scratch = nn.Module() out_shape1 = out_shape out_shape2 = out_shape out_shape3 = out_shape out_shape4 = out_shape if expand==True: out_shape1 = out_shape out_shape2 = out_shape*2 out_shape3 = out_shape*4 out_shape4 = out_shape*8 scratch.layer1_rn = nn.Conv2d( in_shape[0], out_shape1, kernel_size=3, stride=1, padding=1, bias=False, groups=groups ) scratch.layer2_rn = nn.Conv2d( in_shape[1], out_shape2, kernel_size=3, stride=1, padding=1, bias=False, groups=groups ) scratch.layer3_rn = nn.Conv2d( in_shape[2], out_shape3, kernel_size=3, stride=1, padding=1, bias=False, groups=groups ) scratch.layer4_rn = nn.Conv2d( in_shape[3], out_shape4, kernel_size=3, stride=1, padding=1, bias=False, groups=groups ) return scratch def _make_pretrained_efficientnet_lite3(use_pretrained, exportable=False): efficientnet = torch.hub.load( "rwightman/gen-efficientnet-pytorch", "tf_efficientnet_lite3", pretrained=use_pretrained, exportable=exportable ) return _make_efficientnet_backbone(efficientnet) def _make_efficientnet_backbone(effnet): pretrained = nn.Module() pretrained.layer1 = nn.Sequential( effnet.conv_stem, effnet.bn1, effnet.act1, *effnet.blocks[0:2] ) pretrained.layer2 = nn.Sequential(*effnet.blocks[2:3]) pretrained.layer3 = nn.Sequential(*effnet.blocks[3:5]) pretrained.layer4 = nn.Sequential(*effnet.blocks[5:9]) return pretrained class ResidualConvUnit_custom(nn.Module): """Residual convolution module. """ def __init__(self, features, activation, bn): """Init. 
Args: features (int): number of features """ super().__init__() self.bn = bn self.groups=1 self.conv1 = nn.Conv2d( features, features, kernel_size=3, stride=1, padding=1, bias=True, groups=self.groups ) self.conv2 = nn.Conv2d( features, features, kernel_size=3, stride=1, padding=1, bias=True, groups=self.groups ) if self.bn==True: self.bn1 = nn.BatchNorm2d(features) self.bn2 = nn.BatchNorm2d(features) self.activation = activation self.skip_add = nn.quantized.FloatFunctional() def forward(self, x): """Forward pass. Args: x (tensor): input Returns: tensor: output """ out = self.activation(x) out = self.conv1(out) if self.bn==True: out = self.bn1(out) out = self.activation(out) out = self.conv2(out) if self.bn==True: out = self.bn2(out) if self.groups > 1: out = self.conv_merge(out) return self.skip_add.add(out, x) class FeatureFusionBlock_custom(nn.Module): """Feature fusion block. """ def __init__(self, features, activation, deconv=False, bn=False, expand=False, align_corners=True): """Init. Args: features (int): number of features """ super(FeatureFusionBlock_custom, self).__init__() self.deconv = deconv self.align_corners = align_corners self.groups=1 self.expand = expand out_features = features if self.expand==True: out_features = features//2 self.out_conv = nn.Conv2d(features, out_features, kernel_size=1, stride=1, padding=0, bias=True, groups=1) self.resConfUnit1 = ResidualConvUnit_custom(features, activation, bn) self.resConfUnit2 = ResidualConvUnit_custom(features, activation, bn) self.skip_add = nn.quantized.FloatFunctional() def forward(self, *xs): """Forward pass. Returns: tensor: output """ output = xs[0] if len(xs) == 2: res = self.resConfUnit1(xs[1]) output = self.skip_add.add(output, res) output = self.resConfUnit2(output) output = nn.functional.interpolate( output, scale_factor=2, mode="bilinear", align_corners=self.align_corners ) output = self.out_conv(output) return output class OutputConv(nn.Module): """Output conv block. """ def __init__(self, features, groups, activation, non_negative): super(OutputConv, self).__init__() self.output_conv = nn.Sequential( nn.Conv2d(features, features//2, kernel_size=3, stride=1, padding=1, groups=groups), nn.Upsample(scale_factor=2, mode="bilinear"), nn.Conv2d(features//2, 32, kernel_size=3, stride=1, padding=1), activation, nn.Conv2d(32, 1, kernel_size=1, stride=1, padding=0), nn.ReLU(True) if non_negative else nn.Identity(), nn.Identity(), ) def forward(self, x): return self.output_conv(x) ================================================ FILE: modules/midas/midas_net_custom.py ================================================ """MidashNet: Network for monocular depth estimation trained by mixing several datasets. This file contains code that is adapted from https://github.com/thomasjpfan/pytorch_refinenet/blob/master/pytorch_refinenet/refinenet/refinenet_4cascade.py """ import torch import torch.nn as nn from torch.nn import functional as F from .base_model import BaseModel from .blocks import FeatureFusionBlock_custom, _make_encoder, OutputConv def weights_init(m): import math # initialize from normal (Gaussian) distribution if isinstance(m, nn.Conv2d): n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels m.weight.data.normal_(0, math.sqrt(2.0 / n)) if m.bias is not None: m.bias.data.zero_() elif isinstance(m, nn.BatchNorm2d): m.weight.data.fill_(1) m.bias.data.zero_() class MidasNet_small_videpth(BaseModel): """Network for monocular depth estimation. 
""" def __init__(self, path=None, features=64, backbone="efficientnet_lite3", non_negative=False, exportable=True, channels_last=False, align_corners=True, blocks={'expand': True}, in_channels=2, regress='r', min_pred=None, max_pred=None): """Init. Args: path (str, optional): Path to saved model. Defaults to None. features (int, optional): Number of features. Defaults to 64. backbone (str, optional): Backbone network for encoder. Defaults to efficientnet_lite3. """ print("Loading weights: ", path) super(MidasNet_small_videpth, self).__init__() use_pretrained = False if path else True self.channels_last = channels_last self.blocks = blocks self.backbone = backbone self.groups = 1 # for model output self.regress = regress self.min_pred = min_pred self.max_pred = max_pred features1=features features2=features features3=features features4=features self.expand = False if "expand" in self.blocks and self.blocks['expand'] == True: self.expand = True features1=features features2=features*2 features3=features*4 features4=features*8 self.first = nn.Sequential( nn.Conv2d(in_channels, 3, kernel_size=3, stride=1, padding=1), nn.BatchNorm2d(3), nn.ReLU(inplace=True) ) self.first.apply(weights_init) self.pretrained, self.scratch = _make_encoder(self.backbone, features, use_pretrained, groups=self.groups, expand=self.expand, exportable=exportable) self.scratch.activation = nn.ReLU(False) self.scratch.refinenet4 = FeatureFusionBlock_custom(features4, self.scratch.activation, deconv=False, bn=False, expand=self.expand, align_corners=align_corners) self.scratch.refinenet3 = FeatureFusionBlock_custom(features3, self.scratch.activation, deconv=False, bn=False, expand=self.expand, align_corners=align_corners) self.scratch.refinenet2 = FeatureFusionBlock_custom(features2, self.scratch.activation, deconv=False, bn=False, expand=self.expand, align_corners=align_corners) self.scratch.refinenet1 = FeatureFusionBlock_custom(features1, self.scratch.activation, deconv=False, bn=False, align_corners=align_corners) self.scratch.output_conv = OutputConv(features, self.groups, self.scratch.activation, non_negative) if path: self.load(path) def forward(self, x, d): """Forward pass. 
Args: x (tensor): input data (image) d (tensor): unalterated input depth Returns: tensor: depth """ if self.channels_last==True: print("self.channels_last = ", self.channels_last) x.contiguous(memory_format=torch.channels_last) layer_0 = self.first(x) layer_1 = self.pretrained.layer1(layer_0) layer_2 = self.pretrained.layer2(layer_1) layer_3 = self.pretrained.layer3(layer_2) layer_4 = self.pretrained.layer4(layer_3) layer_1_rn = self.scratch.layer1_rn(layer_1) layer_2_rn = self.scratch.layer2_rn(layer_2) layer_3_rn = self.scratch.layer3_rn(layer_3) layer_4_rn = self.scratch.layer4_rn(layer_4) path_4 = self.scratch.refinenet4(layer_4_rn) path_3 = self.scratch.refinenet3(path_4, layer_3_rn) path_2 = self.scratch.refinenet2(path_3, layer_2_rn) path_1 = self.scratch.refinenet1(path_2, layer_1_rn) out = self.scratch.output_conv(path_1) scales = F.relu(1.0 + out) pred = d * scales # clamp pred to min and max if self.min_pred is not None: min_pred_inv = 1.0/self.min_pred pred[pred > min_pred_inv] = min_pred_inv if self.max_pred is not None: max_pred_inv = 1.0/self.max_pred pred[pred < max_pred_inv] = max_pred_inv # also return scales return (pred, scales) ================================================ FILE: modules/midas/normalization.py ================================================ VOID_INTERMEDIATE = { "dpt_beit_large_512" : { "void_150" : { "mean" : {"int_depth" : 0.730, "int_scales" : 0.380}, "std" : {"int_depth" : 0.226, "int_scales" : 0.102}, }, "void_500" : { "mean" : {"int_depth" : 0.736, "int_scales" : 0.366}, "std" : {"int_depth" : 0.232, "int_scales" : 0.099}, }, "void_1500" : { "mean" : {"int_depth" : 0.730, "int_scales" : 0.355}, "std" : {"int_depth" : 0.232, "int_scales" : 0.096}, }, }, "dpt_swin2_large_384" : { "void_150" : { "mean" : {"int_depth" : 0.730, "int_scales" : 0.402}, "std" : {"int_depth" : 0.219, "int_scales" : 0.107}, }, "void_500" : { "mean" : {"int_depth" : 0.736, "int_scales" : 0.389}, "std" : {"int_depth" : 0.224, "int_scales" : 0.106}, }, "void_1500" : { "mean" : {"int_depth" : 0.730, "int_scales" : 0.377}, "std" : {"int_depth" : 0.226, "int_scales" : 0.103}, }, }, "dpt_large" : { "void_150" : { "mean" : {"int_depth" : 0.729, "int_scales" : 0.403}, "std" : {"int_depth" : 0.213, "int_scales" : 0.116}, }, "void_500" : { "mean" : {"int_depth" : 0.735, "int_scales" : 0.390}, "std" : {"int_depth" : 0.219, "int_scales" : 0.116}, }, "void_1500" : { "mean" : {"int_depth" : 0.730, "int_scales" : 0.380}, "std" : {"int_depth" : 0.221, "int_scales" : 0.116}, }, }, "dpt_hybrid": { "void_150" : { "mean" : {"int_depth" : 0.729, "int_scales" : 0.404}, "std" : {"int_depth" : 0.210, "int_scales" : 0.117}, }, "void_500" : { "mean" : {"int_depth" : 0.735, "int_scales" : 0.392}, "std" : {"int_depth" : 0.215, "int_scales" : 0.118}, }, "void_1500" : { "mean" : {"int_depth" : 0.730, "int_scales" : 0.381}, "std" : {"int_depth" : 0.218, "int_scales" : 0.117}, }, }, "dpt_swin2_tiny_256" : { "void_150" : { "mean" : {"int_depth" : 0.735, "int_scales" : 0.419}, "std" : {"int_depth" : 0.207, "int_scales" : 0.122}, }, "void_500" : { "mean" : {"int_depth" : 0.741, "int_scales" : 0.406}, "std" : {"int_depth" : 0.212, "int_scales" : 0.124}, }, "void_1500" : { "mean" : {"int_depth" : 0.733, "int_scales" : 0.396}, "std" : {"int_depth" : 0.213, "int_scales" : 0.125}, }, }, "dpt_levit_224" : { "void_150" : { "mean" : {"int_depth" : 0.734, "int_scales" : 0.421}, "std" : {"int_depth" : 0.198, "int_scales" : 0.129}, }, "void_500" : { "mean" : {"int_depth" : 0.740, "int_scales" : 0.410}, "std" : 
{"int_depth" : 0.202, "int_scales" : 0.134}, }, "void_1500" : { "mean" : {"int_depth" : 0.734, "int_scales" : 0.400}, "std" : {"int_depth" : 0.204, "int_scales" : 0.137}, }, }, "midas_small" : { "void_150" : { "mean" : {"int_depth" : 0.723, "int_scales" : 0.402}, "std" : {"int_depth" : 0.190, "int_scales" : 0.132}, }, "void_500" : { "mean" : {"int_depth" : 0.731, "int_scales" : 0.393}, "std" : {"int_depth" : 0.196, "int_scales" : 0.136}, }, "void_1500" : { "mean" : {"int_depth" : 0.728, "int_scales" : 0.385}, "std" : {"int_depth" : 0.199, "int_scales" : 0.140}, }, }, } ================================================ FILE: modules/midas/transforms.py ================================================ import numpy as np import cv2 import math import torch import torchvision.transforms as transforms from modules.midas.utils import normalize_unit_range import modules.midas.normalization as normalization class Resize(object): """Resize sample to given size (width, height). """ def __init__( self, width, height, resize_target=True, keep_aspect_ratio=False, ensure_multiple_of=1, resize_method="lower_bound", image_interpolation_method=cv2.INTER_AREA, ): """Init. Args: width (int): desired output width height (int): desired output height resize_target (bool, optional): True: Resize the full sample (image, mask, target). False: Resize image only. Defaults to True. keep_aspect_ratio (bool, optional): True: Keep the aspect ratio of the input sample. Output sample might not have the given width and height, and resize behaviour depends on the parameter 'resize_method'. Defaults to False. ensure_multiple_of (int, optional): Output width and height is constrained to be multiple of this parameter. Defaults to 1. resize_method (str, optional): "lower_bound": Output will be at least as large as the given size. "upper_bound": Output will be at max as large as the given size. (Output size might be smaller than given size.) "minimal": Scale as least as possible. (Output size might be smaller than given size.) Defaults to "lower_bound". 
""" self.__width = width self.__height = height self.__resize_target = resize_target self.__keep_aspect_ratio = keep_aspect_ratio self.__multiple_of = ensure_multiple_of self.__resize_method = resize_method self.__image_interpolation_method = image_interpolation_method def constrain_to_multiple_of(self, x, min_val=0, max_val=None): y = (np.round(x / self.__multiple_of) * self.__multiple_of).astype(int) if max_val is not None and y > max_val: y = (np.floor(x / self.__multiple_of) * self.__multiple_of).astype(int) if y < min_val: y = (np.ceil(x / self.__multiple_of) * self.__multiple_of).astype(int) return y def get_size(self, width, height): # determine new height and width scale_height = self.__height / height scale_width = self.__width / width if self.__keep_aspect_ratio: if self.__resize_method == "lower_bound": # scale such that output size is lower bound if scale_width > scale_height: # fit width scale_height = scale_width else: # fit height scale_width = scale_height elif self.__resize_method == "upper_bound": # scale such that output size is upper bound if scale_width < scale_height: # fit width scale_height = scale_width else: # fit height scale_width = scale_height elif self.__resize_method == "minimal": # scale as least as possbile if abs(1 - scale_width) < abs(1 - scale_height): # fit width scale_height = scale_width else: # fit height scale_width = scale_height else: raise ValueError( f"resize_method {self.__resize_method} not implemented" ) if self.__resize_method == "lower_bound": new_height = self.constrain_to_multiple_of( scale_height * height, min_val=self.__height ) new_width = self.constrain_to_multiple_of( scale_width * width, min_val=self.__width ) elif self.__resize_method == "upper_bound": new_height = self.constrain_to_multiple_of( scale_height * height, max_val=self.__height ) new_width = self.constrain_to_multiple_of( scale_width * width, max_val=self.__width ) elif self.__resize_method == "minimal": new_height = self.constrain_to_multiple_of(scale_height * height) new_width = self.constrain_to_multiple_of(scale_width * width) else: raise ValueError(f"resize_method {self.__resize_method} not implemented") return (new_width, new_height) def __call__(self, sample): width, height = self.get_size( sample["image"].shape[1], sample["image"].shape[0] ) # resize sample for item in sample.keys(): interpolation_method = self.__image_interpolation_method sample[item] = cv2.resize( sample[item], (width, height), interpolation=interpolation_method, ) if self.__resize_target: if "depth" in sample: sample["depth"] = cv2.resize( sample["depth"], (width, height), interpolation=cv2.INTER_NEAREST ) if "mask" in sample: sample["mask"] = cv2.resize( sample["mask"].astype(np.float32), (width, height), interpolation=cv2.INTER_NEAREST, ) sample["mask"] = sample["mask"].astype(bool) return sample class NormalizeImage(object): """Normalize image by given mean and std. """ def __init__(self, mean, std): self.__mean = mean self.__std = std def __call__(self, sample): sample["image"] = (sample["image"] - self.__mean) / self.__std return sample class NormalizeIntermediate(object): """Normalize intermediate data by given mean and std. 
""" def __init__(self, mean, std): self.__int_depth_mean = mean["int_depth"] self.__int_depth_std = std["int_depth"] self.__int_scales_mean = mean["int_scales"] self.__int_scales_std = std["int_scales"] def __call__(self, sample): if "int_depth" in sample and sample["int_depth"] is not None: sample["int_depth"] = (sample["int_depth"] - self.__int_depth_mean) / self.__int_depth_std if "int_scales" in sample and sample["int_scales"] is not None: sample["int_scales"] = (sample["int_scales"] - self.__int_scales_mean) / self.__int_scales_std return sample class PrepareForNet(object): """Prepare sample for usage as network input. """ def __init__(self): pass def __call__(self, sample): for item in sample.keys(): if sample[item] is None: pass elif item == "image": image = np.transpose(sample["image"], (2, 0, 1)) sample["image"] = np.ascontiguousarray(image).astype(np.float32) else: array = sample[item].astype(np.float32) array = np.expand_dims(array, axis=0) # add channel dim sample[item] = np.ascontiguousarray(array) return sample class Tensorize(object): """Convert sample to tensor. """ def __init__(self): pass def __call__(self, sample): for item in sample.keys(): if sample[item] is None: pass else: # before tensorizing, verify that data is clean assert not np.any(np.isnan(sample[item])) sample[item] = torch.Tensor(sample[item]) return sample def get_transforms(depth_predictor, sparsifier, nsamples): image_mean_dict = { "dpt_beit_large_512" : [0.5, 0.5, 0.5], "dpt_swin2_large_384" : [0.5, 0.5, 0.5], "dpt_large" : [0.5, 0.5, 0.5], "dpt_hybrid" : [0.5, 0.5, 0.5], "dpt_swin2_tiny_256" : [0.5, 0.5, 0.5], "dpt_levit_224" : [0.5, 0.5, 0.5], "midas_small" : [0.485, 0.456, 0.406], } image_std_dict = { "dpt_beit_large_512" : [0.5, 0.5, 0.5], "dpt_swin2_large_384" : [0.5, 0.5, 0.5], "dpt_large" : [0.5, 0.5, 0.5], "dpt_hybrid" : [0.5, 0.5, 0.5], "dpt_swin2_tiny_256" : [0.5, 0.5, 0.5], "dpt_levit_224" : [0.5, 0.5, 0.5], "midas_small" : [0.229, 0.224, 0.225], } resize_method_dict = { "dpt_beit_large_512" : "minimal", "dpt_swin2_large_384" : "minimal", "dpt_large" : "minimal", "dpt_hybrid" : "minimal", "dpt_swin2_tiny_256" : "minimal", "dpt_levit_224" : "minimal", "midas_small" : "upper_bound", } resize_dict = { "dpt_beit_large_512" : 384, "dpt_swin2_large_384" : 384, "dpt_large" : 384, "dpt_hybrid" : 384, "dpt_swin2_tiny_256" : 256, "dpt_levit_224" : 224, "midas_small" : 384, } keep_aspect_ratio = True if "swin2" in depth_predictor or "levit" in depth_predictor: keep_aspect_ratio = False depth_model_transform_steps = [ Resize( width=resize_dict[depth_predictor], height=resize_dict[depth_predictor], resize_target=False, keep_aspect_ratio=keep_aspect_ratio, ensure_multiple_of=32, resize_method=resize_method_dict[depth_predictor], image_interpolation_method=cv2.INTER_CUBIC, ), NormalizeImage( mean=image_mean_dict[depth_predictor], std=image_std_dict[depth_predictor] ), PrepareForNet(), Tensorize(), ] sml_model_transform_steps = [ Resize( width=384, height=384, resize_target=False, keep_aspect_ratio=True, ensure_multiple_of=32, resize_method=resize_method_dict["midas_small"], image_interpolation_method=cv2.INTER_CUBIC, ), NormalizeIntermediate( mean=normalization.VOID_INTERMEDIATE[depth_predictor][f"{sparsifier}_{nsamples}"]["mean"], std=normalization.VOID_INTERMEDIATE[depth_predictor][f"{sparsifier}_{nsamples}"]["std"], ), PrepareForNet(), Tensorize(), ] return { "depth_model" : transforms.Compose(depth_model_transform_steps), "sml_model" : transforms.Compose(sml_model_transform_steps), } 
================================================ FILE: modules/midas/utils.py ================================================ """Utils for monoDepth. """ import sys import re import numpy as np import cv2 import torch def read_pfm(path): """Read pfm file. Args: path (str): path to file Returns: tuple: (data, scale) """ with open(path, "rb") as file: color = None width = None height = None scale = None endian = None header = file.readline().rstrip() if header.decode("ascii") == "PF": color = True elif header.decode("ascii") == "Pf": color = False else: raise Exception("Not a PFM file: " + path) dim_match = re.match(r"^(\d+)\s(\d+)\s$", file.readline().decode("ascii")) if dim_match: width, height = list(map(int, dim_match.groups())) else: raise Exception("Malformed PFM header.") scale = float(file.readline().decode("ascii").rstrip()) if scale < 0: # little-endian endian = "<" scale = -scale else: # big-endian endian = ">" data = np.fromfile(file, endian + "f") shape = (height, width, 3) if color else (height, width) data = np.reshape(data, shape) data = np.flipud(data) return data, scale def write_pfm(path, image, scale=1): """Write pfm file. Args: path (str): pathto file image (array): data scale (int, optional): Scale. Defaults to 1. """ with open(path, "wb") as file: color = None if image.dtype.name != "float32": raise Exception("Image dtype must be float32.") image = np.flipud(image) if len(image.shape) == 3 and image.shape[2] == 3: # color image color = True elif ( len(image.shape) == 2 or len(image.shape) == 3 and image.shape[2] == 1 ): # greyscale color = False else: raise Exception("Image must have H x W x 3, H x W x 1 or H x W dimensions.") file.write("PF\n" if color else "Pf\n".encode()) file.write("%d %d\n".encode() % (image.shape[1], image.shape[0])) endian = image.dtype.byteorder if endian == "<" or endian == "=" and sys.byteorder == "little": scale = -scale file.write("%f\n".encode() % scale) image.tofile(file) def read_image(path): """Read image and output RGB image (0-1). Args: path (str): path to file Returns: array: RGB image (0-1) """ img = cv2.imread(path) if img.ndim == 2: img = cv2.cvtColor(img, cv2.COLOR_GRAY2BGR) img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB) / 255.0 return img def resize_image(img): """Resize image and make it fit for network. Args: img (array): image Returns: tensor: data ready for network """ height_orig = img.shape[0] width_orig = img.shape[1] if width_orig > height_orig: scale = width_orig / 384 else: scale = height_orig / 384 height = (np.ceil(height_orig / scale / 32) * 32).astype(int) width = (np.ceil(width_orig / scale / 32) * 32).astype(int) img_resized = cv2.resize(img, (width, height), interpolation=cv2.INTER_AREA) img_resized = ( torch.from_numpy(np.transpose(img_resized, (2, 0, 1))).contiguous().float() ) img_resized = img_resized.unsqueeze(0) return img_resized def resize_depth(depth, width, height): """Resize depth map and bring to CPU (numpy). Args: depth (tensor): depth width (int): image width height (int): image height Returns: array: processed depth """ depth = torch.squeeze(depth[0, :, :, :]).to("cpu") depth_resized = cv2.resize( depth.numpy(), (width, height), interpolation=cv2.INTER_CUBIC ) return depth_resized def write_depth(path, depth, bits=1): """Write depth map to pfm and png file. 
Args: path (str): filepath without extension depth (array): depth """ write_pfm(path + ".pfm", depth.astype(np.float32)) depth_min = depth.min() depth_max = depth.max() max_val = (2**(8*bits))-1 if depth_max - depth_min > np.finfo("float").eps: out = max_val * (depth - depth_min) / (depth_max - depth_min) else: out = np.zeros(depth.shape, dtype=depth.type) if bits == 1: cv2.imwrite(path + ".png", out.astype("uint8")) elif bits == 2: cv2.imwrite(path + ".png", out.astype("uint16")) return def write_png(path, array, bits=2): """Write array to png file. Args: path (str): filepath without extension array (array): array to be saved """ array_min = np.min(array) array_max = np.max(array) max_val = (2**(8*bits))-1 if array_max - array_min > np.finfo("float").eps: out = max_val * (array - array_min) / (array_max - array_min) else: print(f"zero array not being saved at {path}") return if bits == 1: cv2.imwrite(path + ".png", out.astype("uint8")) elif bits == 2: cv2.imwrite(path + ".png", out.astype("uint16")) return def normalize_unit_range(data): """Normalize data array to [0, 1] range. Args: data (array): input array Returns: array: normalized array """ if np.max(data) - np.min(data) > np.finfo("float").eps: normalized = (data - np.min(data)) / (np.max(data) - np.min(data)) else: raise ValueError("cannot normalize array, max-min range is 0") return normalized ================================================ FILE: pipeline.py ================================================ import torch import numpy as np from modules.midas.midas_net_custom import MidasNet_small_videpth from modules.estimator import LeastSquaresEstimator from modules.interpolator import Interpolator2D import modules.midas.transforms as transforms import modules.midas.utils as utils class VIDepth(object): def __init__(self, depth_predictor, nsamples, sml_model_path, min_pred, max_pred, min_depth, max_depth, device): # get transforms model_transforms = transforms.get_transforms(depth_predictor, "void", str(nsamples)) self.depth_model_transform = model_transforms["depth_model"] self.ScaleMapLearner_transform = model_transforms["sml_model"] # define depth model if depth_predictor == "dpt_beit_large_512": self.DepthModel = torch.hub.load("intel-isl/MiDaS", "DPT_BEiT_L_512") elif depth_predictor == "dpt_swin2_large_384": self.DepthModel = torch.hub.load("intel-isl/MiDaS", "DPT_SwinV2_L_384") elif depth_predictor == "dpt_large": self.DepthModel = torch.hub.load("intel-isl/MiDaS", "DPT_Large") elif depth_predictor == "dpt_hybrid": self.DepthModel = torch.hub.load("intel-isl/MiDaS", "DPT_Hybrid") elif depth_predictor == "dpt_swin2_tiny_256": self.DepthModel = torch.hub.load("intel-isl/MiDaS", "DPT_SwinV2_T_256") elif depth_predictor == "dpt_levit_224": self.DepthModel = torch.hub.load("intel-isl/MiDaS", "DPT_LeViT_224") elif depth_predictor == "midas_small": self.DepthModel = torch.hub.load("intel-isl/MiDaS", "MiDaS_small") else: self.DepthModel = None # define SML model self.ScaleMapLearner = MidasNet_small_videpth( path=sml_model_path, min_pred=min_pred, max_pred=max_pred, ) # depth prediction ranges self.min_pred, self.max_pred = min_pred, max_pred # depth evaluation ranges self.min_depth, self.max_depth = min_depth, max_depth # eval mode self.DepthModel.eval() self.DepthModel.to(device) # eval mode self.ScaleMapLearner.eval() self.ScaleMapLearner.to(device) def run(self, input_image, input_sparse_depth, validity_map, device): input_height, input_width = np.shape(input_image)[0], np.shape(input_image)[1] sample = {"image" : input_image} 
sample = self.depth_model_transform(sample) im = sample["image"].to(device) input_sparse_depth_valid = (input_sparse_depth < self.max_depth) * (input_sparse_depth > self.min_depth) if validity_map is not None: input_sparse_depth_valid *= validity_map.astype(np.bool) input_sparse_depth_valid = input_sparse_depth_valid.astype(bool) input_sparse_depth[~input_sparse_depth_valid] = np.inf # set invalid depth input_sparse_depth = 1.0 / input_sparse_depth # run depth model with torch.no_grad(): depth_pred = self.DepthModel.forward(im.unsqueeze(0)) depth_pred = ( torch.nn.functional.interpolate( depth_pred.unsqueeze(1), size=(input_height, input_width), mode="bicubic", align_corners=False, ) .squeeze() .cpu() .numpy() ) # global scale and shift alignment GlobalAlignment = LeastSquaresEstimator( estimate=depth_pred, target=input_sparse_depth, valid=input_sparse_depth_valid ) GlobalAlignment.compute_scale_and_shift() GlobalAlignment.apply_scale_and_shift() GlobalAlignment.clamp_min_max(clamp_min=self.min_pred, clamp_max=self.max_pred) int_depth = GlobalAlignment.output.astype(np.float32) # interpolation of scale map assert (np.sum(input_sparse_depth_valid) >= 3), "not enough valid sparse points" ScaleMapInterpolator = Interpolator2D( pred_inv = int_depth, sparse_depth_inv = input_sparse_depth, valid = input_sparse_depth_valid, ) ScaleMapInterpolator.generate_interpolated_scale_map( interpolate_method='linear', fill_corners=False ) int_scales = ScaleMapInterpolator.interpolated_scale_map.astype(np.float32) int_scales = utils.normalize_unit_range(int_scales) sample = {"image" : input_image, "int_depth" : int_depth, "int_scales" : int_scales, "int_depth_no_tf" : int_depth} sample = self.ScaleMapLearner_transform(sample) x = torch.cat([sample["int_depth"], sample["int_scales"]], 0) x = x.to(device) d = sample["int_depth_no_tf"].to(device) # run SML model with torch.no_grad(): sml_pred, sml_scales = self.ScaleMapLearner.forward(x.unsqueeze(0), d.unsqueeze(0)) sml_pred = ( torch.nn.functional.interpolate( sml_pred, size=(input_height, input_width), mode="bicubic", align_corners=False, ) .squeeze() .cpu() .numpy() ) output = { "ga_depth" : int_depth, "sml_depth" : sml_pred, } return output ================================================ FILE: run.py ================================================ import os import argparse import glob import torch import numpy as np from PIL import Image import modules.midas.utils as utils import pipeline def load_input_image(input_image_fp): return utils.read_image(input_image_fp) def load_sparse_depth(input_sparse_depth_fp): input_sparse_depth = np.array(Image.open(input_sparse_depth_fp), dtype=np.float32) / 256.0 input_sparse_depth[input_sparse_depth <= 0] = 0.0 return input_sparse_depth def run(depth_predictor, nsamples, sml_model_path, min_pred, max_pred, min_depth, max_depth, input_path, output_path, save_output): device = torch.device("cuda" if torch.cuda.is_available() else "cpu") print("device: %s" % device) # instantiate method method = pipeline.VIDepth( depth_predictor, nsamples, sml_model_path, min_pred, max_pred, min_depth, max_depth, device ) # get inputs img_names = glob.glob(os.path.join(input_path, "image", "*")) num_images = len(img_names) # create output folders if save_output: os.makedirs(os.path.join(output_path, 'ga_depth'), exist_ok=True) os.makedirs(os.path.join(output_path, 'sml_depth'), exist_ok=True) for ind, input_image_fp in enumerate(img_names): if os.path.isdir(input_image_fp): continue print(" processing {} ({}/{})".format(input_image_fp, 
ind + 1, num_images)) input_image = load_input_image(input_image_fp) input_sparse_depth_fp = input_image_fp.replace("image", "sparse_depth") input_sparse_depth = load_sparse_depth(input_sparse_depth_fp) # values in the [min_depth, max_depth] range are considered valid; # an additional validity map may be specified validity_map = None # run method output = method.run(input_image, input_sparse_depth, validity_map, device) if save_output: basename = os.path.splitext(os.path.basename(input_image_fp))[0] # saving depth map after global alignment utils.write_depth( os.path.join(output_path, 'ga_depth', basename), output["ga_depth"], bits=2 ) # saving depth map after local alignment with SML utils.write_depth( os.path.join(output_path, 'sml_depth', basename), output["sml_depth"], bits=2 ) if __name__=="__main__": parser = argparse.ArgumentParser() # model parameters parser.add_argument('-dp', '--depth-predictor', type=str, default='dpt_hybrid', help='Name of depth predictor to use in pipeline.') parser.add_argument('-ns', '--nsamples', type=int, default=150, help='Number of sparse metric depth samples available.') parser.add_argument('-sm', '--sml-model-path', type=str, default='', help='Path to trained SML model weights.') # depth parameters parser.add_argument('--min-pred', type=float, default=0.1, help='Min bound for predicted depth values.') parser.add_argument('--max-pred', type=float, default=8.0, help='Max bound for predicted depth values.') parser.add_argument('--min-depth', type=float, default=0.2, help='Min valid depth when evaluating.') parser.add_argument('--max-depth', type=float, default=5.0, help='Max valid depth when evaluating.') # I/O paths parser.add_argument('-i', '--input-path', type=str, default='./input', help='Path to inputs.') parser.add_argument('-o', '--output-path', type=str, default='./output', help='Path to outputs.') parser.add_argument('--save-output', dest='save_output', action='store_true', help='Save output depth map.') parser.set_defaults(save_output=False) args = parser.parse_args() print(args) run( args.depth_predictor, args.nsamples, args.sml_model_path, args.min_pred, args.max_pred, args.min_depth, args.max_depth, args.input_path, args.output_path, args.save_output )
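For completeness, a minimal programmatic sketch of invoking the pipeline on a single image/sparse-depth pair without going through run.py (this is illustrative and not part of the repository; the file paths and checkpoint name are placeholders):

```python
import torch
import numpy as np
from PIL import Image

import pipeline
import modules.midas.utils as utils

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# The checkpoint path must match the chosen depth predictor and sample count
# (see the weights table in the README).
method = pipeline.VIDepth(
    depth_predictor="dpt_hybrid",
    nsamples=150,
    sml_model_path="weights/sml_model.dpredictor.dpt_hybrid.nsamples.150.ckpt",
    min_pred=0.1, max_pred=8.0,   # defaults from run.py
    min_depth=0.2, max_depth=5.0,
    device=device,
)

# One RGB image and its sparse metric depth map (VOID convention: value / 256 = meters).
image = utils.read_image("input/image/example.png")
sparse_depth = np.array(Image.open("input/sparse_depth/example.png"), dtype=np.float32) / 256.0
sparse_depth[sparse_depth <= 0] = 0.0

output = method.run(image, sparse_depth, validity_map=None, device=device)
ga_depth = output["ga_depth"]    # prediction after global scale-and-shift alignment
sml_depth = output["sml_depth"]  # prediction after dense scale alignment with the SML
```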