Repository: doubleZ0108/GeoMVSNet
Branch: master
Commit: 09167fc95f04
Files: 48
Total size: 230.0 KB
Directory structure:
gitextract_c1590sn0/
├── .gitignore
├── LICENSE
├── README.md
├── datasets/
│ ├── __init__.py
│ ├── blendedmvs.py
│ ├── data_io.py
│ ├── dtu.py
│ ├── evaluations/
│ │ └── dtu_parallel/
│ │ ├── BaseEval2Obj_web.m
│ │ ├── BaseEvalMain_web.m
│ │ ├── ComputeStat_web.m
│ │ ├── MaxDistCP.m
│ │ ├── PointCompareMain.m
│ │ ├── plyread.m
│ │ └── reducePts_haa.m
│ ├── lists/
│ │ ├── blendedmvs/
│ │ │ ├── low_res_all.txt
│ │ │ └── val.txt
│ │ ├── dtu/
│ │ │ ├── test.txt
│ │ │ ├── train.txt
│ │ │ └── val.txt
│ │ └── tnt/
│ │ ├── advanced.txt
│ │ └── intermediate.txt
│ └── tnt.py
├── fusions/
│ ├── dtu/
│ │ ├── _open3d.py
│ │ ├── gipuma.py
│ │ └── pcd.py
│ └── tnt/
│ └── dypcd.py
├── models/
│ ├── __init__.py
│ ├── filter.py
│ ├── geometry.py
│ ├── geomvsnet.py
│ ├── loss.py
│ ├── submodules.py
│ └── utils/
│ ├── __init__.py
│ ├── opts.py
│ └── utils.py
├── outputs/
│ └── visual.ipynb
├── requirements.txt
├── scripts/
│ ├── blend/
│ │ └── train_blend.sh
│ ├── data_path.sh
│ ├── dtu/
│ │ ├── fusion_dtu.sh
│ │ ├── matlab_quan_dtu.sh
│ │ ├── test_dtu.sh
│ │ ├── train_dtu.sh
│ │ └── train_dtu_raw.sh
│ └── tnt/
│ ├── fusion_tnt.sh
│ └── test_tnt.sh
├── test.py
└── train.py
================================================
FILE CONTENTS
================================================
================================================
FILE: .gitignore
================================================
.DS_Store
__pycache__
================================================
FILE: LICENSE
================================================
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
================================================
FILE: README.md
================================================
# GeoMVSNet: Learning Multi-View Stereo With Geometry Perception (CVPR 2023)
## 🔨 Setup
### 1.1 Requirements
Use the following commands to build the `conda` environment.
```bash
conda create -n geomvsnet python=3.8
conda activate geomvsnet
pip install -r requirements.txt
```
### 1.2 Datasets
Download the following datasets and modify the corresponding local path in `scripts/data_path.sh`.
#### DTU Dataset
**Training data**. We use the same DTU training data as MVSNet and CasMVSNet; please refer to [DTU training data](https://drive.google.com/file/d/1eDjh-_bxKKnEuz5h-HXS7EDJn59clx6V/view) and [Depth raw](https://virutalbuy-public.oss-cn-hangzhou.aliyuncs.com/share/cascade-stereo/CasMVSNet/dtu_data/dtu_train_hr/Depths_raw.zip) to download the data. Optionally, download the [Rectified raw](http://roboimagedata2.compute.dtu.dk/data/MVS/Rectified.zip) if you want to train the model at raw image resolution. Unzip and organize them as:
```
dtu/
├── Cameras
├── Depths
├── Depths_raw
├── Rectified
└── Rectified_raw (optional)
```
**Testing data**. For convenience, we use the [DTU testing data](https://drive.google.com/file/d/1rX0EXlUL4prRxrRu2DgLJv2j7-tpUD4D/view?usp=sharing) processed by CVP-MVSNet. Also unzip and organize it as:
```
dtu-test/
├── Cameras
├── Depths
└── Rectified
```
> Please note that the images and lighting here are consistent with the original dataset.
#### BlendedMVS Dataset
Download the low image resolution version of [BlendedMVS dataset](https://drive.google.com/file/d/1ilxls-VJNvJnB7IaFj7P0ehMPr7ikRCb/view) and unzip it as:
```
blendedmvs/
└── dataset_low_res
├── ...
└── 5c34529873a8df509ae57b58
```
#### Tanks and Temples Dataset
Download the intermediate and advanced subsets of the [Tanks and Temples dataset](https://drive.google.com/file/d/1YArOJaX9WVLJh4757uE8AEREYkgszrCo/view) and unzip them. If you want to use the short-range version of camera parameters for the `Intermediate` subset, unzip `short_range_caemeras_for_mvsnet.zip` and move `cam_[]` to the corresponding scenes.
```
tnt/
├── advanced
│ ├── ...
│ └── Temple
│ ├── cams
│ ├── images
│ ├── pair.txt
│ └── Temple.log
└── intermediate
├── ...
└── Train
├── cams
├── cams_train
├── images
├── pair.txt
└── Train.log
```
## 🚂 Training
You can train GeoMVSNet from scratch on the DTU and BlendedMVS datasets. After suitable configuration and training, the checkpoints are saved in `checkpoints/[Dataset]/[THISNAME]`, and the following outputs lie in that folder:
- `events.out.tfevents*`: you can use `tensorboard` to monitor the training process.
- `model_[epoch].ckpt`: we save a checkpoint every `--save_freq`.
- `train-[TIME].log`: the detailed training log; refer to the appropriate indicators to judge the quality of training.
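For a quick look at a saved checkpoint (e.g. to confirm which epoch it came from before testing), a minimal sketch is shown below; the key names `epoch` and `model` are assumptions based on common MVSNet-style training loops, so check `train.py` for the actual layout.
```python
import torch

# Hypothetical checkpoint name following the model_[epoch].ckpt pattern above.
ckpt_path = "checkpoints/dtu/geomvsnet/model_000015.ckpt"
state = torch.load(ckpt_path, map_location="cpu")

print(state.keys())                      # assumed: dict_keys(['epoch', 'model', 'optimizer'])
# model.load_state_dict(state["model"])  # restore weights into a GeoMVSNet instance
```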
### 2.1 DTU
To train GeoMVSNet on the DTU dataset, refer to `scripts/dtu/train_dtu.sh` and specify `THISNAME`, `CUDA_VISIBLE_DEVICES`, `batch_size`, etc. to suit your setup. Then run:
```bash
bash scripts/dtu/train_dtu.sh
```
The default training strategy we provide is the *distributed* training mode. If you want to use the *general* training mode, you can refer to the following script.
```bash
CUDA_VISIBLE_DEVICES=0,1,2,3 python3 train.py ${@} \
--which_dataset="dtu" --epochs=16 --logdir=$LOG_DIR \
--trainpath=$DTU_TRAIN_ROOT --testpath=$DTU_TRAIN_ROOT \
--trainlist="datasets/lists/dtu/train.txt" --testlist="datasets/lists/dtu/test.txt" \
\
--data_scale="mid" --n_views="5" --batch_size=16 --lr=0.025 --robust_train \
--lrepochs="1,3,5,7,9,11,13,15:1.5"
```
> Note that the two training strategies require different `batch_size` and `lr` settings to achieve the best results.
### 2.2 BlendedMVS
To train GeoMVSNet on the BlendedMVS dataset, refer to `scripts/blend/train_blend.sh` and likewise specify `THISNAME`, `CUDA_VISIBLE_DEVICES`, `batch_size`, etc. to suit your setup. Then run:
```bash
bash scripts/blend/train_blend.sh
```
By default, we use `7` viewpoints as input for the BlendedMVS training. Similarly, you can choose to use the *distributed* training mode or the *general* one as mentioned in 2.1.
## ⚗️ Testing
### 3.1 DTU
For DTU testing, we use the model trained on the DTU training set. You can simply download our [DTU pretrained model](https://drive.google.com/file/d/147_UbjE87E-HB9sZ5yLDbckynH825nJd/view?usp=sharing), put it into `checkpoints/dtu/geomvsnet/`, and perform *depth map estimation, point cloud fusion, and result evaluation* according to the following steps.
1. Run `bash scripts/dtu/test_dtu.sh` for depth map estimation. The results will be stored in `outputs/dtu/[THISNAME]/`, with each scan folder holding `depth_est`, `confidence`, etc.
- Use `outputs/visual.ipynb` for depth map visualization (a minimal loading sketch is also shown after this list).
2. Run `bash scripts/dtu/fusion_dtu.sh` for point cloud fusion. We provide 3 different fusion methods, and we recommend the `open3d` option by default. After fusion, you can get `[FUSION_METHOD]_fusion_plys` under the experiment output folder, point clouds of each testing scan are there.
(Optional) If you want to use the `gipuma` fusion method, follow these steps:
1. Clone the [edited fusibile repo](https://github.com/YoYo000/fusibile).
2. Refer to [fusibile configuration blog (Chinese)](https://zhuanlan.zhihu.com/p/460212787) for building details.
3. Create a new python2.7 conda env.
```bash
conda create -n fusibile python=2.7
conda install scipy matplotlib
conda install tensorflow==1.14.0
conda install -c https://conda.anaconda.org/menpo opencv
```
4. Use the `fusibile` conda environment for `gipuma` fusion method.
3. Download the [ObsMask](http://roboimagedata2.compute.dtu.dk/data/MVS/SampleSet.zip) and [Points](http://roboimagedata2.compute.dtu.dk/data/MVS/Points.zip) of DTU GT point clouds from the official website and organize them as:
```
dtu-evaluation/
├── ObsMask
└── Points
```
4. Set up `Matlab` in command-line mode and run `bash scripts/dtu/matlab_quan_dtu.sh`. You can adjust the `num_at_once` config according to your machine's CPU and memory limits. After quantitative evaluation, you will get `[FUSION_METHOD]_quantitative/` and `[THISNAME].log`, which store the quantitative results.
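To inspect the depth maps produced in step 1 outside of `outputs/visual.ipynb`, a minimal sketch using `read_pfm` from `datasets/data_io.py` is shown below; the exact file layout (`depth_est/00000000.pfm`) is an assumption, so adjust the path to your own run.
```python
import numpy as np
import matplotlib.pyplot as plt
from datasets.data_io import read_pfm

# Hypothetical path under outputs/dtu/[THISNAME]/; verify against your output folder.
depth, _ = read_pfm("outputs/dtu/geomvsnet/scan1/depth_est/00000000.pfm")
depth = np.asarray(depth, dtype=np.float32)

plt.imshow(depth, cmap="viridis")
plt.colorbar(label="depth")
plt.savefig("depth_vis.png", dpi=150)
```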
### 3.2 Tanks and Temples
For testing on [Tanks and Temples benchmark](https://www.tanksandtemples.org/leaderboard/), you can use any of the following configurations:
- Only train on DTU training dataset.
- Only train on BlendedMVS dataset.
- Pretrain on the DTU training set and finetune on BlendedMVS. (Recommended)
After training, follow these steps:
1. Run `bash scripts/tnt/test_tnt.sh` for depth map estimation. The results will be stored in `outputs/[TRAINING_DATASET]/[THISNAME]/`.
- Use `outputs/visual.ipynb` for depth map visualization.
2. Run `bash scripts/tnt/fusion_tnt.sh` for point cloud fusion. We provide the popular dynamic fusion strategy, and you can tune the fusion threshold in `fusions/tnt/dypcd.py`.
3. Follow the *Upload Instructions* on the [T&T official website](https://www.tanksandtemples.org/submit/) to make online submissions.
### 3.3 Custom Data (TODO)
GeoMVSNet can also reconstruct from custom data. At present, you can refer to [MVSNet](https://github.com/YoYo000/MVSNet#file-formats) for how to organize your data, then follow the same steps as above for *depth estimation* and *point cloud fusion*.
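For reference, `datasets/data_io.py` already provides a `write_cam` helper that emits the MVSNet-style camera file (a 4x4 extrinsic block, a 3x3 intrinsic block, and a final line with the depth range parameters). A minimal sketch of writing one camera file for custom data is shown below; the intrinsics and depth range values are placeholders that you must replace with your own calibration.
```python
import numpy as np
from datasets.data_io import write_cam

extrinsic = np.eye(4, dtype=np.float32)            # world-to-camera matrix (placeholder)
intrinsic4 = np.zeros((4, 4), dtype=np.float32)
intrinsic4[:3, :3] = np.array([[1446.2, 0.0, 800.0],
                               [0.0, 1446.2, 600.0],
                               [0.0, 0.0, 1.0]], dtype=np.float32)   # placeholder K
intrinsic4[3, :4] = [425.0, 2.5, 192, 905.0]       # depth_min, interval, num_depth, depth_max (placeholders)

write_cam("cams/00000000_cam.txt", (extrinsic, intrinsic4))
```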
## 💡 Results
Our results on the DTU and Tanks and Temples datasets are listed in the tables below.
| DTU Dataset | Acc. ↓ | Comp. ↓ | Overall ↓ |
| ----------- | ------ | ------- | --------- |
| GeoMVSNet | 0.3309 | 0.2593 | 0.2951 |
| T&T (Intermediate) | Mean ↑ | Family | Francis | Horse | Lighthouse | M60 | Panther | Playground | Train |
| ------------------ | ------ | ------ | ------- | ----- | ---------- | ----- | ------- | ---------- | ----- |
| GeoMVSNet | 65.89 | 81.64 | 67.53 | 55.78 | 68.02 | 65.49 | 67.19 | 63.27 | 58.22 |
| T&T (Advanced) | Mean ↑ | Auditorium | Ballroom | Courtroom | Museum | Palace | Temple |
| -------------- | ------ | ---------- | -------- | --------- | ------ | ------ | ------ |
| GeoMVSNet | 41.52 | 30.23 | 46.53 | 39.98 | 53.05 | 35.98 | 43.34 |
And you can download our [Point Cloud](https://disk.pku.edu.cn:443/link/69D473126C509C8DCBCC7E233FAAEEAA) and [Estimated Depth](https://disk.pku.edu.cn:443/link/4217EB2F063D2B10EDC711F54A12B5F7) for academic usage.
### 🌟 About Reproducing Paper Results
In our experiments, we found that reproducing MVS networks is relatively difficult. Therefore, we summarize some of the problems we encountered below, hoping they are helpful to you.
**Q1. GPU Architecture Matters.**
There are two commonly used NVIDIA GPU series: GeForce RTX (e.g. RTX 4090, 3090 Ti, 2080 Ti) and Tesla (e.g. V100, T4). We find that there is generally no performance degradation when training and testing on the same series of GPUs. But if, for example, you train on a V100 and test on a 3090 Ti, the depth maps look visually identical while individual pixel values are not exactly the same. We conjecture that the two series or architectures differ in numerical computation and processing precision.
> Our pretrained model is trained on NVIDIA V100 GPUs.
**Q2. PyTorch Version Matters.**
Different CUDA versions lead to different available PyTorch versions, and different torch versions can affect the accuracy of network training and testing. One reason we found is that the implementation and parameter defaults of `F.grid_sample()` vary across PyTorch versions.
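For instance, the `align_corners` default of `F.grid_sample()` changed around PyTorch 1.3, so passing it explicitly (and recording your torch/CUDA versions) makes results easier to reproduce. A minimal illustration:
```python
import torch
import torch.nn.functional as F

feat = torch.randn(1, 8, 32, 40)           # B x C x H x W feature map
grid = torch.rand(1, 24, 30, 2) * 2 - 1    # sampling grid with coordinates in [-1, 1]

# Passing align_corners explicitly avoids silently different interpolation
# behaviour across PyTorch versions.
warped = F.grid_sample(feat, grid, mode="bilinear",
                       padding_mode="zeros", align_corners=True)
print(torch.__version__, warped.shape)     # torch.Size([1, 8, 24, 30])
```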
**Q3. Training Hyperparameters Matter.**
In the era of neural networks, hyperparameters really matter. We did some hyperparameter tuning, but our settings may not match your configuration. Most fundamentally, because GPU memory differs across machines, you need to adjust `batch_size` and `lr` together, as illustrated below. The learning-rate schedule also matters.
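A common heuristic (our suggestion, not a rule from the paper) is the linear scaling rule: scale `lr` by the same factor as the total batch size, taking the general DTU script above (`batch_size=16`, `lr=0.025`) as the reference point.
```python
# Hypothetical helper illustrating the linear scaling rule.
def scaled_lr(batch_size, n_gpus=1, base_lr=0.025, base_batch=16):
    total_batch = batch_size * n_gpus
    return base_lr * total_batch / base_batch

print(scaled_lr(batch_size=4, n_gpus=4))   # same total batch of 16 -> 0.025
print(scaled_lr(batch_size=2, n_gpus=1))   # smaller total batch -> 0.003125
```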
**Q4. Testing Epoch Matters.**
By default, our model trains for 16 epochs. How do you select the best checkpoint for testing? One solution is to use [PyTorch Lightning](https://lightning.ai/docs/pytorch/latest/starter/introduction.html). For simplicity, you can decide which checkpoint to use based on the `.log` file we provide.
**Q5. Fusion Hyperparameters Matter.**
For both the DTU and T&T datasets, the hyperparameters of point cloud fusion greatly affect the final performance. We have provided different fusion strategies and easy access to adjust their parameters. You may need to get a feel for the behavior of your own model.
**Qx. Others.** You can [raise an issue](https://github.com/doubleZ0108/GeoMVSNet/issues/new/choose) if you encounter other problems.
## ⚖️ Citation
```
@InProceedings{zhe2023geomvsnet,
title={GeoMVSNet: Learning Multi-View Stereo With Geometry Perception},
author={Zhang, Zhe and Peng, Rui and Hu, Yuxi and Wang, Ronggang},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={21508--21518},
year={2023}
}
```
## 💌 Acknowledgements
This repository is partly based on [MVSNet](https://github.com/YoYo000/MVSNet), [MVSNet-pytorch](https://github.com/xy-guo/MVSNet_pytorch), [CVP-MVSNet](https://github.com/JiayuYANG/CVP-MVSNet), [cascade-stereo](https://github.com/alibaba/cascade-stereo), [MVSTER](https://github.com/JeffWang987/MVSTER).
We appreciate their contributions to the MVS community.
================================================
FILE: datasets/__init__.py
================================================
================================================
FILE: datasets/blendedmvs.py
================================================
# -*- coding: utf-8 -*-
# @Description: Data preprocessing and organization for BlendedMVS dataset.
# @Author: Zhe Zhang (doublez@stu.pku.edu.cn)
# @Affiliation: Peking University (PKU)
# @LastEditDate: 2023-09-07
import os
import cv2
import random
import numpy as np
from PIL import Image
from torch.utils.data import Dataset
from torchvision import transforms as T
from datasets.data_io import *
def motion_blur(img: np.ndarray, max_kernel_size=3):
# Either vertical, horizontal or diagonal blur
mode = np.random.choice(['h', 'v', 'diag_down', 'diag_up'])
ksize = np.random.randint(0, (max_kernel_size + 1) / 2) * 2 + 1 # make sure is odd
center = int((ksize - 1) / 2)
kernel = np.zeros((ksize, ksize))
if mode == 'h':
kernel[center, :] = 1.
elif mode == 'v':
kernel[:, center] = 1.
elif mode == 'diag_down':
kernel = np.eye(ksize)
elif mode == 'diag_up':
kernel = np.flip(np.eye(ksize), 0)
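# soften the directional kernel with a Gaussian falloff around the centre, then normalize it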
var = ksize * ksize / 16.
grid = np.repeat(np.arange(ksize)[:, np.newaxis], ksize, axis=-1)
gaussian = np.exp(-(np.square(grid - center) + np.square(grid.T - center)) / (2. * var))
kernel *= gaussian
kernel /= np.sum(kernel)
img = cv2.filter2D(img, -1, kernel)
return img
class BlendedMVSDataset(Dataset):
def __init__(self, root_dir, list_file, split, n_views, **kwargs):
super(BlendedMVSDataset, self).__init__()
self.levels = 4
self.root_dir = root_dir
self.list_file = list_file
self.split = split
self.n_views = n_views
assert self.split in ['train', 'val', 'all']
self.scale_factors = {}
self.scale_factor = 0
self.img_wh = kwargs.get("img_wh", (768, 576))
assert self.img_wh[0]%32==0 and self.img_wh[1]%32==0, \
'img_wh must both be multiples of 2^5!'
self.robust_train = kwargs.get("robust_train", True)
self.augment = kwargs.get("augment", True)
if self.augment:
self.color_augment = T.ColorJitter(brightness=0.25, contrast=(0.3, 1.5))
self.metas = self.build_metas()
def build_metas(self):
metas = []
with open(self.list_file) as f:
self.scans = [line.rstrip() for line in f.readlines()]
for scan in self.scans:
with open(os.path.join(self.root_dir, scan, "cams/pair.txt")) as f:
num_viewpoint = int(f.readline())
for _ in range(num_viewpoint):
ref_view = int(f.readline().rstrip())
src_views = [int(x) for x in f.readline().rstrip().split()[1::2]]
if len(src_views) >= self.n_views-1:
metas += [(scan, ref_view, src_views)]
return metas
def read_cam_file(self, scan, filename):
with open(filename) as f:
lines = f.readlines()
lines = [line.rstrip() for line in lines]
# extrinsics: line [1,5), 4x4 matrix
extrinsics = np.fromstring(' '.join(lines[1:5]), dtype=np.float32, sep=' ').reshape((4, 4))
# intrinsics: line [7-10), 3x3 matrix
intrinsics = np.fromstring(' '.join(lines[7:10]), dtype=np.float32, sep=' ').reshape((3, 3))
depth_min = float(lines[11].split()[0])
depth_max = float(lines[11].split()[-1])
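# rescale each scan so that its depth values sit near a canonical range (depth_min -> 100);
# the same per-scan factor is applied to the camera translation here and to the depth maps in read_depth_mask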
if scan not in self.scale_factors:
self.scale_factors[scan] = 100.0 / depth_min
depth_min *= self.scale_factors[scan]
depth_max *= self.scale_factors[scan]
extrinsics[:3, 3] *= self.scale_factors[scan]
return intrinsics, extrinsics, depth_min, depth_max
def read_depth_mask(self, scan, filename, depth_min, depth_max, scale):
depth = np.array(read_pfm(filename)[0], dtype=np.float32)
depth = (depth * self.scale_factors[scan]) * scale
mask = (depth>=depth_min) & (depth<=depth_max)
assert mask.sum() > 0
mask = mask.astype(np.float32)
if self.img_wh is not None:
depth = cv2.resize(depth, self.img_wh, interpolation=cv2.INTER_NEAREST)
h, w = depth.shape
depth_ms = {}
mask_ms = {}
for i in range(self.levels):
depth_cur = cv2.resize(depth, (w//(2**i), h//(2**i)), interpolation=cv2.INTER_NEAREST)
mask_cur = cv2.resize(mask, (w//(2**i), h//(2**i)), interpolation=cv2.INTER_NEAREST)
depth_ms[f"stage{self.levels-i}"] = depth_cur
mask_ms[f"stage{self.levels-i}"] = mask_cur
return depth_ms, mask_ms
def read_img(self, filename):
img = Image.open(filename)
if self.augment:
img = self.color_augment(img)
img = motion_blur(np.array(img, dtype=np.float32))
np_img = np.array(img, dtype=np.float32) / 255.
return np_img
def __len__(self):
return len(self.metas)
def __getitem__(self, idx):
meta = self.metas[idx]
scan, ref_view, src_views = meta
if self.robust_train:
num_src_views = len(src_views)
index = random.sample(range(num_src_views), self.n_views - 1)
view_ids = [ref_view] + [src_views[i] for i in index]
scale_ratio = random.uniform(0.8, 1.25)
else:
view_ids = [ref_view] + src_views[:self.n_views - 1]
scale_ratio = 1
imgs = []
mask = None
depth = None
depth_min = None
depth_max = None
proj={}
proj_matrices_0 = []
proj_matrices_1 = []
proj_matrices_2 = []
proj_matrices_3 = []
for i, vid in enumerate(view_ids):
img_filename = os.path.join(self.root_dir, '{}/blended_images/{:0>8}.jpg'.format(scan, vid))
depth_filename = os.path.join(self.root_dir, '{}/rendered_depth_maps/{:0>8}.pfm'.format(scan, vid))
proj_mat_filename = os.path.join(self.root_dir, '{}/cams/{:0>8}_cam.txt'.format(scan, vid))
img = self.read_img(img_filename)
imgs.append(img.transpose(2,0,1))
intrinsics, extrinsics, depth_min_, depth_max_ = self.read_cam_file(scan, proj_mat_filename)
proj_mat_0 = np.zeros(shape=(2, 4, 4), dtype=np.float32)
proj_mat_1 = np.zeros(shape=(2, 4, 4), dtype=np.float32)
proj_mat_2 = np.zeros(shape=(2, 4, 4), dtype=np.float32)
proj_mat_3 = np.zeros(shape=(2, 4, 4), dtype=np.float32)
extrinsics[:3, 3] *= scale_ratio
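# stage1 works at 1/8 image resolution: shrink the intrinsics by 8x, then double them per stage so stage4 matches the full-resolution image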
intrinsics[:2,:] *= 0.125
proj_mat_0[0,:4,:4] = extrinsics.copy()
proj_mat_0[1,:3,:3] = intrinsics.copy()
int_mat_0 = intrinsics.copy()
intrinsics[:2,:] *= 2
proj_mat_1[0,:4,:4] = extrinsics.copy()
proj_mat_1[1,:3,:3] = intrinsics.copy()
int_mat_1 = intrinsics.copy()
intrinsics[:2,:] *= 2
proj_mat_2[0,:4,:4] = extrinsics.copy()
proj_mat_2[1,:3,:3] = intrinsics.copy()
int_mat_2 = intrinsics.copy()
intrinsics[:2,:] *= 2
proj_mat_3[0,:4,:4] = extrinsics.copy()
proj_mat_3[1,:3,:3] = intrinsics.copy()
int_mat_3 = intrinsics.copy()
proj_matrices_0.append(proj_mat_0)
proj_matrices_1.append(proj_mat_1)
proj_matrices_2.append(proj_mat_2)
proj_matrices_3.append(proj_mat_3)
# reference view
if i == 0:
depth_min = depth_min_ * scale_ratio
depth_max = depth_max_ * scale_ratio
depth, mask = self.read_depth_mask(scan, depth_filename, depth_min, depth_max, scale_ratio)
for l in range(self.levels):
mask[f'stage{l+1}'] = mask[f'stage{l+1}']
depth[f'stage{l+1}'] = depth[f'stage{l+1}']
proj['stage1'] = np.stack(proj_matrices_0)
proj['stage2'] = np.stack(proj_matrices_1)
proj['stage3'] = np.stack(proj_matrices_2)
proj['stage4'] = np.stack(proj_matrices_3)
intrinsics_matrices = {
"stage1": int_mat_0,
"stage2": int_mat_1,
"stage3": int_mat_2,
"stage4": int_mat_3
}
sample = {
"imgs": imgs,
"proj_matrices": proj,
"intrinsics_matrices": intrinsics_matrices,
"depth": depth,
"depth_values": np.array([depth_min, depth_max], dtype=np.float32),
"mask": mask
}
return sample
================================================
FILE: datasets/data_io.py
================================================
# -*- coding: utf-8 -*-
# @Description: I/O functions for depth maps and camera files.
# @Author: Zhe Zhang (doublez@stu.pku.edu.cn)
# @Affiliation: Peking University (PKU)
# @LastEditDate: 2023-09-07
import sys, re
import numpy as np
def read_pfm(filename):
file = open(filename, 'rb')
color = None
width = None
height = None
scale = None
endian = None
header = file.readline().decode('utf-8').rstrip()
if header == 'PF':
color = True
elif header == 'Pf':
color = False
else:
raise Exception('Not a PFM file.')
dim_match = re.match(r'^(\d+)\s(\d+)\s$', file.readline().decode('utf-8'))
if dim_match:
width, height = map(int, dim_match.groups())
else:
raise Exception('Malformed PFM header.')
scale = float(file.readline().rstrip())
if scale < 0: # little-endian
endian = '<'
scale = -scale
else:
endian = '>' # big-endian
data = np.fromfile(file, endian + 'f')
shape = (height, width, 3) if color else (height, width)
data = np.reshape(data, shape)
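# PFM stores scanlines bottom-to-top; flip vertically so row 0 is the top of the image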
data = np.flipud(data)
file.close()
return data, scale
def save_pfm(filename, image, scale=1):
file = open(filename, "wb")
color = None
image = np.flipud(image)
if image.dtype.name != 'float32':
raise Exception('Image dtype must be float32.')
if len(image.shape) == 3 and image.shape[2] == 3: # color image
color = True
elif len(image.shape) == 2 or len(image.shape) == 3 and image.shape[2] == 1: # greyscale
color = False
else:
raise Exception('Image must have H x W x 3, H x W x 1 or H x W dimensions.')
file.write('PF\n'.encode('utf-8') if color else 'Pf\n'.encode('utf-8'))
file.write('{} {}\n'.format(image.shape[1], image.shape[0]).encode('utf-8'))
endian = image.dtype.byteorder
if endian == '<' or endian == '=' and sys.byteorder == 'little':
scale = -scale
file.write(('%f\n' % scale).encode('utf-8'))
image.tofile(file)
file.close()
def write_cam(file, cam):
f = open(file, "w")
f.write('extrinsic\n')
for i in range(0, 4):
for j in range(0, 4):
f.write(str(cam[0][i][j]) + ' ')
f.write('\n')
f.write('\n')
f.write('intrinsic\n')
for i in range(0, 3):
for j in range(0, 3):
f.write(str(cam[1][i][j]) + ' ')
f.write('\n')
f.write('\n' + str(cam[1][3][0]) + ' ' + str(cam[1][3][1]) + ' ' + str(cam[1][3][2]) + ' ' + str(cam[1][3][3]) + '\n')
f.close()
================================================
FILE: datasets/dtu.py
================================================
# -*- coding: utf-8 -*-
# @Description: Data preprocessing and organization for DTU dataset.
# @Author: Zhe Zhang (doublez@stu.pku.edu.cn)
# @Affiliation: Peking University (PKU)
# @LastEditDate: 2023-09-07
import os
import cv2
import random
import numpy as np
from PIL import Image
from torchvision import transforms
from torch.utils.data import Dataset
from datasets.data_io import *
class DTUDataset(Dataset):
def __init__(self, root_dir, list_file, mode, n_views, **kwargs):
super(DTUDataset, self).__init__()
self.root_dir = root_dir
self.list_file = list_file
self.mode = mode
self.n_views = n_views
assert self.mode in ["train", "val", "test"]
self.total_depths = 192
self.interval_scale = 1.06
self.data_scale = kwargs.get("data_scale", "mid") # mid / raw
self.robust_train = kwargs.get("robust_train", False) # True / False
self.color_augment = transforms.ColorJitter(brightness=0.5, contrast=0.5)
if self.mode == "test":
self.max_wh = kwargs.get("max_wh", (1600, 1200))
self.metas = self.build_metas()
def build_metas(self):
metas = []
with open(os.path.join(self.list_file)) as f:
scans = [line.rstrip() for line in f.readlines()]
pair_file = "Cameras/pair.txt"
for scan in scans:
with open(os.path.join(self.root_dir, pair_file)) as f:
num_viewpoint = int(f.readline())
# viewpoints (49)
for _ in range(num_viewpoint):
ref_view = int(f.readline().rstrip())
src_views = [int(x) for x in f.readline().rstrip().split()[1::2]]
if self.mode == "train":
# light conditions 0-6
for light_idx in range(7):
metas.append((scan, light_idx, ref_view, src_views))
elif self.mode in ["test", "val"]:
if len(src_views) < self.n_views:
print("{} < num_views:{}".format(len(src_views), self.n_views))
src_views += [src_views[0]] * (self.n_views - len(src_views))
metas.append((scan, 3, ref_view, src_views))
print("DTU Dataset in", self.mode, "mode metas:", len(metas))
return metas
def __len__(self):
return len(self.metas)
def read_cam_file(self, filename):
with open(filename) as f:
lines = f.readlines()
lines = [line.rstrip() for line in lines]
# extrinsics: line [1,5), 4x4 matrix
extrinsics = np.fromstring(' '.join(lines[1:5]), dtype=np.float32, sep=' ').reshape((4, 4))
# intrinsics: line [7-10), 3x3 matrix
intrinsics = np.fromstring(' '.join(lines[7:10]), dtype=np.float32, sep=' ').reshape((3, 3))
if self.mode == "test":
intrinsics[:2, :] /= 4.0
# depth_min & depth_interval: line 11
depth_min = float(lines[11].split()[0])
depth_interval = float(lines[11].split()[1])
if len(lines[11].split()) >= 3:
num_depth = lines[11].split()[2]
depth_max = depth_min + int(float(num_depth)) * depth_interval
depth_interval = (depth_max - depth_min) / self.total_depths
depth_interval *= self.interval_scale
return intrinsics, extrinsics, depth_min, depth_interval
def read_img(self, filename):
img = Image.open(filename)
if self.mode == "train" and self.robust_train:
img = self.color_augment(img)
# scale 0~255 to 0~1
np_img = np.array(img, dtype=np.float32) / 255.
return np_img
def crop_img(self, img):
raw_h, raw_w = img.shape[:2]
start_h = (raw_h-1024)//2
start_w = (raw_w-1280)//2
return img[start_h:start_h+1024, start_w:start_w+1280, :] # (1024, 1280)
def prepare_img(self, hr_img):
h, w = hr_img.shape
if self.data_scale == "mid":
hr_img_ds = cv2.resize(hr_img, (w//2, h//2), interpolation=cv2.INTER_NEAREST)
h, w = hr_img_ds.shape
target_h, target_w = 512, 640
start_h, start_w = (h - target_h)//2, (w - target_w)//2
hr_img_crop = hr_img_ds[start_h: start_h + target_h, start_w: start_w + target_w]
elif self.data_scale == "raw":
hr_img_crop = hr_img[h//2-1024//2:h//2+1024//2, w//2-1280//2:w//2+1280//2] # (1024, 1280)
return hr_img_crop
def scale_mvs_input(self, img, intrinsics, max_w, max_h, base=64):
h, w = img.shape[:2]
if h > max_h or w > max_w:
scale = 1.0 * max_h / h
if scale * w > max_w:
scale = 1.0 * max_w / w
new_w, new_h = scale * w // base * base, scale * h // base * base
else:
new_w, new_h = 1.0 * w // base * base, 1.0 * h // base * base
scale_w = 1.0 * new_w / w
scale_h = 1.0 * new_h / h
intrinsics[0, :] *= scale_w
intrinsics[1, :] *= scale_h
img = cv2.resize(img, (int(new_w), int(new_h)))
return img, intrinsics
def read_mask_hr(self, filename):
img = Image.open(filename)
np_img = np.array(img, dtype=np.float32)
np_img = (np_img > 10).astype(np.float32)
np_img = self.prepare_img(np_img)
h, w = np_img.shape
np_img_ms = {
"stage1": cv2.resize(np_img, (w//8, h//8), interpolation=cv2.INTER_NEAREST),
"stage2": cv2.resize(np_img, (w//4, h//4), interpolation=cv2.INTER_NEAREST),
"stage3": cv2.resize(np_img, (w//2, h//2), interpolation=cv2.INTER_NEAREST),
"stage4": np_img,
}
return np_img_ms
def read_depth_hr(self, filename, scale):
depth_hr = np.array(read_pfm(filename)[0], dtype=np.float32) * scale
depth_lr = self.prepare_img(depth_hr)
h, w = depth_lr.shape
depth_lr_ms = {
"stage1": cv2.resize(depth_lr, (w//8, h//8), interpolation=cv2.INTER_NEAREST),
"stage2": cv2.resize(depth_lr, (w//4, h//4), interpolation=cv2.INTER_NEAREST),
"stage3": cv2.resize(depth_lr, (w//2, h//2), interpolation=cv2.INTER_NEAREST),
"stage4": depth_lr,
}
return depth_lr_ms
def __getitem__(self, idx):
scan, light_idx, ref_view, src_views = self.metas[idx]
if self.mode == "train" and self.robust_train:
num_src_views = len(src_views)
index = random.sample(range(num_src_views), self.n_views-1)
view_ids = [ref_view] + [src_views[i] for i in index]
scale_ratio = random.uniform(0.8, 1.25)
else:
view_ids = [ref_view] + src_views[:self.n_views-1]
scale_ratio = 1
imgs = []
mask = None
depth_values = None
proj_matrices = []
for i, vid in enumerate(view_ids):
# @Note image & cam
if self.mode in ["train", "val"]:
if self.data_scale == "mid":
img_filename = os.path.join(self.root_dir, 'Rectified/{}_train/rect_{:0>3}_{}_r5000.png'.format(scan, vid+1, light_idx))
elif self.data_scale == "raw":
img_filename = os.path.join(self.root_dir, 'Rectified_raw/{}/rect_{:0>3}_{}_r5000.png'.format(scan, vid + 1, light_idx))
proj_mat_filename = os.path.join(self.root_dir, 'Cameras/train/{:0>8}_cam.txt').format(vid)
elif self.mode == "test":
img_filename = os.path.join(self.root_dir, 'Rectified/{}/rect_{:0>3}_3_r5000.png'.format(scan, vid+1))
proj_mat_filename = os.path.join(self.root_dir, 'Cameras/{:0>8}_cam.txt'.format(vid))
img = self.read_img(img_filename)
intrinsics, extrinsics, depth_min, depth_interval = self.read_cam_file(proj_mat_filename)
if self.mode in ["train", "val"]:
if self.data_scale == "raw":
img = self.crop_img(img)
intrinsics[:2, :] *= 2.0
if self.mode == "train" and self.robust_train:
extrinsics[:3,3] *= scale_ratio
elif self.mode == "test":
img, intrinsics = self.scale_mvs_input(img, intrinsics, self.max_wh[0], self.max_wh[1])
imgs.append(img.transpose(2,0,1))
# reference view
if i == 0:
# @Note depth values
diff = 0.5 if self.mode in ["test", "val"] else 0
depth_max = depth_interval * (self.total_depths - diff) + depth_min
depth_values = np.array([depth_min * scale_ratio, depth_max * scale_ratio], dtype=np.float32)
# @Note depth & mask
if self.mode in ["train", "val"]:
depth_filename_hr = os.path.join(self.root_dir, 'Depths_raw/{}/depth_map_{:0>4}.pfm'.format(scan, vid))
depth = self.read_depth_hr(depth_filename_hr, scale_ratio)
mask_filename_hr = os.path.join(self.root_dir, 'Depths_raw/{}/depth_visual_{:0>4}.png'.format(scan, vid))
mask = self.read_mask_hr(mask_filename_hr)
proj_mat = np.zeros(shape=(2, 4, 4), dtype=np.float32)
proj_mat[0, :4, :4] = extrinsics
proj_mat[1, :3, :3] = intrinsics
proj_matrices.append(proj_mat)
proj_matrices = np.stack(proj_matrices)
intrinsics = np.stack(intrinsics)
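# build the 4-stage pyramid: the matrices read from disk correspond to stage2;
# stage1 halves the intrinsics (coarsest), while stage3 and stage4 scale them by 2x and 4x (finer)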
stage1_pjmats = proj_matrices.copy()
stage1_pjmats[:, 1, :2, :] = proj_matrices[:, 1, :2, :] / 2.0
stage1_ins = intrinsics.copy()
stage1_ins[:2, :] = intrinsics[:2, :] / 2.0
stage3_pjmats = proj_matrices.copy()
stage3_pjmats[:, 1, :2, :] = proj_matrices[:, 1, :2, :] * 2
stage3_ins = intrinsics.copy()
stage3_ins[:2, :] = intrinsics[:2, :] * 2.0
stage4_pjmats = proj_matrices.copy()
stage4_pjmats[:, 1, :2, :] = proj_matrices[:, 1, :2, :] * 4
stage4_ins = intrinsics.copy()
stage4_ins[:2, :] = intrinsics[:2, :] * 4.0
proj_matrices = {
"stage1": stage1_pjmats,
"stage2": proj_matrices,
"stage3": stage3_pjmats,
"stage4": stage4_pjmats
}
intrinsics_matrices = {
"stage1": stage1_ins,
"stage2": intrinsics,
"stage3": stage3_ins,
"stage4": stage4_ins
}
sample = {
"imgs": imgs,
"proj_matrices": proj_matrices,
"intrinsics_matrices": intrinsics_matrices,
"depth_values": depth_values
}
if self.mode in ["train", "val"]:
sample["depth"] = depth
sample["mask"] = mask
elif self.mode == "test":
sample["filename"] = scan + '/{}/' + '{:0>8}'.format(view_ids[0]) + "{}"
return sample
================================================
FILE: datasets/evaluations/dtu_parallel/BaseEval2Obj_web.m
================================================
function BaseEval2Obj_web(BaseEval,method_string,outputPath)
if(nargin<3)
outputPath='./';
end
% threshold for coloring alpha channel in the range of 0-10 mm
dist_tresshold=10;
cSet=BaseEval.cSet;
Qdata=BaseEval.Qdata;
alpha=min(BaseEval.Ddata,dist_tresshold)/dist_tresshold;
fid=fopen([outputPath method_string '2Stl_' num2str(cSet) ' .obj'],'w+');
for cP=1:size(Qdata,2)
if(BaseEval.DataInMask(cP))
C=[1 0 0]*alpha(cP)+[1 1 1]*(1-alpha(cP)); %coloring from red to white in the range of 0-10 mm (0 to dist_tresshold)
else
C=[0 1 0]*alpha(cP)+[0 0 1]*(1-alpha(cP)); %green to blue for points outside the mask (which are not included in the analysis)
end
fprintf(fid,'v %f %f %f %f %f %f\n',[Qdata(1,cP) Qdata(2,cP) Qdata(3,cP) C(1) C(2) C(3)]);
end
fclose(fid);
disp('Data2Stl saved as obj')
Qstl=BaseEval.Qstl;
fid=fopen([outputPath 'Stl2' method_string '_' num2str(cSet) '.obj'],'w+');
alpha=min(BaseEval.Dstl,dist_tresshold)/dist_tresshold;
for cP=1:size(Qstl,2)
if(BaseEval.StlAbovePlane(cP))
C=[1 0 0]*alpha(cP)+[1 1 1]*(1-alpha(cP)); %coloring from red to white in the range of 0-10 mm (0 to dist_tresshold)
else
C=[0 1 0]*alpha(cP)+[0 0 1]*(1-alpha(cP)); %green to blue for points below plane (which are not included in the analysis)
end
fprintf(fid,'v %f %f %f %f %f %f\n',[Qstl(1,cP) Qstl(2,cP) Qstl(3,cP) C(1) C(2) C(3)]);
end
fclose(fid);
disp('Stl2Data saved as obj')
================================================
FILE: datasets/evaluations/dtu_parallel/BaseEvalMain_web.m
================================================
format compact
representation_string='Points'; %mvs representation 'Points' or 'Surfaces'
switch representation_string
case 'Points'
eval_string='_Eval_'; %results naming
settings_string='';
end
dst=0.2; %Min dist between points when reducing
% start this evaluation
cSet = str2num(thisset)
%input data name
DataInName = [plyPath sprintf('%s%03d.ply', lower(method_string), cSet)]
%results name
EvalName=[resultsPath method_string eval_string num2str(cSet) '.mat']
%check if file is already computed
if(~exist(EvalName,'file'))
disp(DataInName);
time=clock;time(4:5), drawnow
tic
Mesh = plyread(DataInName);
Qdata=[Mesh.vertex.x Mesh.vertex.y Mesh.vertex.z]';
toc
BaseEval=PointCompareMain(cSet,Qdata,dst,dataPath);
disp('Saving results'), drawnow
toc
save(EvalName,'BaseEval');
toc
% write obj-file of evaluation
% BaseEval2Obj_web(BaseEval,method_string, resultsPath)
% toc
time=clock;time(4:5), drawnow
BaseEval.MaxDist=20; %outlier threshold of 20 mm
BaseEval.FilteredDstl=BaseEval.Dstl(BaseEval.StlAbovePlane); %use only points that are above the plane
BaseEval.FilteredDstl=BaseEval.FilteredDstl(BaseEval.FilteredDstl<BaseEval.MaxDist); %discard outliers
end
================================================
FILE: datasets/evaluations/dtu_parallel/MaxDistCP.m
================================================
function Dist = MaxDistCP(Qto,Qfrom,BB,MaxDist)
Dist=ones(1,size(Qfrom,2))*MaxDist;
Range=floor((BB(2,:)-BB(1,:))/MaxDist);
for x=0:Range(1)
for y=0:Range(2)
for z=0:Range(3)
Low=BB(1,:)+[x y z]*MaxDist;
High=Low+MaxDist;
SQfrom=find(Qfrom(1,:)>=Low(1) & Qfrom(2,:)>=Low(2) & Qfrom(3,:)>=Low(3) &...
Qfrom(1,:)<High(1) & Qfrom(2,:)<High(2) & Qfrom(3,:)<High(3));
Low=Low-MaxDist;
High=High+MaxDist;
SQto=find(Qto(1,:)>=Low(1) & Qto(2,:)>=Low(2) & Qto(3,:)>=Low(3) &...
Qto(1,:)<High(1) & Qto(2,:)<High(2) & Qto(3,:)<High(3));
if isempty(SQto)
Dist(SQfrom)=MaxDist;
else
KDstl=KDTreeSearcher(Qto(:,SQto)');
[~,Dist(SQfrom)]=knnsearch(KDstl,Qfrom(:,SQfrom)');
end
end
end
end
================================================
FILE: datasets/evaluations/dtu_parallel/PointCompareMain.m
================================================
function BaseEval=PointCompareMain(cSet,Qdata,dst,dataPath)
% evaluation function that calculates the distances from the reference data (stl) to the evaluation points (Qdata) and the
% distances from the evaluation points to the reference
tic
% reduce points 0.2 mm neighbourhood density
Qdata=reducePts_haa(Qdata,dst);
toc
StlInName=[dataPath '/Points/stl/stl' sprintf('%03d',cSet) '_total.ply'];
StlMesh = plyread(StlInName); %STL points already reduced 0.2 mm neighbourhood density
Qstl=[StlMesh.vertex.x StlMesh.vertex.y StlMesh.vertex.z]';
%Load Mask (ObsMask) and Bounding box (BB) and Resolution (Res)
Margin=10;
MaskName=[dataPath '/ObsMask/ObsMask' num2str(cSet) '_' num2str(Margin) '.mat'];
load(MaskName)
MaxDist=60;
disp('Computing Data 2 Stl distances')
Ddata = MaxDistCP(Qstl,Qdata,BB,MaxDist);
toc
disp('Computing Stl 2 Data distances')
Dstl=MaxDistCP(Qdata,Qstl,BB,MaxDist);
disp('Distances computed')
toc
%use mask
%From Get mask - inverted & modified.
One=ones(1,size(Qdata,2));
Qv=(Qdata-BB(1,:)'*One)/Res+1;
Qv=round(Qv);
Midx1=find(Qv(1,:)>0 & Qv(1,:)<=size(ObsMask,1) & Qv(2,:)>0 & Qv(2,:)<=size(ObsMask,2) & Qv(3,:)>0 & Qv(3,:)<=size(ObsMask,3));
MidxA=sub2ind(size(ObsMask),Qv(1,Midx1),Qv(2,Midx1),Qv(3,Midx1));
Midx2=find(ObsMask(MidxA));
BaseEval.DataInMask(1:size(Qv,2))=false;
BaseEval.DataInMask(Midx1(Midx2))=true; %If Data is within the mask
BaseEval.cSet=cSet;
BaseEval.Margin=Margin; %Margin of masks
BaseEval.dst=dst; %Min dist between points when reducing
BaseEval.Qdata=Qdata; %Input data points
BaseEval.Ddata=Ddata; %distance from data to stl
BaseEval.Qstl=Qstl; %Input stl points
BaseEval.Dstl=Dstl; %Distance from the stl to data
load([dataPath '/ObsMask/Plane' num2str(cSet)],'P')
BaseEval.GroundPlane=P; % Plane used to distinguish which Stl points are 'used'
BaseEval.StlAbovePlane=(P'*[Qstl;ones(1,size(Qstl,2))])>0; %Is stl above 'ground plane'
BaseEval.Time=clock; %Time when computation is finished
================================================
FILE: datasets/evaluations/dtu_parallel/plyread.m
================================================
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [Elements,varargout] = plyread(Path,Str)
%PLYREAD Read a PLY 3D data file.
% [DATA,COMMENTS] = PLYREAD(FILENAME) reads a version 1.0 PLY file
% FILENAME and returns a structure DATA. The fields in this structure
% are defined by the PLY header; each element type is a field and each
% element property is a subfield. If the file contains any comments,
% they are returned in a cell string array COMMENTS.
%
% [TRI,PTS] = PLYREAD(FILENAME,'tri') or
% [TRI,PTS,DATA,COMMENTS] = PLYREAD(FILENAME,'tri') converts vertex
% and face data into triangular connectivity and vertex arrays. The
% mesh can then be displayed using the TRISURF command.
%
% Note: This function is slow for large mesh files (+50K faces),
% especially when reading data with list type properties.
%
% Example:
% [Tri,Pts] = PLYREAD('cow.ply','tri');
% trisurf(Tri,Pts(:,1),Pts(:,2),Pts(:,3));
% colormap(gray); axis equal;
%
% See also: PLYWRITE
% Pascal Getreuer 2004
[fid,Msg] = fopen(Path,'rt'); % open file in read text mode
if fid == -1, error(Msg); end
Buf = fscanf(fid,'%s',1);
if ~strcmp(Buf,'ply')
fclose(fid);
error('Not a PLY file.');
end
%%% read header %%%
Position = ftell(fid);
Format = '';
NumComments = 0;
Comments = {}; % for storing any file comments
NumElements = 0;
NumProperties = 0;
Elements = []; % structure for holding the element data
ElementCount = []; % number of each type of element in file
PropertyTypes = []; % corresponding structure recording property types
ElementNames = {}; % list of element names in the order they are stored in the file
PropertyNames = []; % structure of lists of property names
while 1
Buf = fgetl(fid); % read one line from file
BufRem = Buf;
Token = {};
Count = 0;
while ~isempty(BufRem) % split line into tokens
[tmp,BufRem] = strtok(BufRem);
if ~isempty(tmp)
Count = Count + 1; % count tokens
Token{Count} = tmp;
end
end
if Count % parse line
switch lower(Token{1})
case 'format' % read data format
if Count >= 2
Format = lower(Token{2});
if Count == 3 & ~strcmp(Token{3},'1.0')
fclose(fid);
error('Only PLY format version 1.0 supported.');
end
end
case 'comment' % read file comment
NumComments = NumComments + 1;
Comments{NumComments} = '';
for i = 2:Count
Comments{NumComments} = [Comments{NumComments},Token{i},' '];
end
case 'element' % element name
if Count >= 3
if isfield(Elements,Token{2})
fclose(fid);
error(['Duplicate element name, ''',Token{2},'''.']);
end
NumElements = NumElements + 1;
NumProperties = 0;
Elements = setfield(Elements,Token{2},[]);
PropertyTypes = setfield(PropertyTypes,Token{2},[]);
ElementNames{NumElements} = Token{2};
PropertyNames = setfield(PropertyNames,Token{2},{});
CurElement = Token{2};
ElementCount(NumElements) = str2double(Token{3});
if isnan(ElementCount(NumElements))
fclose(fid);
error(['Bad element definition: ',Buf]);
end
else
error(['Bad element definition: ',Buf]);
end
case 'property' % element property
if ~isempty(CurElement) & Count >= 3
NumProperties = NumProperties + 1;
eval(['tmp=isfield(Elements.',CurElement,',Token{Count});'],...
'fclose(fid);error([''Error reading property: '',Buf])');
if tmp
error(['Duplicate property name, ''',CurElement,'.',Token{2},'''.']);
end
% add property subfield to Elements
eval(['Elements.',CurElement,'.',Token{Count},'=[];'], ...
'fclose(fid);error([''Error reading property: '',Buf])');
% add property subfield to PropertyTypes and save type
eval(['PropertyTypes.',CurElement,'.',Token{Count},'={Token{2:Count-1}};'], ...
'fclose(fid);error([''Error reading property: '',Buf])');
% record property name order
eval(['PropertyNames.',CurElement,'{NumProperties}=Token{Count};'], ...
'fclose(fid);error([''Error reading property: '',Buf])');
else
fclose(fid);
if isempty(CurElement)
error(['Property definition without element definition: ',Buf]);
else
error(['Bad property definition: ',Buf]);
end
end
case 'end_header' % end of header, break from while loop
break;
end
end
end
%%% set reading for specified data format %%%
if isempty(Format)
warning('Data format unspecified, assuming ASCII.');
Format = 'ascii';
end
switch Format
case 'ascii'
Format = 0;
case 'binary_little_endian'
Format = 1;
case 'binary_big_endian'
Format = 2;
otherwise
fclose(fid);
error(['Data format ''',Format,''' not supported.']);
end
if ~Format
Buf = fscanf(fid,'%f'); % read the rest of the file as ASCII data
BufOff = 1;
else
% reopen the file in read binary mode
fclose(fid);
if Format == 1
fid = fopen(Path,'r','ieee-le.l64'); % little endian
else
fid = fopen(Path,'r','ieee-be.l64'); % big endian
end
% find the end of the header again (using ftell on the old handle doesn't give the correct position)
BufSize = 8192;
Buf = [blanks(10),char(fread(fid,BufSize,'uchar')')];
i = [];
tmp = -11;
while isempty(i)
i = findstr(Buf,['end_header',13,10]); % look for end_header + CR/LF
i = [i,findstr(Buf,['end_header',10])]; % look for end_header + LF
if isempty(i)
tmp = tmp + BufSize;
Buf = [Buf(BufSize+1:BufSize+10),char(fread(fid,BufSize,'uchar')')];
end
end
% seek to just after the line feed
fseek(fid,i + tmp + 11 + (Buf(i + 10) == 13),-1);
end
%%% read element data %%%
% PLY and MATLAB data types (for fread)
PlyTypeNames = {'char','uchar','short','ushort','int','uint','float','double', ...
'char8','uchar8','short16','ushort16','int32','uint32','float32','double64'};
MatlabTypeNames = {'schar','uchar','int16','uint16','int32','uint32','single','double'};
SizeOf = [1,1,2,2,4,4,4,8]; % size in bytes of each type
for i = 1:NumElements
% get current element property information
eval(['CurPropertyNames=PropertyNames.',ElementNames{i},';']);
eval(['CurPropertyTypes=PropertyTypes.',ElementNames{i},';']);
NumProperties = size(CurPropertyNames,2);
% fprintf('Reading %s...\n',ElementNames{i});
if ~Format %%% read ASCII data %%%
for j = 1:NumProperties
Token = getfield(CurPropertyTypes,CurPropertyNames{j});
if strcmpi(Token{1},'list')
Type(j) = 1;
else
Type(j) = 0;
end
end
% parse buffer
if ~any(Type)
% no list types
Data = reshape(Buf(BufOff:BufOff+ElementCount(i)*NumProperties-1),NumProperties,ElementCount(i))';
BufOff = BufOff + ElementCount(i)*NumProperties;
else
ListData = cell(NumProperties,1);
for k = 1:NumProperties
ListData{k} = cell(ElementCount(i),1);
end
% list type
for j = 1:ElementCount(i)
for k = 1:NumProperties
if ~Type(k)
Data(j,k) = Buf(BufOff);
BufOff = BufOff + 1;
else
tmp = Buf(BufOff);
ListData{k}{j} = Buf(BufOff+(1:tmp))';
BufOff = BufOff + tmp + 1;
end
end
end
end
else %%% read binary data %%%
% translate PLY data type names to MATLAB data type names
ListFlag = 0; % = 1 if there is a list type
SameFlag = 1; % = 1 if all types are the same
for j = 1:NumProperties
Token = getfield(CurPropertyTypes,CurPropertyNames{j});
if ~strcmp(Token{1},'list') % non-list type
tmp = rem(strmatch(Token{1},PlyTypeNames,'exact')-1,8)+1;
if ~isempty(tmp)
TypeSize(j) = SizeOf(tmp);
Type{j} = MatlabTypeNames{tmp};
TypeSize2(j) = 0;
Type2{j} = '';
SameFlag = SameFlag & strcmp(Type{1},Type{j});
else
fclose(fid);
error(['Unknown property data type, ''',Token{1},''', in ', ...
ElementNames{i},'.',CurPropertyNames{j},'.']);
end
else % list type
if length(Token) == 3
ListFlag = 1;
SameFlag = 0;
tmp = rem(strmatch(Token{2},PlyTypeNames,'exact')-1,8)+1;
tmp2 = rem(strmatch(Token{3},PlyTypeNames,'exact')-1,8)+1;
if ~isempty(tmp) & ~isempty(tmp2)
TypeSize(j) = SizeOf(tmp);
Type{j} = MatlabTypeNames{tmp};
TypeSize2(j) = SizeOf(tmp2);
Type2{j} = MatlabTypeNames{tmp2};
else
fclose(fid);
error(['Unknown property data type, ''list ',Token{2},' ',Token{3},''', in ', ...
ElementNames{i},'.',CurPropertyNames{j},'.']);
end
else
fclose(fid);
error(['Invalid list syntax in ',ElementNames{i},'.',CurPropertyNames{j},'.']);
end
end
end
% read file
if ~ListFlag
if SameFlag
% no list types, all the same type (fast)
Data = fread(fid,[NumProperties,ElementCount(i)],Type{1})';
else
% no list types, mixed type
Data = zeros(ElementCount(i),NumProperties);
for j = 1:ElementCount(i)
for k = 1:NumProperties
Data(j,k) = fread(fid,1,Type{k});
end
end
end
else
ListData = cell(NumProperties,1);
for k = 1:NumProperties
ListData{k} = cell(ElementCount(i),1);
end
if NumProperties == 1
BufSize = 512;
SkipNum = 4;
j = 0;
% list type, one property (fast if lists are usually the same length)
while j < ElementCount(i)
Position = ftell(fid);
% read in BufSize count values, assuming all counts = SkipNum
[Buf,BufSize] = fread(fid,BufSize,Type{1},SkipNum*TypeSize2(1));
Miss = find(Buf ~= SkipNum); % find first count that is not SkipNum
fseek(fid,Position + TypeSize(1),-1); % seek back to after first count
if isempty(Miss) % all counts are SkipNum
Buf = fread(fid,[SkipNum,BufSize],[int2str(SkipNum),'*',Type2{1}],TypeSize(1))';
fseek(fid,-TypeSize(1),0); % undo last skip
for k = 1:BufSize
ListData{1}{j+k} = Buf(k,:);
end
j = j + BufSize;
BufSize = floor(1.5*BufSize);
else
if Miss(1) > 1 % some counts are SkipNum
Buf2 = fread(fid,[SkipNum,Miss(1)-1],[int2str(SkipNum),'*',Type2{1}],TypeSize(1))';
for k = 1:Miss(1)-1
ListData{1}{j+k} = Buf2(k,:);
end
j = j + k;
end
% read in the list with the missed count
SkipNum = Buf(Miss(1));
j = j + 1;
ListData{1}{j} = fread(fid,[1,SkipNum],Type2{1});
BufSize = ceil(0.6*BufSize);
end
end
else
% list type(s), multiple properties (slow)
Data = zeros(ElementCount(i),NumProperties);
for j = 1:ElementCount(i)
for k = 1:NumProperties
if isempty(Type2{k})
Data(j,k) = fread(fid,1,Type{k});
else
tmp = fread(fid,1,Type{k});
ListData{k}{j} = fread(fid,[1,tmp],Type2{k});
end
end
end
end
end
end
% put data into Elements structure
for k = 1:NumProperties
if (~Format & ~Type(k)) | (Format & isempty(Type2{k}))
eval(['Elements.',ElementNames{i},'.',CurPropertyNames{k},'=Data(:,k);']);
else
eval(['Elements.',ElementNames{i},'.',CurPropertyNames{k},'=ListData{k};']);
end
end
end
clear Data ListData;
fclose(fid);
if (nargin > 1 & strcmpi(Str,'Tri')) | nargout > 2
% find vertex element field
Name = {'vertex','Vertex','point','Point','pts','Pts'};
Names = [];
for i = 1:length(Name)
if any(strcmp(ElementNames,Name{i}))
Names = getfield(PropertyNames,Name{i});
Name = Name{i};
break;
end
end
if any(strcmp(Names,'x')) & any(strcmp(Names,'y')) & any(strcmp(Names,'z'))
eval(['varargout{1}=[Elements.',Name,'.x,Elements.',Name,'.y,Elements.',Name,'.z];']);
else
varargout{1} = zeros(1,3);
end
varargout{2} = Elements;
varargout{3} = Comments;
Elements = [];
% find face element field
Name = {'face','Face','poly','Poly','tri','Tri'};
Names = [];
for i = 1:length(Name)
if any(strcmp(ElementNames,Name{i}))
Names = getfield(PropertyNames,Name{i});
Name = Name{i};
break;
end
end
if ~isempty(Names)
% find vertex indices property subfield
PropertyName = {'vertex_indices','vertex_indexes','vertex_index','indices','indexes'};
for i = 1:length(PropertyName)
if any(strcmp(Names,PropertyName{i}))
PropertyName = PropertyName{i};
break;
end
end
if ~iscell(PropertyName)
% convert face index lists to triangular connectivity
eval(['FaceIndices=varargout{2}.',Name,'.',PropertyName,';']);
N = length(FaceIndices);
Elements = zeros(N*2,3);
Extra = 0;
for k = 1:N
Elements(k,:) = FaceIndices{k}(1:3);
for j = 4:length(FaceIndices{k})
Extra = Extra + 1;
Elements(N + Extra,:) = [Elements(k,[1,j-1]),FaceIndices{k}(j)];
end
end
Elements = Elements(1:N+Extra,:) + 1;
end
end
else
varargout{1} = Comments;
end
================================================
FILE: datasets/evaluations/dtu_parallel/reducePts_haa.m
================================================
function [ptsOut,indexSet] = reducePts_haa(pts, dst)
%Reduces a point set, pts, in a stochastic manner, such that the minimum distance
% between points is 'dst'. Written by abd, edited by haa, then by raje
nPoints=size(pts,2);
indexSet=true(nPoints,1);
RandOrd=randperm(nPoints);
%tic
NS = KDTreeSearcher(pts');
%toc
% search the KD-tree for close neighbours in a chunk-wise fashion to save memory if the point cloud is really big
Chunks=1:min(4e6,nPoints-1):nPoints;
Chunks(end)=nPoints;
for cChunk=1:(length(Chunks)-1)
Range=Chunks(cChunk):Chunks(cChunk+1);
idx = rangesearch(NS,pts(:,RandOrd(Range))',dst);
for i = 1:size(idx,1)
id =RandOrd(i-1+Chunks(cChunk));
if (indexSet(id))
indexSet(idx{i}) = 0;
indexSet(id) = 1;
end
end
end
ptsOut = pts(:,indexSet);
disp(['downsample factor: ' num2str(nPoints/sum(indexSet))]);
================================================
FILE: datasets/lists/blendedmvs/low_res_all.txt
================================================
5c1f33f1d33e1f2e4aa6dda4
5bfe5ae0fe0ea555e6a969ca
5bff3c5cfe0ea555e6bcbf3a
58eaf1513353456af3a1682a
5bfc9d5aec61ca1dd69132a2
5bf18642c50e6f7f8bdbd492
5bf26cbbd43923194854b270
5bf17c0fd439231948355385
5be3ae47f44e235bdbbc9771
5be3a5fb8cfdd56947f6b67c
5bbb6eb2ea1cfa39f1af7e0c
5ba75d79d76ffa2c86cf2f05
5bb7a08aea1cfa39f1a947ab
5b864d850d072a699b32f4ae
5b6eff8b67b396324c5b2672
5b6e716d67b396324c2d77cb
5b69cc0cb44b61786eb959bf
5b62647143840965efc0dbde
5b60fa0c764f146feef84df0
5b558a928bbfb62204e77ba2
5b271079e0878c3816dacca4
5b08286b2775267d5b0634ba
5afacb69ab00705d0cefdd5b
5af28cea59bc705737003253
5af02e904c8216544b4ab5a2
5aa515e613d42d091d29d300
5c34529873a8df509ae57b58
5c34300a73a8df509add216d
5c1af2e2bee9a723c963d019
5c1892f726173c3a09ea9aeb
5c0d13b795da9479e12e2ee9
5c062d84a96e33018ff6f0a6
5bfd0f32ec61ca1dd69dc77b
5bf21799d43923194842c001
5bf3a82cd439231948877aed
5bf03590d4392319481971dc
5beb6e66abd34c35e18e66b9
5be883a4f98cee15019d5b83
5be47bf9b18881428d8fbc1d
5bcf979a6d5f586b95c258cd
5bce7ac9ca24970bce4934b6
5bb8a49aea1cfa39f1aa7f75
5b78e57afc8fcf6781d0c3ba
5b21e18c58e2823a67a10dd8
5b22269758e2823a67a3bd03
5b192eb2170cf166458ff886
5ae2e9c5fe405c5076abc6b2
5adc6bd52430a05ecb2ffb85
5ab8b8e029f5351f7f2ccf59
5abc2506b53b042ead637d86
5ab85f1dac4291329b17cb50
5a969eea91dfc339a9a3ad2c
5a8aa0fab18050187cbe060e
5a7d3db14989e929563eb153
5a69c47d0d5d0a7f3b2e9752
5a618c72784780334bc1972d
5a6464143d809f1d8208c43c
5a588a8193ac3d233f77fbca
5a57542f333d180827dfc132
5a572fd9fc597b0478a81d14
5a563183425d0f5186314855
5a4a38dad38c8a075495b5d2
5a48d4b2c7dab83a7d7b9851
5a489fb1c7dab83a7d7b1070
5a48ba95c7dab83a7d7b44ed
5a3ca9cb270f0e3f14d0eddb
5a3cb4e4270f0e3f14d12f43
5a3f4aba5889373fbbc5d3b5
5a0271884e62597cdee0d0eb
59e864b2a9e91f2c5529325f
599aa591d5b41f366fed0d58
59350ca084b7f26bf5ce6eb8
59338e76772c3e6384afbb15
5c20ca3a0843bc542d94e3e2
5c1dbf200843bc542d8ef8c4
5c1b1500bee9a723c96c3e78
5bea87f4abd34c35e1860ab5
5c2b3ed5e611832e8aed46bf
57f8d9bbe73f6760f10e916a
5bf7d63575c26f32dbf7413b
5be4ab93870d330ff2dce134
5bd43b4ba6b28b1ee86b92dd
5bccd6beca24970bce448134
5bc5f0e896b66a2cd8f9bd36
5b908d3dc6ab78485f3d24a9
5b2c67b5e0878c381608b8d8
5b4933abf2b5f44e95de482a
5b3b353d8d46a939f93524b9
5acf8ca0f3d8a750097e4b15
5ab8713ba3799a1d138bd69a
5aa235f64a17b335eeaf9609
5aa0f9d7a9efce63548c69a1
5a8315f624b8e938486e0bd8
5a48c4e9c7dab83a7d7b5cc7
59ecfd02e225f6492d20fcc9
59f87d0bfa6280566fb38c9a
59f363a8b45be22330016cad
59f70ab1e5c5d366af29bf3e
59e75a2ca9e91f2c5526005d
5947719bf1b45630bd096665
5947b62af1b45630bd0c2a02
59056e6760bb961de55f3501
58f7f7299f5b5647873cb110
58cf4771d0f5fb221defe6da
58d36897f387231e6c929903
58c4bb4f4a69c55606122be4
5b7a3890fc8fcf6781e2593a
5c189f2326173c3a09ed7ef3
5b950c71608de421b1e7318f
5a6400933d809f1d8200af15
59d2657f82ca7774b1ec081d
5ba19a8a360c7c30c1c169df
59817e4a1bd4b175e7038d19
================================================
FILE: datasets/lists/blendedmvs/val.txt
================================================
5b7a3890fc8fcf6781e2593a
5c189f2326173c3a09ed7ef3
5b950c71608de421b1e7318f
5a6400933d809f1d8200af15
59d2657f82ca7774b1ec081d
5ba19a8a360c7c30c1c169df
59817e4a1bd4b175e7038d19
================================================
FILE: datasets/lists/dtu/test.txt
================================================
scan1
scan4
scan9
scan10
scan11
scan12
scan13
scan15
scan23
scan24
scan29
scan32
scan33
scan34
scan48
scan49
scan62
scan75
scan77
scan110
scan114
scan118
================================================
FILE: datasets/lists/dtu/train.txt
================================================
scan2
scan6
scan7
scan8
scan14
scan16
scan18
scan19
scan20
scan22
scan30
scan31
scan36
scan39
scan41
scan42
scan44
scan45
scan46
scan47
scan50
scan51
scan52
scan53
scan55
scan57
scan58
scan60
scan61
scan63
scan64
scan65
scan68
scan69
scan70
scan71
scan72
scan74
scan76
scan83
scan84
scan85
scan87
scan88
scan89
scan90
scan91
scan92
scan93
scan94
scan95
scan96
scan97
scan98
scan99
scan100
scan101
scan102
scan103
scan104
scan105
scan107
scan108
scan109
scan111
scan112
scan113
scan115
scan116
scan119
scan120
scan121
scan122
scan123
scan124
scan125
scan126
scan127
scan128
================================================
FILE: datasets/lists/dtu/val.txt
================================================
scan3
scan5
scan17
scan21
scan28
scan35
scan37
scan38
scan40
scan43
scan56
scan59
scan66
scan67
scan82
scan86
scan106
scan117
================================================
FILE: datasets/lists/tnt/advanced.txt
================================================
Auditorium
Ballroom
Courtroom
Museum
Palace
Temple
================================================
FILE: datasets/lists/tnt/intermediate.txt
================================================
Family
Horse
Francis
Lighthouse
M60
Panther
Playground
Train
================================================
FILE: datasets/tnt.py
================================================
# -*- coding: utf-8 -*-
# @Description: Data preprocessing and organization for Tanks and Temples dataset.
# @Author: Zhe Zhang (doublez@stu.pku.edu.cn)
# @Affiliation: Peking University (PKU)
# @LastEditDate: 2023-09-07
import os
import cv2
import numpy as np
from PIL import Image
from torch.utils.data import Dataset
from datasets.data_io import *
class TNTDataset(Dataset):
def __init__(self, root_dir, list_file, split, n_views, **kwargs):
super(TNTDataset, self).__init__()
self.root_dir = root_dir
self.list_file = list_file
self.split = split
self.n_views = n_views
self.cam_mode = kwargs.get("cam_mode", "origin") # origin / short_range
if self.cam_mode == 'short_range': assert self.split == "intermediate"
self.img_mode = kwargs.get("img_mode", "resize") # resize / crop
self.total_depths = 192
self.depth_interval_table = {
# intermediate
'Family': 2.5e-3, 'Francis': 1e-2, 'Horse': 1.5e-3, 'Lighthouse': 1.5e-2, 'M60': 5e-3, 'Panther': 5e-3, 'Playground': 7e-3, 'Train': 5e-3,
# advanced
'Auditorium': 3e-2, 'Ballroom': 2e-2, 'Courtroom': 2e-2, 'Museum': 2e-2, 'Palace': 1e-2, 'Temple': 1e-2
}
self.img_wh = kwargs.get("img_wh", (-1, 1024))
self.metas = self.build_metas()
def build_metas(self):
metas = []
with open(os.path.join(self.list_file)) as f:
scans = [line.rstrip() for line in f.readlines()]
for scan in scans:
with open(os.path.join(self.root_dir, self.split, scan, 'pair.txt')) as f:
num_viewpoint = int(f.readline())
for view_idx in range(num_viewpoint):
ref_view = int(f.readline().rstrip())
src_views = [int(x) for x in f.readline().rstrip().split()[1::2]]
if len(src_views) != 0:
metas += [(scan, -1, ref_view, src_views)]
return metas
def read_cam_file(self, filename):
with open(filename) as f:
lines = [line.rstrip() for line in f.readlines()]
# extrinsics: line [1,5), 4x4 matrix
extrinsics = np.fromstring(' '.join(lines[1:5]), dtype=np.float32, sep=' ')
extrinsics = extrinsics.reshape((4, 4))
# intrinsics: line [7,10), 3x3 matrix
intrinsics = np.fromstring(' '.join(lines[7:10]), dtype=np.float32, sep=' ')
intrinsics = intrinsics.reshape((3, 3))
depth_min = float(lines[11].split()[0])
depth_max = float(lines[11].split()[-1])
return intrinsics, extrinsics, depth_min, depth_max
def read_img(self, filename):
img = Image.open(filename)
np_img = np.array(img, dtype=np.float32) / 255.
return np_img
def scale_tnt_input(self, intrinsics, img):
if self.img_mode == "crop":
intrinsics[1,2] = intrinsics[1,2] - 28 # 1080 -> 1024
img = img[28:1080-28, :, :]
elif self.img_mode == "resize":
height, width = img.shape[:2]
max_w, max_h = self.img_wh[0], self.img_wh[1]
if max_w == -1:
max_w = width
img = cv2.resize(img, (max_w, max_h))
scale_w = 1.0 * max_w / width
intrinsics[0, :] *= scale_w
scale_h = 1.0 * max_h / height
intrinsics[1, :] *= scale_h
return intrinsics, img
def __len__(self):
return len(self.metas)
def __getitem__(self, idx):
scan, _, ref_view, src_views = self.metas[idx]
view_ids = [ref_view] + src_views[:self.n_views-1]
imgs = []
depth_min = None
depth_max = None
proj_matrices_0 = []
proj_matrices_1 = []
proj_matrices_2 = []
proj_matrices_3 = []
for i, vid in enumerate(view_ids):
img_filename = os.path.join(self.root_dir, self.split, scan, f'images/{vid:08d}.jpg')
if self.cam_mode == 'short_range':
# can only be used for the intermediate split
proj_mat_filename = os.path.join(self.root_dir, self.split, scan, f'cams_{scan.lower()}/{vid:08d}_cam.txt')
elif self.cam_mode == 'origin':
proj_mat_filename = os.path.join(self.root_dir, self.split, scan, f'cams/{vid:08d}_cam.txt')
img = self.read_img(img_filename)
intrinsics, extrinsics, depth_min_, depth_max_ = self.read_cam_file(proj_mat_filename)
intrinsics, img = self.scale_tnt_input(intrinsics, img)
imgs.append(img.transpose(2,0,1))
proj_mat_0 = np.zeros(shape=(2, 4, 4), dtype=np.float32)
proj_mat_1 = np.zeros(shape=(2, 4, 4), dtype=np.float32)
proj_mat_2 = np.zeros(shape=(2, 4, 4), dtype=np.float32)
proj_mat_3 = np.zeros(shape=(2, 4, 4), dtype=np.float32)
intrinsics[:2,:] *= 0.125
proj_mat_0[0,:4,:4] = extrinsics.copy()
proj_mat_0[1,:3,:3] = intrinsics.copy()
int_mat_0 = intrinsics.copy()
intrinsics[:2,:] *= 2
proj_mat_1[0,:4,:4] = extrinsics.copy()
proj_mat_1[1,:3,:3] = intrinsics.copy()
int_mat_1 = intrinsics.copy()
intrinsics[:2,:] *= 2
proj_mat_2[0,:4,:4] = extrinsics.copy()
proj_mat_2[1,:3,:3] = intrinsics.copy()
int_mat_2 = intrinsics.copy()
intrinsics[:2,:] *= 2
proj_mat_3[0,:4,:4] = extrinsics.copy()
proj_mat_3[1,:3,:3] = intrinsics.copy()
int_mat_3 = intrinsics.copy()
proj_matrices_0.append(proj_mat_0)
proj_matrices_1.append(proj_mat_1)
proj_matrices_2.append(proj_mat_2)
proj_matrices_3.append(proj_mat_3)
# reference view
if i == 0:
depth_min = depth_min_
if self.cam_mode == 'short_range':
depth_max = depth_min + self.total_depths * self.depth_interval_table[scan]
elif self.cam_mode == 'origin':
depth_max = depth_max_
proj={}
proj['stage1'] = np.stack(proj_matrices_0)
proj['stage2'] = np.stack(proj_matrices_1)
proj['stage3'] = np.stack(proj_matrices_2)
proj['stage4'] = np.stack(proj_matrices_3)
intrinsics_matrices = {
"stage1": int_mat_0,
"stage2": int_mat_1,
"stage3": int_mat_2,
"stage4": int_mat_3
}
sample = {
"imgs": imgs,
"proj_matrices": proj,
"intrinsics_matrices": intrinsics_matrices,
"depth_values": np.array([depth_min, depth_max], dtype=np.float32),
"filename": scan + '/{}/' + '{:0>8}'.format(view_ids[0]) + "{}"
}
return sample
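# Note on the four projection stages built in __getitem__ above: the intrinsics follow a
# 1/8 -> 1/4 -> 1/2 -> 1/1 resolution pyramid (stage1 is the coarsest). A minimal sketch,
# assuming K is the full-resolution 3x3 intrinsics matrix as a numpy array; the helper
# name below is hypothetical and only illustrates the scaling.
def _stage_intrinsics(K):
    stages = {}
    for name, scale in zip(("stage1", "stage2", "stage3", "stage4"), (0.125, 0.25, 0.5, 1.0)):
        K_s = K.copy()
        K_s[:2, :] *= scale   # only the fx/cx and fy/cy rows are scaled
        stages[name] = K_s
    return stages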
================================================
FILE: fusions/dtu/_open3d.py
================================================
# -*- coding: utf-8 -*-
# @Description: Point cloud fusion strategy for DTU dataset based on Open3D Library.
# @Author: Zhe Zhang (doublez@stu.pku.edu.cn)
# @Affiliation: Peking University (PKU)
# @LastEditDate: 2023-09-07
import torch
import numpy as np
import sys
import argparse
import errno, os
import glob
import os.path as osp
import re
import cv2
from PIL import Image
import gc
import open3d as o3d
import torch
import torch.nn.functional as F
import numpy as np
parser = argparse.ArgumentParser(description='Depth fusion with consistency check.')
parser.add_argument('--root_path', type=str, default='[/path/to/]dtu-test-1200')
parser.add_argument('--depth_path', type=str, default='')
parser.add_argument('--data_list', type=str, default='')
parser.add_argument('--ply_path', type=str, default='')
parser.add_argument('--dist_thresh', type=float, default=0.001)
parser.add_argument('--prob_thresh', type=float, default=0.6)
parser.add_argument('--num_consist', type=int, default=10)
parser.add_argument('--device', type=str, default='cpu')
args = parser.parse_args()
def homo_warping(src_fea, src_proj, ref_proj, depth_values):
# src_fea: [B, C, H, W]
# src_proj: [B, 4, 4]
# ref_proj: [B, 4, 4]
# depth_values: [B, Ndepth] or [B, Ndepth, H, W]
# out: [B, C, Ndepth, H, W]
batch, channels = src_fea.shape[0], src_fea.shape[1]
height, width = src_fea.shape[2], src_fea.shape[3]
with torch.no_grad():
proj = torch.matmul(src_proj, torch.inverse(ref_proj))
rot = proj[:, :3, :3] # [B,3,3]
trans = proj[:, :3, 3:4] # [B,3,1]
y, x = torch.meshgrid([torch.arange(0, height, dtype=torch.float32, device=src_fea.device),
torch.arange(0, width, dtype=torch.float32, device=src_fea.device)])
y, x = y.contiguous(), x.contiguous()
y, x = y.view(height * width), x.view(height * width)
xyz = torch.stack((x, y, torch.ones_like(x))) # [3, H*W]
xyz = torch.unsqueeze(xyz, 0).repeat(batch, 1, 1) # [B, 3, H*W]
rot_xyz = torch.matmul(rot, xyz) # [B, 3, H*W]
rot_depth_xyz = rot_xyz.unsqueeze(2) * depth_values.view(-1, 1, 1, height*width) # [B, 3, 1, H*W]
proj_xyz = rot_depth_xyz + trans.view(batch, 3, 1, 1) # [B, 3, Ndepth, H*W]
proj_xy = proj_xyz[:, :2, :, :] / proj_xyz[:, 2:3, :, :] # [B, 2, Ndepth, H*W]
proj_x_normalized = proj_xy[:, 0, :, :] / ((width - 1) / 2) - 1
proj_y_normalized = proj_xy[:, 1, :, :] / ((height - 1) / 2) - 1
proj_xy = torch.stack((proj_x_normalized, proj_y_normalized), dim=3) # [B, Ndepth, H*W, 2]
grid = proj_xy
warped_src_fea = F.grid_sample(src_fea, grid.view(batch, height, width, 2), mode='bilinear',
padding_mode='zeros')
warped_src_fea = warped_src_fea.view(batch, channels, height, width)
return warped_src_fea
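# Usage sketch for homo_warping: a minimal shape check with identical reference and source
# cameras and a single constant depth hypothesis per pixel. The sizes and the depth value
# below are illustrative assumptions only.
def _demo_homo_warping():
    B, C, H, W = 1, 8, 16, 32
    src_fea = torch.rand(B, C, H, W)
    proj = torch.eye(4).unsqueeze(0)         # ref and src share the same projection
    depth = torch.full((B, 1, H, W), 5.0)    # one hypothesised depth per pixel
    warped = homo_warping(src_fea, proj, proj, depth)
    assert warped.shape == (B, C, H, W)      # output keeps the [B, C, H, W] layout
    return warped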
def generate_points_from_depth(depth, proj):
'''
:param depth: (B, 1, H, W)
:param proj: (B, 4, 4)
:return: point_cloud (B, 3, H, W)
'''
batch, height, width = depth.shape[0], depth.shape[2], depth.shape[3]
inv_proj = torch.inverse(proj)
rot = inv_proj[:, :3, :3] # [B,3,3]
trans = inv_proj[:, :3, 3:4] # [B,3,1]
y, x = torch.meshgrid([torch.arange(0, height, dtype=torch.float32, device=depth.device),
torch.arange(0, width, dtype=torch.float32, device=depth.device)])
y, x = y.contiguous(), x.contiguous()
y, x = y.view(height * width), x.view(height * width)
xyz = torch.stack((x, y, torch.ones_like(x))) # [3, H*W]
xyz = torch.unsqueeze(xyz, 0).repeat(batch, 1, 1) # [B, 3, H*W]
rot_xyz = torch.matmul(rot, xyz) # [B, 3, H*W]
rot_depth_xyz = rot_xyz * depth.view(batch, 1, -1)
proj_xyz = rot_depth_xyz + trans.view(batch, 3, 1) # [B, 3, H*W]
proj_xyz = proj_xyz.view(batch, 3, height, width)
return proj_xyz
def mkdir_p(path):
try:
os.makedirs(path)
except OSError as exc:
if exc.errno == errno.EEXIST and os.path.isdir(path):
pass
else:
raise
def read_pfm(filename):
file = open(filename, 'rb')
color = None
width = None
height = None
scale = None
endian = None
header = file.readline().decode('utf-8').rstrip()
if header == 'PF':
color = True
elif header == 'Pf':
color = False
else:
raise Exception('Not a PFM file.')
dim_match = re.match(r'^(\d+)\s(\d+)\s$', file.readline().decode('utf-8'))
if dim_match:
width, height = map(int, dim_match.groups())
else:
raise Exception('Malformed PFM header.')
scale = float(file.readline().rstrip())
if scale < 0: # little-endian
endian = '<'
scale = -scale
else:
endian = '>' # big-endian
data = np.fromfile(file, endian + 'f')
shape = (height, width, 3) if color else (height, width)
data = np.reshape(data, shape)
data = np.flipud(data)
file.close()
return data, scale
def write_pfm(file, image, scale=1):
file = open(file, 'wb')
color = None
if image.dtype.name != 'float32':
raise Exception('Image dtype must be float32.')
image = np.flipud(image)
if len(image.shape) == 3 and image.shape[2] == 3: # color image
color = True
elif len(image.shape) == 2 or len(image.shape) == 3 and image.shape[2] == 1: # greyscale
color = False
else:
raise Exception('Image must have H x W x 3, H x W x 1 or H x W dimensions.')
file.write('PF\n'.encode() if color else 'Pf\n'.encode())
file.write('%d %d\n'.encode() % (image.shape[1], image.shape[0]))
endian = image.dtype.byteorder
if endian == '<' or endian == '=' and sys.byteorder == 'little':
scale = -scale
file.write('%f\n'.encode() % scale)
image_string = image.tostring()
file.write(image_string)
file.close()
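# Sketch of a PFM round trip with the read_pfm / write_pfm pair above, useful as a quick
# sanity check. The temporary path is an illustrative assumption, not used by the pipeline.
def _demo_pfm_roundtrip(path='/tmp/_demo_depth.pfm'):
    depth = np.random.rand(8, 12).astype(np.float32)   # small fake depth map
    write_pfm(path, depth)
    loaded, scale = read_pfm(path)
    assert loaded.shape == depth.shape and np.allclose(loaded, depth)
    return scale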
def write_ply(file, points):
pcd = o3d.geometry.PointCloud()
pcd.points = o3d.utility.Vector3dVector(points[:, :3])
pcd.colors = o3d.utility.Vector3dVector(points[:, 3:] / 255.)
o3d.io.write_point_cloud(file, pcd, write_ascii=False)
def filter_depth(ref_depth, src_depths, ref_proj, src_projs):
'''
:param ref_depth: (1, 1, H, W)
:param src_depths: (B, 1, H, W)
:param ref_proj: (1, 4, 4)
:param src_projs: (B, 4, 4)
:return: ref_pc: (1, 3, H, W), aligned_pcs: (B, 3, H, W), dist: (B, 1, H, W)
'''
ref_pc = generate_points_from_depth(ref_depth, ref_proj)
src_pcs = generate_points_from_depth(src_depths, src_projs)
aligned_pcs = homo_warping(src_pcs, src_projs, ref_proj, ref_depth)
x_2 = (ref_pc[:, 0] - aligned_pcs[:, 0])**2
y_2 = (ref_pc[:, 1] - aligned_pcs[:, 1])**2
z_2 = (ref_pc[:, 2] - aligned_pcs[:, 2])**2
dist = torch.sqrt(x_2 + y_2 + z_2).unsqueeze(1)
return ref_pc, aligned_pcs, dist
def parse_cameras(path):
cam_txt = open(path).readlines()
f = lambda xs: list(map(lambda x: list(map(float, x.strip().split())), xs))
extr_mat = f(cam_txt[1:5])
intr_mat = f(cam_txt[7:10])
extr_mat = np.array(extr_mat, np.float32)
intr_mat = np.array(intr_mat, np.float32)
return extr_mat, intr_mat
def load_data(root_path, depth_path, scene_name, thresh):
depths = []
projs = []
rgbs = []
for view in range(49):
img_filename = "{}/{}/images/{:08d}.jpg".format(depth_path, scene_name, view)
cam_filename = "{}/{}/cams/{:08d}_cam.txt".format(depth_path, scene_name, view)
depth_filename = "{}/{}/depth_est/{:08d}.pfm".format(depth_path, scene_name, view)
confidence_filename = "{}/{}/confidence/{:08d}.pfm".format(depth_path, scene_name, view)
extr_mat, intr_mat = parse_cameras(cam_filename)
proj_mat = np.eye(4)
proj_mat[:3, :4] = np.dot(intr_mat[:3, :3], extr_mat[:3, :4])
projs.append(torch.from_numpy(proj_mat))
dep_map, _ = read_pfm(depth_filename)
h, w = dep_map.shape
conf_map, _ = read_pfm(confidence_filename)
conf_map = cv2.resize(conf_map, (w, h), interpolation=cv2.INTER_LINEAR)
dep_map = dep_map * (conf_map>thresh).astype(np.float32)
depths.append(torch.from_numpy(dep_map).unsqueeze(0))
rgb = np.array(Image.open(img_filename))
rgbs.append(rgb)
depths = torch.stack(depths).float()
projs = torch.stack(projs).float()
if args.device == 'cuda' and torch.cuda.is_available():
depths = depths.cuda()
projs = projs.cuda()
return depths, projs, rgbs
def extract_points(pc, mask, rgb):
pc = pc.cpu().numpy()
mask = mask.cpu().numpy()
mask = np.reshape(mask, (-1,))
pc = np.reshape(pc, (-1, 3))
rgb = np.reshape(rgb, (-1, 3))
points = pc[np.where(mask)]
colors = rgb[np.where(mask)]
points_with_color = np.concatenate([points, colors], axis=1)
return points_with_color
def open3d_filter():
with torch.no_grad():
mkdir_p(args.ply_path)
all_scenes = open(args.data_list, 'r').readlines()
all_scenes = list(map(str.strip, all_scenes))
for i, scene in enumerate(all_scenes):
print("{}/{} {}:".format(i, len(all_scenes), scene), '------------------------')
depths, projs, rgbs = load_data(args.root_path, args.depth_path, scene, args.prob_thresh)
tot_frame = depths.shape[0]
height, width = depths.shape[2], depths.shape[3]
points = []
print('Scene: {} total: {} frames'.format(scene, tot_frame))
for i in range(tot_frame):
pc_buff = torch.zeros((3, height, width), device=depths.device, dtype=depths.dtype)
val_cnt = torch.zeros((1, height, width), device=depths.device, dtype=depths.dtype)
j = 0
batch_size = 20
while True:
ref_pc, pcs, dist = filter_depth(ref_depth=depths[i:i+1], src_depths=depths[j:min(j+batch_size, tot_frame)],
ref_proj=projs[i:i+1], src_projs=projs[j:min(j+batch_size, tot_frame)])
masks = (dist < args.dist_thresh).float()
masked_pc = pcs * masks
pc_buff += masked_pc.sum(dim=0, keepdim=False)
val_cnt += masks.sum(dim=0, keepdim=False)
j += batch_size
if j >= tot_frame:
break
final_mask = (val_cnt >= args.num_consist).squeeze(0)
avg_points = torch.div(pc_buff, val_cnt).permute(1, 2, 0)
final_pc = extract_points(avg_points, final_mask, rgbs[i])
points.append(final_pc)
if i==0 or i==tot_frame-1:
print('Processing {} {}/{} ...'.format(scene, i+1, tot_frame))
ply_id = int(scene[4:])
write_ply('{}/mvsnet{:03d}.ply'.format(args.ply_path, ply_id), np.concatenate(points, axis=0))
del points, depths, rgbs, projs
gc.collect()
print('Saved {}/mvsnet{:03d}.ply successfully.'.format(args.ply_path, ply_id))
if __name__ == '__main__':
open3d_filter()
================================================
FILE: fusions/dtu/gipuma.py
================================================
# -*- coding: utf-8 -*-
# @Description: Point cloud fusion strategy for DTU dataset: Gipuma (fusibile).
# Refer to: https://github.com/YoYo000/MVSNet/blob/master/mvsnet/depthfusion.py
# @Author: Zhe Zhang (doublez@stu.pku.edu.cn)
# @Affiliation: Peking University (PKU)
# @LastEditDate: 2023-09-07
from __future__ import print_function
import os, re, sys, shutil
from struct import *
import numpy as np
import argparse
import cv2
from tensorflow.python.lib.io import file_io
parser = argparse.ArgumentParser()
parser.add_argument('--root_dir', type=str, default='[/path/to]/dtu-test-1200', help='root directory of dtu dataset')
parser.add_argument('--list_file', type=str, default='datasets/lists/dtu/train.txt', help='file contains the scans list')
parser.add_argument('--depth_folder', type=str, default = './outputs/')
parser.add_argument('--out_folder', type=str, default = 'fusibile_fused')
parser.add_argument('--plydir', type=str, default='')
parser.add_argument('--quandir', type=str, default='')
parser.add_argument('--fusibile_exe_path', type=str, default = 'fusion/fusibile')
parser.add_argument('--prob_threshold', type=float, default = '0.8')
parser.add_argument('--disp_threshold', type=float, default = '0.13')
parser.add_argument('--num_consistent', type=float, default = '3')
parser.add_argument('--downsample_factor', type=int, default='1')
args = parser.parse_args()
# preprocess ====================================
def load_cam(file, interval_scale=1):
""" read camera txt file """
cam = np.zeros((2, 4, 4))
words = file.read().split()
# read extrinsic
for i in range(0, 4):
for j in range(0, 4):
extrinsic_index = 4 * i + j + 1
cam[0][i][j] = words[extrinsic_index]
# read intrinsic
for i in range(0, 3):
for j in range(0, 3):
intrinsic_index = 3 * i + j + 18
cam[1][i][j] = words[intrinsic_index]
if len(words) == 29:
cam[1][3][0] = words[27]
cam[1][3][1] = float(words[28]) * interval_scale
cam[1][3][2] = 1100
cam[1][3][3] = cam[1][3][0] + cam[1][3][1] * cam[1][3][2]
elif len(words) == 30:
cam[1][3][0] = words[27]
cam[1][3][1] = float(words[28]) * interval_scale
cam[1][3][2] = words[29]
cam[1][3][3] = cam[1][3][0] + cam[1][3][1] * cam[1][3][2]
elif len(words) == 31:
cam[1][3][0] = words[27]
cam[1][3][1] = float(words[28]) * interval_scale
cam[1][3][2] = words[29]
cam[1][3][3] = words[30]
else:
cam[1][3][0] = 0
cam[1][3][1] = 0
cam[1][3][2] = 0
cam[1][3][3] = 0
return cam
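# Sketch of the cam.txt layout that load_cam above assumes (a standard MVSNet-style file):
# the 'extrinsic' keyword, a 4x4 matrix, the 'intrinsic' keyword, a 3x3 matrix, then
# depth_min and depth_interval (optionally followed by depth_num / depth_max). All numbers
# below are made up for illustration only.
def _demo_load_cam():
    from io import StringIO
    sample = (
        "extrinsic\n"
        "1 0 0 0\n0 1 0 0\n0 0 1 0\n0 0 0 1\n"
        "intrinsic\n"
        "1000 0 640\n0 1000 512\n0 0 1\n"
        "425.0 2.5\n"
    )
    cam = load_cam(StringIO(sample))
    return cam   # cam[0] is the extrinsic, cam[1][:3, :3] the intrinsic, cam[1][3] depth params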
def load_pfm(file):
color = None
width = None
height = None
scale = None
data_type = None
header = file.readline().decode('UTF-8').rstrip()
if header == 'PF':
color = True
elif header == 'Pf':
color = False
else:
raise Exception('Not a PFM file.')
dim_match = re.match(r'^(\d+)\s(\d+)\s$', file.readline().decode('UTF-8'))
if dim_match:
width, height = map(int, dim_match.groups())
else:
raise Exception('Malformed PFM header.')
# scale = float(file.readline().rstrip())
scale = float((file.readline()).decode('UTF-8').rstrip())
if scale < 0: # little-endian
data_type = '<f'
else:
data_type = '>f' # big-endian
data_string = file.read()
data = np.fromstring(data_string, data_type)
shape = (height, width, 3) if color else (height, width)
data = np.reshape(data, shape)
data = cv2.flip(data, 0)
return data
def fake_gipuma_normal(in_depth_path, out_normal_path):
depth_image = read_gipuma_dmb(in_depth_path)
image_shape = np.shape(depth_image)
normal_image = np.ones_like(depth_image)
normal_image = np.reshape(normal_image, (image_shape[0], image_shape[1], 1))
normal_image = np.tile(normal_image, [1, 1, 3])
normal_image = normal_image / 1.732050808
mask_image = np.squeeze(np.where(depth_image > 0, 1, 0))
mask_image = np.reshape(mask_image, (image_shape[0], image_shape[1], 1))
mask_image = np.tile(mask_image, [1, 1, 3])
mask_image = np.float32(mask_image)
normal_image = np.multiply(normal_image, mask_image)
normal_image = np.float32(normal_image)
write_gipuma_dmb(out_normal_path, normal_image)
return
def mvsnet_to_gipuma(scan_folder, scan, root_dir, gipuma_point_folder):
image_folder = os.path.join(root_dir, 'Rectified', scan)
cam_folder = os.path.join(root_dir, 'Cameras')
depth_folder = os.path.join(scan_folder, 'depth_est')
gipuma_cam_folder = os.path.join(gipuma_point_folder, 'cams')
gipuma_image_folder = os.path.join(gipuma_point_folder, 'images')
if not os.path.isdir(gipuma_point_folder):
os.mkdir(gipuma_point_folder)
if not os.path.isdir(gipuma_cam_folder):
os.mkdir(gipuma_cam_folder)
if not os.path.isdir(gipuma_image_folder):
os.mkdir(gipuma_image_folder)
# convert cameras
for view in range(0,49):
in_cam_file = os.path.join(cam_folder, "{:08d}_cam.txt".format(view))
out_cam_file = os.path.join(gipuma_cam_folder, "{:08d}.png.P".format(view))
mvsnet_to_gipuma_cam(in_cam_file, out_cam_file)
# copy images to gipuma image folder
for view in range(0,49):
in_image_file = os.path.join(image_folder, "rect_{:03d}_3_r5000.png".format(view+1))  # rectified image filenames start from 1
out_image_file = os.path.join(gipuma_image_folder, "{:08d}.png".format(view))
# shutil.copy(in_image_file, out_image_file)
in_image = cv2.imread(in_image_file)
out_image = cv2.resize(in_image, None, fx=1.0/args.downsample_factor, fy=1.0/args.downsample_factor, interpolation=cv2.INTER_LINEAR)
cv2.imwrite(out_image_file, out_image)
# convert depth maps and fake normal maps
gipuma_prefix = '2333__'
for view in range(0,49):
sub_depth_folder = os.path.join(gipuma_point_folder, gipuma_prefix+"{:08d}".format(view))
if not os.path.isdir(sub_depth_folder):
os.mkdir(sub_depth_folder)
in_depth_pfm = os.path.join(depth_folder, "{:08d}_prob_filtered.pfm".format(view))
out_depth_dmb = os.path.join(sub_depth_folder, 'disp.dmb')
fake_normal_dmb = os.path.join(sub_depth_folder, 'normals.dmb')
mvsnet_to_gipuma_dmb(in_depth_pfm, out_depth_dmb)
fake_gipuma_normal(out_depth_dmb, fake_normal_dmb)
def probability_filter(scan_folder, prob_threshold):
depth_folder = os.path.join(scan_folder, 'depth_est')
prob_folder = os.path.join(scan_folder, 'confidence')
# convert cameras
for view in range(0,49):
init_depth_map_path = os.path.join(depth_folder, "{:08d}.pfm".format(view)) # depth map outputs are indexed from 0
prob_map_path = os.path.join(prob_folder, "{:08d}.pfm".format(view)) # same indexing as the depth maps
out_depth_map_path = os.path.join(depth_folder, "{:08d}_prob_filtered.pfm".format(view)) # Gipuma inputs are indexed from 0
depth_map = load_pfm(open(init_depth_map_path))
prob_map = load_pfm(open(prob_map_path))
depth_map[prob_map < prob_threshold] = 0
write_pfm(out_depth_map_path, depth_map)
def depth_map_fusion(point_folder, fusibile_exe_path, disp_thresh, num_consistent):
cam_folder = os.path.join(point_folder, 'cams')
image_folder = os.path.join(point_folder, 'images')
depth_min = 0.001
depth_max = 100000
normal_thresh = 360
cmd = fusibile_exe_path
cmd = cmd + ' -input_folder ' + point_folder + '/'
cmd = cmd + ' -p_folder ' + cam_folder + '/'
cmd = cmd + ' -images_folder ' + image_folder + '/'
cmd = cmd + ' --depth_min=' + str(depth_min)
cmd = cmd + ' --depth_max=' + str(depth_max)
cmd = cmd + ' --normal_thresh=' + str(normal_thresh)
cmd = cmd + ' --disp_thresh=' + str(disp_thresh)
cmd = cmd + ' --num_consistent=' + str(num_consistent)
print (cmd)
os.system(cmd)
return
def collectPly(point_folder, scan_id):
model_name = 'final3d_model.ply'
model_dir = [item for item in os.listdir(point_folder) if item.startswith("consistencyCheck")][-1]
old = os.path.join(point_folder, model_dir, model_name)
fresh = os.path.join(args.plydir, "mvsnet") + scan_id.zfill(3) + ".ply"
shutil.move(old, fresh)
if __name__ == '__main__':
root_dir = args.root_dir
depth_folder = args.depth_folder
out_folder = args.out_folder
fusibile_exe_path = args.fusibile_exe_path
prob_threshold = args.prob_threshold
disp_threshold = args.disp_threshold
num_consistent = args.num_consistent
# Read test list
testlist = args.list_file
with open(testlist) as f:
scans = f.readlines()
scans = [line.rstrip() for line in scans]
print("Start Gipuma(GPU) fusion!")
if not os.path.isdir(args.plydir):
os.mkdir(args.plydir)
# Fusion
for i, scan in enumerate(scans):
print("{}/{} {}:".format(i, len(scans), scan), '------------------------')
scan_folder = os.path.join(depth_folder, scan)
fusibile_workspace = os.path.join(depth_folder, out_folder, scan)
if not os.path.isdir(os.path.join(depth_folder, out_folder)):
os.mkdir(os.path.join(depth_folder, out_folder))
if not os.path.isdir(fusibile_workspace):
os.mkdir(fusibile_workspace)
# probability filtering
print ('filter depth map with probability map')
probability_filter(scan_folder, prob_threshold)
# convert to gipuma format
print ('Convert mvsnet output to gipuma input')
mvsnet_to_gipuma(scan_folder, scan, root_dir, fusibile_workspace)
# depth map fusion with gipuma
print ('Run depth map fusion & filter')
depth_map_fusion(fusibile_workspace, fusibile_exe_path, disp_threshold, num_consistent)
# collect .ply results to summary folder
print('Collect {} ply'.format(scan))
collectPly(fusibile_workspace, scan[4:])
print("Gipuma(GPU) fusion done!")
shutil.rmtree(os.path.join(depth_folder, out_folder))
print("fusibile_fused remove done!")
================================================
FILE: fusions/dtu/pcd.py
================================================
# -*- coding: utf-8 -*-
# @Description: Point cloud fusion strategy for DTU dataset: Basic PCD.
# Refer to: https://github.com/xy-guo/MVSNet_pytorch/blob/master/eval.py
# @Author: Zhe Zhang (doublez@stu.pku.edu.cn)
# @Affiliation: Peking University (PKU)
# @LastEditDate: 2023-09-07
import argparse, os, sys, cv2, re, logging, time
import numpy as np
from plyfile import PlyData, PlyElement
from PIL import Image
from multiprocessing import Pool
from functools import partial
import signal
parser = argparse.ArgumentParser(description='filter, and fuse')
parser.add_argument('--testpath', default='[/path/to]/dtu-test-1200', help='testing data dir for some scenes')
parser.add_argument('--testlist', default="datasets/lists/dtu/test.txt", help='testing scene list')
parser.add_argument('--outdir', default='./outputs/[exp_name]', help='output dir')
parser.add_argument('--logdir', default='./checkpoints/debug', help='the directory to save checkpoints/logs')
parser.add_argument('--nolog', action='store_true', help='do not logging into .log file')
parser.add_argument('--plydir', default='./outputs/[exp_name]/pcd_fusion_plys/', help='output dir')
parser.add_argument('--num_worker', type=int, default=4, help='depth_filer worker')
parser.add_argument('--conf', type=float, default=0.9, help='prob confidence')
parser.add_argument('--thres_view', type=int, default=5, help='threshold of num view')
args = parser.parse_args()
def read_pfm(filename):
file = open(filename, 'rb')
color = None
width = None
height = None
scale = None
endian = None
header = file.readline().decode('utf-8').rstrip()
if header == 'PF':
color = True
elif header == 'Pf':
color = False
else:
raise Exception('Not a PFM file.')
dim_match = re.match(r'^(\d+)\s(\d+)\s$', file.readline().decode('utf-8'))
if dim_match:
width, height = map(int, dim_match.groups())
else:
raise Exception('Malformed PFM header.')
scale = float(file.readline().rstrip())
if scale < 0: # little-endian
endian = '<'
scale = -scale
else:
endian = '>' # big-endian
data = np.fromfile(file, endian + 'f')
shape = (height, width, 3) if color else (height, width)
data = np.reshape(data, shape)
data = np.flipud(data)
file.close()
return data, scale
def read_camera_parameters(filename):
with open(filename) as f:
lines = f.readlines()
lines = [line.rstrip() for line in lines]
# extrinsics: line [1,5), 4x4 matrix
extrinsics = np.fromstring(' '.join(lines[1:5]), dtype=np.float32, sep=' ').reshape((4, 4))
# intrinsics: line [7,10), 3x3 matrix
intrinsics = np.fromstring(' '.join(lines[7:10]), dtype=np.float32, sep=' ').reshape((3, 3))
return intrinsics, extrinsics
def read_img(filename):
img = Image.open(filename)
# scale 0~255 to 0~1
np_img = np.array(img, dtype=np.float32) / 255.
return np_img
def read_mask(filename):
return read_img(filename) > 0.5
def save_mask(filename, mask):
assert mask.dtype == np.bool_  # np.bool_ is the portable spelling; np.bool was removed in newer numpy
mask = mask.astype(np.uint8) * 255
Image.fromarray(mask).save(filename)
def read_pair_file(filename):
data = []
with open(filename) as f:
num_viewpoint = int(f.readline())
# 49 viewpoints
for view_idx in range(num_viewpoint):
ref_view = int(f.readline().rstrip())
src_views = [int(x) for x in f.readline().rstrip().split()[1::2]]
if len(src_views) > 0:
data.append((ref_view, src_views))
return data
# project the reference point cloud into the source view, then project back
def reproject_with_depth(depth_ref, intrinsics_ref, extrinsics_ref, depth_src, intrinsics_src, extrinsics_src):
width, height = depth_ref.shape[1], depth_ref.shape[0]
## step1. project reference pixels to the source view
# reference view x, y
x_ref, y_ref = np.meshgrid(np.arange(0, width), np.arange(0, height))
x_ref, y_ref = x_ref.reshape([-1]), y_ref.reshape([-1])
# reference 3D space
xyz_ref = np.matmul(np.linalg.inv(intrinsics_ref),
np.vstack((x_ref, y_ref, np.ones_like(x_ref))) * depth_ref.reshape([-1]))
# source 3D space
xyz_src = np.matmul(np.matmul(extrinsics_src, np.linalg.inv(extrinsics_ref)),
np.vstack((xyz_ref, np.ones_like(x_ref))))[:3]
# source view x, y
K_xyz_src = np.matmul(intrinsics_src, xyz_src)
xy_src = K_xyz_src[:2] / K_xyz_src[2:3]
## step2. reproject the source view points with source view depth estimation
# find the depth estimation of the source view
x_src = xy_src[0].reshape([height, width]).astype(np.float32)
y_src = xy_src[1].reshape([height, width]).astype(np.float32)
sampled_depth_src = cv2.remap(depth_src, x_src, y_src, interpolation=cv2.INTER_LINEAR)
# mask = sampled_depth_src > 0
# source 3D space
# NOTE that we should use sampled source-view depth_here to project back
xyz_src = np.matmul(np.linalg.inv(intrinsics_src),
np.vstack((xy_src, np.ones_like(x_ref))) * sampled_depth_src.reshape([-1]))
# reference 3D space
xyz_reprojected = np.matmul(np.matmul(extrinsics_ref, np.linalg.inv(extrinsics_src)),
np.vstack((xyz_src, np.ones_like(x_ref))))[:3]
# source view x, y, depth
depth_reprojected = xyz_reprojected[2].reshape([height, width]).astype(np.float32)
K_xyz_reprojected = np.matmul(intrinsics_ref, xyz_reprojected)
xy_reprojected = K_xyz_reprojected[:2] / K_xyz_reprojected[2:3]
x_reprojected = xy_reprojected[0].reshape([height, width]).astype(np.float32)
y_reprojected = xy_reprojected[1].reshape([height, width]).astype(np.float32)
return depth_reprojected, x_reprojected, y_reprojected, x_src, y_src
def check_geometric_consistency(depth_ref, intrinsics_ref, extrinsics_ref, depth_src, intrinsics_src, extrinsics_src):
width, height = depth_ref.shape[1], depth_ref.shape[0]
x_ref, y_ref = np.meshgrid(np.arange(0, width), np.arange(0, height))
depth_reprojected, x2d_reprojected, y2d_reprojected, x2d_src, y2d_src = reproject_with_depth(depth_ref, intrinsics_ref, extrinsics_ref,
depth_src, intrinsics_src, extrinsics_src)
# check |p_reproj-p_1| < 1
dist = np.sqrt((x2d_reprojected - x_ref) ** 2 + (y2d_reprojected - y_ref) ** 2)
# check |d_reproj-d_1| / d_1 < 0.01
depth_diff = np.abs(depth_reprojected - depth_ref)
relative_depth_diff = depth_diff / depth_ref
mask = np.logical_and(dist < 1, relative_depth_diff < 0.01)
depth_reprojected[~mask] = 0
return mask, depth_reprojected, x2d_src, y2d_src
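# Sanity-check sketch for check_geometric_consistency: with identical (synthetic) cameras
# and identical constant depth maps every pixel reprojects onto itself, so the reprojection
# error and the relative depth difference are ~0 and the mask should be all True. The
# intrinsics and the depth value below are illustrative assumptions.
def _demo_check_geometric_consistency():
    h, w = 32, 48
    depth = np.full((h, w), 10.0, dtype=np.float32)
    intrinsics = np.array([[100., 0., w / 2.], [0., 100., h / 2.], [0., 0., 1.]], dtype=np.float32)
    extrinsics = np.eye(4, dtype=np.float32)
    mask, depth_reproj, _, _ = check_geometric_consistency(depth, intrinsics, extrinsics,
                                                           depth, intrinsics, extrinsics)
    assert mask.all()
    return depth_reproj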
def filter_depth(pair_folder, scan_folder, out_folder, plyfilename):
# the pair file
pair_file = os.path.join(pair_folder, "pair.txt")
# for the final point cloud
vertexs = []
vertex_colors = []
pair_data = read_pair_file(pair_file)
# for each reference view and the corresponding source views
for ref_view, src_views in pair_data:
# src_views = src_views[:args.num_view]
# load the camera parameters
ref_intrinsics, ref_extrinsics = read_camera_parameters(
os.path.join(scan_folder, 'cams/{:0>8}_cam.txt'.format(ref_view)))
# load the reference image
ref_img = read_img(os.path.join(scan_folder, 'images/{:0>8}.jpg'.format(ref_view)))
# load the estimated depth of the reference view
ref_depth_est = read_pfm(os.path.join(out_folder, 'depth_est/{:0>8}.pfm'.format(ref_view)))[0]
# load the photometric mask of the reference view
confidence = read_pfm(os.path.join(out_folder, 'confidence/{:0>8}.pfm'.format(ref_view)))[0]
photo_mask = confidence > args.conf
all_srcview_depth_ests = []
all_srcview_x = []
all_srcview_y = []
all_srcview_geomask = []
# compute the geometric mask
geo_mask_sum = 0
for src_view in src_views:
# camera parameters of the source view
src_intrinsics, src_extrinsics = read_camera_parameters(
os.path.join(scan_folder, 'cams/{:0>8}_cam.txt'.format(src_view)))
# the estimated depth of the source view
src_depth_est = read_pfm(os.path.join(out_folder, 'depth_est/{:0>8}.pfm'.format(src_view)))[0]
geo_mask, depth_reprojected, x2d_src, y2d_src = check_geometric_consistency(ref_depth_est, ref_intrinsics, ref_extrinsics,
src_depth_est,
src_intrinsics, src_extrinsics)
geo_mask_sum += geo_mask.astype(np.int32)
all_srcview_depth_ests.append(depth_reprojected)
all_srcview_x.append(x2d_src)
all_srcview_y.append(y2d_src)
all_srcview_geomask.append(geo_mask)
depth_est_averaged = (sum(all_srcview_depth_ests) + ref_depth_est) / (geo_mask_sum + 1)
# at least args.thres_view source views matched
geo_mask = geo_mask_sum >= args.thres_view
final_mask = np.logical_and(photo_mask, geo_mask)
os.makedirs(os.path.join(out_folder, "mask"), exist_ok=True)
save_mask(os.path.join(out_folder, "mask/{:0>8}_photo.png".format(ref_view)), photo_mask)
save_mask(os.path.join(out_folder, "mask/{:0>8}_geo.png".format(ref_view)), geo_mask)
save_mask(os.path.join(out_folder, "mask/{:0>8}_final.png".format(ref_view)), final_mask)
logger.info("processing {}, ref-view{:0>2}, photo/geo/final-mask:{:.3f}/{:.3f}/{:.3f}".format(scan_folder, ref_view,
photo_mask.mean(),
geo_mask.mean(), final_mask.mean()))
height, width = depth_est_averaged.shape[:2]
x, y = np.meshgrid(np.arange(0, width), np.arange(0, height))
# valid_points = np.logical_and(final_mask, ~used_mask[ref_view])
valid_points = final_mask
logger.info("valid_points: {}".format(valid_points.mean()))
x, y, depth = x[valid_points], y[valid_points], depth_est_averaged[valid_points]
#color = ref_img[1:-16:4, 1::4, :][valid_points] # hardcoded for DTU dataset
color = ref_img[valid_points]
xyz_ref = np.matmul(np.linalg.inv(ref_intrinsics),
np.vstack((x, y, np.ones_like(x))) * depth)
xyz_world = np.matmul(np.linalg.inv(ref_extrinsics),
np.vstack((xyz_ref, np.ones_like(x))))[:3]
vertexs.append(xyz_world.transpose((1, 0)))
vertex_colors.append((color * 255).astype(np.uint8))
vertexs = np.concatenate(vertexs, axis=0)
vertex_colors = np.concatenate(vertex_colors, axis=0)
vertexs = np.array([tuple(v) for v in vertexs], dtype=[('x', 'f4'), ('y', 'f4'), ('z', 'f4')])
vertex_colors = np.array([tuple(v) for v in vertex_colors], dtype=[('red', 'u1'), ('green', 'u1'), ('blue', 'u1')])
vertex_all = np.empty(len(vertexs), vertexs.dtype.descr + vertex_colors.dtype.descr)
for prop in vertexs.dtype.names:
vertex_all[prop] = vertexs[prop]
for prop in vertex_colors.dtype.names:
vertex_all[prop] = vertex_colors[prop]
el = PlyElement.describe(vertex_all, 'vertex')
PlyData([el]).write(plyfilename)
logger.info("saving the final model to " + plyfilename)
def init_worker():
'''
Catch Ctrl+C signal to terminate workers
'''
signal.signal(signal.SIGINT, signal.SIG_IGN)
def pcd_filter_worker(scan):
scan_id = int(scan[4:])
save_name = 'mvsnet{:0>3}.ply'.format(scan_id)
pair_folder = os.path.join(args.testpath, "Cameras")
scan_folder = os.path.join(args.outdir, scan)
out_folder = os.path.join(args.outdir, scan)
filter_depth(pair_folder, scan_folder, out_folder, os.path.join(args.plydir, save_name))
def pcd_filter(testlist, number_worker):
partial_func = partial(pcd_filter_worker)
p = Pool(number_worker, init_worker)
try:
p.map(partial_func, testlist)
except KeyboardInterrupt:
logger.info("....\nCaught KeyboardInterrupt, terminating workers")
p.terminate()
else:
p.close()
p.join()
def initLogger():
logger = logging.getLogger()
logger.setLevel(logging.INFO)
curTime = time.strftime('%Y%m%d-%H%M', time.localtime(time.time()))
if not os.path.isdir(args.logdir):
os.mkdir(args.logdir)
logfile = os.path.join(args.logdir, 'fusion-' + curTime + '.log')
formatter = logging.Formatter("%(asctime)s - %(filename)s[line:%(lineno)d] - %(levelname)s: %(message)s")
if not args.nolog:
fileHandler = logging.FileHandler(logfile, mode='a')
fileHandler.setFormatter(formatter)
logger.addHandler(fileHandler)
consoleHandler = logging.StreamHandler(sys.stdout)
consoleHandler.setFormatter(formatter)
logger.addHandler(consoleHandler)
logger.info("Logger initialized.")
logger.info("Writing logs to file: {}".format(logfile))
logger.info("Current time: {}".format(curTime))
return logger
if __name__ == '__main__':
logger = initLogger()
if not os.path.isdir(args.plydir):
os.mkdir(args.plydir)
with open(args.testlist) as f:
content = f.readlines()
testlist = [line.rstrip() for line in content]
pcd_filter(testlist, args.num_worker)
================================================
FILE: fusions/tnt/dypcd.py
================================================
# -*- coding: utf-8 -*-
# @Description: Point cloud fusion strategy for Tanks and Temples dataset: DYnamic PCD.
# Refer to: https://github.com/yhw-yhw/D2HC-RMVSNet/blob/master/fusion.py
# @Author: Zhe Zhang (doublez@stu.pku.edu.cn)
# @Affiliation: Peking University (PKU)
# @LastEditDate: 2023-09-07
import os
import cv2
import signal
import numpy as np
from PIL import Image
from functools import partial
from multiprocessing import Pool
from plyfile import PlyData, PlyElement
import argparse
import re, json
from sklearn.preprocessing import scale
parser = argparse.ArgumentParser()
parser.add_argument("--root_dir", type=str, default="[/path/to/]tankandtemples/")
parser.add_argument('--out_dir', type=str, default='outputs/[exp_name]')
parser.add_argument('--ply_path', type=str, default='outputs/[exp_name]/dypcd_fusion_plys')
parser.add_argument('--split', type=str, default='intermediate', choices=['intermediate', 'advanced'])
parser.add_argument('--list_file', type=str, default='datasets/lists/tnt/intermediate.txt')
parser.add_argument('--num_workers', type=int, default=1)
parser.add_argument('--single_processor', action='store_true')
parser.add_argument('--rescale', action='store_true')
parser.add_argument('--max_w', type=int)
parser.add_argument('--max_h', type=int)
parser.add_argument('--cam_mode', type=str, default='origin', choices=['origin', 'short_range'])
parser.add_argument('--img_mode', type=str, default='resize', choices=['resize', 'crop'])
parser.add_argument('--dist_base', type=float, default=1 / 4)
parser.add_argument('--rel_diff_base', type=float, default=1 / 1300)
args = parser.parse_args()
tnt_fusion_exps = [
{
"ply_path": "dypcd_fusion_plys_mean",
"param_strategy": "mean",
},
{
"ply_path": "dypcd_fusion_plys",
"param_strategy": "hyper_param",
"hyper_param_table": { # -1 -> mean()
'Family': 0.6,
'Francis': 0.6,
'Horse': 0.2,
'Lighthouse': 0.7,
'M60': 0.6,
'Panther': 0.6,
'Playground': 0.7,
'Train': 0.6,
'Auditorium': 0.1,
'Ballroom': 0.4,
'Courtroom': 0.4,
'Museum': 0.5,
'Palace': 0.5,
'Temple': 0.4
}
},
]
def read_pfm(filename):
file = open(filename, 'rb')
color = None
width = None
height = None
scale = None
endian = None
header = file.readline().decode('utf-8').rstrip()
if header == 'PF':
color = True
elif header == 'Pf':
color = False
else:
raise Exception('Not a PFM file.')
dim_match = re.match(r'^(\d+)\s(\d+)\s$', file.readline().decode('utf-8'))
if dim_match:
width, height = map(int, dim_match.groups())
else:
raise Exception('Malformed PFM header.')
scale = float(file.readline().rstrip())
if scale < 0: # little-endian
endian = '<'
scale = -scale
else:
endian = '>' # big-endian
data = np.fromfile(file, endian + 'f')
shape = (height, width, 3) if color else (height, width)
data = np.reshape(data, shape)
data = np.flipud(data)
file.close()
return data, scale
# save a binary mask
def save_mask(filename, mask):
assert mask.dtype == np.bool_  # np.bool_ is the portable spelling; np.bool was removed in newer numpy
mask = mask.astype(np.uint8) * 255
Image.fromarray(mask).save(filename)
# read an image
def read_img(filename):
img = Image.open(filename)
# scale 0~255 to 0~1
np_img = np.array(img, dtype=np.float32) / 255.
return np_img
# read intrinsics and extrinsics
def read_camera_parameters(filename):
with open(filename) as f:
lines = f.readlines()
lines = [line.rstrip() for line in lines]
# extrinsics: line [1,5), 4x4 matrix
extrinsics = np.fromstring(' '.join(lines[1:5]), dtype=np.float32, sep=' ').reshape((4, 4))
# intrinsics: line [7,10), 3x3 matrix
intrinsics = np.fromstring(' '.join(lines[7:10]), dtype=np.float32, sep=' ').reshape((3, 3))
# TODO: assume the feature is 1/4 of the original image size
# intrinsics[:2, :] /= 4
return intrinsics, extrinsics
# read a pair file, [(ref_view1, [src_view1-1, ...]), (ref_view2, [src_view2-1, ...]), ...]
def read_pair_file(filename):
data = []
with open(filename) as f:
num_viewpoint = int(f.readline())
# one entry per viewpoint listed in pair.txt
for view_idx in range(num_viewpoint):
ref_view = int(f.readline().rstrip())
src_views = [int(x) for x in f.readline().rstrip().split()[1::2]]
if len(src_views) > 0:
data.append((ref_view, src_views))
return data
# project the reference point cloud into the source view, then project back
def reproject_with_depth(depth_ref, intrinsics_ref, extrinsics_ref, depth_src, intrinsics_src, extrinsics_src):
width, height = depth_ref.shape[1], depth_ref.shape[0]
## step1. project reference pixels to the source view
# reference view x, y
x_ref, y_ref = np.meshgrid(np.arange(0, width), np.arange(0, height))
x_ref, y_ref = x_ref.reshape([-1]), y_ref.reshape([-1])
# reference 3D space
xyz_ref = np.matmul(np.linalg.inv(intrinsics_ref),
np.vstack((x_ref, y_ref, np.ones_like(x_ref))) * depth_ref.reshape([-1]))
# source 3D space
xyz_src = np.matmul(np.matmul(extrinsics_src, np.linalg.inv(extrinsics_ref)),
np.vstack((xyz_ref, np.ones_like(x_ref))))[:3]
# source view x, y
K_xyz_src = np.matmul(intrinsics_src, xyz_src)
xy_src = K_xyz_src[:2] / K_xyz_src[2:3]
## step2. reproject the source view points with source view depth estimation
# find the depth estimation of the source view
x_src = xy_src[0].reshape([height, width]).astype(np.float32)
y_src = xy_src[1].reshape([height, width]).astype(np.float32)
sampled_depth_src = cv2.remap(depth_src, x_src, y_src, interpolation=cv2.INTER_LINEAR)
# mask = sampled_depth_src > 0
# source 3D space
# NOTE that we should use sampled source-view depth_here to project back
xyz_src = np.matmul(np.linalg.inv(intrinsics_src),
np.vstack((xy_src, np.ones_like(x_ref))) * sampled_depth_src.reshape([-1]))
# reference 3D space
xyz_reprojected = np.matmul(np.matmul(extrinsics_ref, np.linalg.inv(extrinsics_src)),
np.vstack((xyz_src, np.ones_like(x_ref))))[:3]
# source view x, y, depth
depth_reprojected = xyz_reprojected[2].reshape([height, width]).astype(np.float32)
K_xyz_reprojected = np.matmul(intrinsics_ref, xyz_reprojected)
K_xyz_reprojected[2:3][K_xyz_reprojected[2:3]==0] += 0.00001
xy_reprojected = K_xyz_reprojected[:2] / K_xyz_reprojected[2:3]
x_reprojected = xy_reprojected[0].reshape([height, width]).astype(np.float32)
y_reprojected = xy_reprojected[1].reshape([height, width]).astype(np.float32)
return depth_reprojected, x_reprojected, y_reprojected, x_src, y_src
def check_geometric_consistency(depth_ref, intrinsics_ref, extrinsics_ref, depth_src, intrinsics_src, extrinsics_src):
width, height = depth_ref.shape[1], depth_ref.shape[0]
x_ref, y_ref = np.meshgrid(np.arange(0, width), np.arange(0, height))
depth_reprojected, x2d_reprojected, y2d_reprojected, x2d_src, y2d_src = reproject_with_depth(depth_ref, intrinsics_ref, extrinsics_ref,
depth_src, intrinsics_src, extrinsics_src)
# check |p_reproj-p_1| < 1
dist = np.sqrt((x2d_reprojected - x_ref) ** 2 + (y2d_reprojected - y_ref) ** 2)
# check |d_reproj-d_1| / d_1 < 0.01
depth_diff = np.abs(depth_reprojected - depth_ref)
relative_depth_diff = depth_diff / depth_ref
mask = None
masks = []
for i in range(2, 11):
# mask = np.logical_and(dist < i / 4, relative_depth_diff < i / 1300)
mask = np.logical_and(dist < i * args.dist_base, relative_depth_diff < i * args.rel_diff_base)
masks.append(mask)
depth_reprojected[~mask] = 0
return masks, mask, depth_reprojected, x2d_src, y2d_src
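# Sketch of the dynamic threshold schedule used above: each candidate number of consistent
# views i is paired with a looser pixel / relative-depth threshold (i * dist_base,
# i * rel_diff_base). With the defaults dist_base=1/4 and rel_diff_base=1/1300 this is the
# (i/4 px, i/1300) schedule from D2HC-RMVSNet; the helper below is illustrative only.
def _dynamic_thresholds(dist_base=1 / 4, rel_diff_base=1 / 1300, dy_range=11):
    return [(i, i * dist_base, i * rel_diff_base) for i in range(2, dy_range)]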
def scale_input(intrinsics, img):
if args.img_mode == "crop":
intrinsics[1,2] = intrinsics[1,2] - 28 # 1080 -> 1024
img = img[28:1080-28, :, :]
elif args.img_mode == "resize":
height, width = img.shape[:2]
img = cv2.resize(img, (width, 1024))
scale_h = 1.0 * 1024 / height
intrinsics[1, :] *= scale_h
return intrinsics, img
def filter_depth(scene, root_dir, split, out_dir, plyfilename, fusion_exp):
# num_stage = len(args.ndepths)
# the pair file
pair_file = os.path.join(root_dir, split, scene, "pair.txt")
# for the final point cloud
vertexs = []
vertex_colors = []
pair_data = read_pair_file(pair_file)
nviews = len(pair_data)
# for each reference view and the corresponding source views
for ref_view, src_views in pair_data:
# src_views = src_views[:args.num_view]
# load the camera parameters
if args.cam_mode == 'short_range':
ref_intrinsics, ref_extrinsics = read_camera_parameters(
os.path.join(root_dir, split, scene, 'cams_{}/{:0>8}_cam.txt'.format(scene.lower(), ref_view)))
elif args.cam_mode == 'origin':
ref_intrinsics, ref_extrinsics = read_camera_parameters(
os.path.join(root_dir, split, scene, 'cams/{:0>8}_cam.txt'.format(ref_view)))
ref_img = read_img(os.path.join(root_dir, split, scene, 'images/{:0>8}.jpg'.format(ref_view)))
ref_depth_est = read_pfm(os.path.join(out_dir, scene, 'depth_est/{:0>8}.pfm'.format(ref_view)))[0]
confidence = read_pfm(os.path.join(out_dir, scene, 'confidence/{:0>8}.pfm'.format(ref_view)))[0]
if fusion_exp['param_strategy'] == 'mean':
if ref_view % 50 == 0: print("-- thresh: {}".format(confidence.mean()))
photo_mask = confidence > confidence.mean()
elif fusion_exp['param_strategy'] == 'hyper_param':
conf_thresh = fusion_exp['hyper_param_table'][scene]
if conf_thresh == -1:
photo_mask = confidence > confidence.mean()
if ref_view % 50 == 0: print("-- thresh: mean() {}".format(confidence.mean()))
else:
photo_mask = confidence > conf_thresh
if ref_view % 50 == 0: print("-- thresh: {}".format(conf_thresh))
flag_img = ref_img
ref_intrinsics, _ = scale_input(ref_intrinsics, flag_img)
all_srcview_depth_ests = []
all_srcview_x = []
all_srcview_y = []
all_srcview_geomask = []
# compute the geometric mask
geo_mask_sum = 0
dy_range = len(src_views) + 1
geo_mask_sums = [0] * (dy_range - 2)
for src_view in src_views:
# camera parameters of the source view
if args.cam_mode == 'short_range':
src_intrinsics, src_extrinsics = read_camera_parameters(
os.path.join(root_dir, split, scene, 'cams_{}/{:0>8}_cam.txt'.format(scene.lower(), src_view)))
elif args.cam_mode == 'origin':
src_intrinsics, src_extrinsics = read_camera_parameters(
os.path.join(root_dir, split, scene, 'cams/{:0>8}_cam.txt'.format(src_view)))
# the estimated depth of the source view
src_depth_est = read_pfm(os.path.join(out_dir, scene, 'depth_est/{:0>8}.pfm'.format(src_view)))[0]
src_intrinsics, _ = scale_input(src_intrinsics, flag_img)
masks, geo_mask, depth_reprojected, x2d_src, y2d_src = check_geometric_consistency(ref_depth_est, ref_intrinsics,
ref_extrinsics, src_depth_est,
src_intrinsics, src_extrinsics)
geo_mask_sum += geo_mask.astype(np.int32)
for i in range(2, dy_range):
geo_mask_sums[i - 2] += masks[i - 2].astype(np.int32)
all_srcview_depth_ests.append(depth_reprojected)
all_srcview_x.append(x2d_src)
all_srcview_y.append(y2d_src)
all_srcview_geomask.append(geo_mask)
depth_est_averaged = (sum(all_srcview_depth_ests) + ref_depth_est) / (geo_mask_sum + 1)
# dynamic consistency: accept a pixel if at least i source views agree under the i-th (looser) threshold pair
geo_mask = geo_mask_sum >= dy_range
for i in range(2, dy_range):
geo_mask = np.logical_or(geo_mask, geo_mask_sums[i - 2] >= i)
final_mask = np.logical_and(photo_mask, geo_mask)
if ref_view < 3:
os.makedirs(os.path.join(out_dir, scene, "mask"), exist_ok=True)
save_mask(os.path.join(out_dir, scene, "mask/{:0>8}_photo.png".format(ref_view)), photo_mask)
save_mask(os.path.join(out_dir, scene, "mask/{:0>8}_geo.png".format(ref_view)), geo_mask)
save_mask(os.path.join(out_dir, scene, "mask/{:0>8}_final.png".format(ref_view)), final_mask)
print("processing {}, ref-view{:0>2}, photo/geo/final-mask:{:.3f}/{:.3f}/{:.3f}".format(os.path.join(out_dir, scene), ref_view,
photo_mask.mean(),
geo_mask.mean(), final_mask.mean()))
height, width = depth_est_averaged.shape[:2]
x, y = np.meshgrid(np.arange(0, width), np.arange(0, height))
# valid_points = np.logical_and(final_mask, ~used_mask[ref_view])
valid_points = final_mask
print("valid_points {:.3f}".format(valid_points.mean()))
x, y, depth = x[valid_points], y[valid_points], depth_est_averaged[valid_points]
# color = ref_img[:-24, :, :][valid_points]
color = ref_img[28:1080-28, :, :][valid_points]
xyz_ref = np.matmul(np.linalg.inv(ref_intrinsics),
np.vstack((x, y, np.ones_like(x))) * depth)
xyz_world = np.matmul(np.linalg.inv(ref_extrinsics),
np.vstack((xyz_ref, np.ones_like(x))))[:3]
vertexs.append(xyz_world.transpose((1, 0)))
vertex_colors.append((color * 255).astype(np.uint8))
vertexs = np.concatenate(vertexs, axis=0)
vertex_colors = np.concatenate(vertex_colors, axis=0)
vertexs = np.array([tuple(v) for v in vertexs], dtype=[('x', 'f4'), ('y', 'f4'), ('z', 'f4')])
vertex_colors = np.array([tuple(v) for v in vertex_colors], dtype=[('red', 'u1'), ('green', 'u1'), ('blue', 'u1')])
vertex_all = np.empty(len(vertexs), vertexs.dtype.descr + vertex_colors.dtype.descr)
for prop in vertexs.dtype.names:
vertex_all[prop] = vertexs[prop]
for prop in vertex_colors.dtype.names:
vertex_all[prop] = vertex_colors[prop]
el = PlyElement.describe(vertex_all, 'vertex')
PlyData([el]).write(plyfilename)
print("saving the final model to", plyfilename)
def dypcd_filter_worker(scene):
save_name = '{}.ply'.format(scene)
filter_depth(scene, args.root_dir, args.split, args.out_dir, os.path.join(args.out_dir, fusion_exp['ply_path'], save_name), fusion_exp)
def init_worker():
signal.signal(signal.SIGINT, signal.SIG_IGN)
if __name__ == '__main__':
with open(os.path.join(args.list_file)) as f:
testlist = [line.rstrip() for line in f.readlines()]
for fusion_exp in tnt_fusion_exps:
if not os.path.isdir(os.path.join(args.out_dir, fusion_exp['ply_path'])):
os.mkdir(os.path.join(args.out_dir, fusion_exp['ply_path']))
if args.single_processor:
for scene in testlist:
save_name = '{}.ply'.format(scene)
filter_depth(scene, args.root_dir, args.split, args.out_dir, os.path.join(args.out_dir, fusion_exp['ply_path'], save_name), fusion_exp)
else:
partial_func = partial(dypcd_filter_worker)
p = Pool(args.num_workers, init_worker)
try:
p.map(partial_func, testlist)
except KeyboardInterrupt:
print("....\nCaught KeyboardInterrupt, terminating workers")
p.terminate()
else:
p.close()
p.join()
================================================
FILE: models/__init__.py
================================================
from models.geomvsnet import GeoMVSNet
from models.loss import geomvsnet_loss
================================================
FILE: models/filter.py
================================================
# -*- coding: utf-8 -*-
# @Description: Basic implementation of Frequency Domain Filtering strategy (Sec 3.2 in the paper).
# @Author: Zhe Zhang (doublez@stu.pku.edu.cn)
# @Affiliation: Peking University (PKU)
# @LastEditDate: 2023-09-07
import torch
import numpy as np
import matplotlib.pyplot as plt
def frequency_domain_filter(depth, rho_ratio):
"""
large rho_ratio -> more information filtered
"""
f = torch.fft.fft2(depth)
fshift = torch.fft.fftshift(f)
b, h, w = depth.shape
k_h, k_w = h/rho_ratio, w/rho_ratio
fshift[:,:int(h/2-k_h/2),:] = 0
fshift[:,int(h/2+k_h/2):,:] = 0
fshift[:,:,:int(w/2-k_w/2)] = 0
fshift[:,:,int(w/2+k_w/2):] = 0
ishift = torch.fft.ifftshift(fshift)
idepth = torch.fft.ifft2(ishift)
depth_filtered = torch.abs(idepth)
return depth_filtered
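# Editor's note: a minimal usage sketch (not part of the original pipeline).
# The batch size, resolution and rho_ratio below are arbitrary assumptions;
# GeoMVSNet applies frequency_domain_filter per stage with its curriculum ratios.
def _demo_frequency_domain_filter():
    depth = torch.rand(2, 128, 160)                        # B x H x W depth maps
    smoothed = frequency_domain_filter(depth, rho_ratio=4)
    return depth.shape, smoothed.shape                     # shapes are unchanged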
def visual_fft_fig(fshift):
fft_fig = 20 * torch.log(torch.abs(fshift))  # log-magnitude spectrum
plt.figure(figsize=(10, 10))
plt.subplot(121)
plt.imshow(fft_fig[0,:,:], cmap = 'gray')
================================================
FILE: models/geometry.py
================================================
# -*- coding: utf-8 -*-
# @Description: Geometric Prior Guided Feature Fusion & Probability Volume Geometry Embedding (Sec 3.1 in the paper).
# @Author: Zhe Zhang (doublez@stu.pku.edu.cn)
# @Affiliation: Peking University (PKU)
# @LastEditDate: 2023-09-07
import math
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
from models.submodules import ConvBnReLU3D
class GeoFeatureFusion(nn.Module):
def __init__(self, convolutional_layer_encoding="z", mask_type="basic", add_origin_feat_flag=True):
super(GeoFeatureFusion, self).__init__()
self.convolutional_layer_encoding = convolutional_layer_encoding # std / uv / z / xyz
self.mask_type = mask_type # basic / mean
self.add_origin_feat_flag = add_origin_feat_flag # True / False
if self.convolutional_layer_encoding == "std":
self.geoplanes = 0
elif self.convolutional_layer_encoding == "uv":
self.geoplanes = 2
elif self.convolutional_layer_encoding == "z":
self.geoplanes = 1
elif self.convolutional_layer_encoding == "xyz":
self.geoplanes = 3
self.geofeature = GeometryFeature()
# rgb encoder
self.rgb_conv_init = convbnrelu(in_channels=4, out_channels=8, kernel_size=5, stride=1, padding=2)
self.rgb_encoder_layer1 = BasicBlockGeo(inplanes=8, planes=16, stride=2, geoplanes=self.geoplanes)
self.rgb_encoder_layer2 = BasicBlockGeo(inplanes=16, planes=32, stride=1, geoplanes=self.geoplanes)
self.rgb_encoder_layer3 = BasicBlockGeo(inplanes=32, planes=64, stride=2, geoplanes=self.geoplanes)
self.rgb_encoder_layer4 = BasicBlockGeo(inplanes=64, planes=128, stride=1, geoplanes=self.geoplanes)
self.rgb_encoder_layer5 = BasicBlockGeo(inplanes=128, planes=256, stride=2, geoplanes=self.geoplanes)
self.rgb_decoder_layer4 = deconvbnrelu(in_channels=256, out_channels=128, kernel_size=5, stride=2, padding=2, output_padding=1)
self.rgb_decoder_layer2 = deconvbnrelu(in_channels=128, out_channels=32, kernel_size=5, stride=2, padding=2, output_padding=1)
self.rgb_decoder_layer0 = deconvbnrelu(in_channels=32, out_channels=16, kernel_size=3, stride=1, padding=1, output_padding=0)
self.rgb_decoder_layer = deconvbnrelu(in_channels=16, out_channels=8, kernel_size=5, stride=2, padding=2, output_padding=1)
self.rgb_decoder_output = deconvbnrelu(in_channels=8, out_channels=2, kernel_size=3, stride=1, padding=1, output_padding=0)
# depth encoder
self.depth_conv_init = convbnrelu(in_channels=2, out_channels=8, kernel_size=5, stride=1, padding=2)
self.depth_layer1 = BasicBlockGeo(inplanes=8, planes=16, stride=2, geoplanes=self.geoplanes)
self.depth_layer2 = BasicBlockGeo(inplanes=16, planes=32, stride=1, geoplanes=self.geoplanes)
self.depth_layer3 = BasicBlockGeo(inplanes=64, planes=64, stride=2, geoplanes=self.geoplanes)
self.depth_layer4 = BasicBlockGeo(inplanes=64, planes=128, stride=1, geoplanes=self.geoplanes)
self.depth_layer5 = BasicBlockGeo(inplanes=256, planes=256, stride=2, geoplanes=self.geoplanes)
self.decoder_layer3 = deconvbnrelu(in_channels=256, out_channels=128, kernel_size=5, stride=2, padding=2, output_padding=1)
self.decoder_layer4 = deconvbnrelu(in_channels=128, out_channels=64, kernel_size=3, stride=1, padding=1, output_padding=0)
self.decoder_layer5 = deconvbnrelu(in_channels=64, out_channels=32, kernel_size=5, stride=2, padding=2, output_padding=1)
self.decoder_layer6 = deconvbnrelu(in_channels=32, out_channels=16, kernel_size=3, stride=1, padding=1, output_padding=0)
self.decoder_layer7 = deconvbnrelu(in_channels=16, out_channels=8, kernel_size=5, stride=2, padding=2, output_padding=1)
# output
self.rgbdepth_decoder_stage1 = deconvbnrelu(in_channels=32, out_channels=32, kernel_size=5, stride=2, padding=2, output_padding=1)
self.rgbdepth_decoder_stage2 = deconvbnrelu(in_channels=16, out_channels=16, kernel_size=5, stride=2, padding=2, output_padding=1)
self.rgbdepth_decoder_stage3 = deconvbnrelu(in_channels=8, out_channels=8, kernel_size=3, stride=1, padding=1, output_padding=0)
self.final_decoder_stage1 = deconvbnrelu(in_channels=32, out_channels=32, kernel_size=3, stride=1, padding=1, output_padding=0)
self.final_decoder_stage2 = deconvbnrelu(in_channels=16, out_channels=16, kernel_size=3, stride=1, padding=1, output_padding=0)
self.final_decoder_stage3 = deconvbnrelu(in_channels=8, out_channels=8, kernel_size=3, stride=1, padding=1, output_padding=0)
self.softmax = nn.Softmax(dim=1)
self.pooling = nn.AvgPool2d(kernel_size=2)
self.sparsepooling = SparseDownSampleClose(stride=2)
weights_init(self)
def forward(self, rgb, depth, confidence, depth_values, stage_idx, origin_feat, intrinsics_matrices_stage):
rgb = rgb
depth_min, depth_max = depth_values[:,0,None,None,None], depth_values[:,-1,None,None,None]
d = (depth - depth_min) / (depth_max - depth_min)
if self.mask_type == "basic":
valid_mask = torch.where(d>0, torch.full_like(d, 1.0), torch.full_like(d, 0.0))
elif self.mask_type == "mean":
valid_mask = torch.where(torch.logical_and(d>0, confidence>confidence.mean()), torch.full_like(d, 1.0), torch.full_like(d, 0.0))
# pre-data preparation
if self.convolutional_layer_encoding in ["uv", "xyz"]:
B, _, W, H = rgb.shape
position = AddCoordsNp(H, W)
position = position.call()
position = torch.from_numpy(position).to(rgb.device).repeat(B, 1, 1, 1).transpose(-1, 1)
unorm = position[:, 0:1, :, :]
vnorm = position[:, 1:2, :, :]
vnorm_s2 = self.pooling(vnorm)
vnorm_s3 = self.pooling(vnorm_s2)
vnorm_s4 = self.pooling(vnorm_s3)
unorm_s2 = self.pooling(unorm)
unorm_s3 = self.pooling(unorm_s2)
unorm_s4 = self.pooling(unorm_s3)
if self.convolutional_layer_encoding in ["z", "xyz"]:
d_s2, vm_s2 = self.sparsepooling(d, valid_mask)
d_s3, vm_s3 = self.sparsepooling(d_s2, vm_s2)
d_s4, vm_s4 = self.sparsepooling(d_s3, vm_s3)
if self.convolutional_layer_encoding == "xyz":
K = intrinsics_matrices_stage
# per-sample focal lengths and principal point, reshaped to B x 1 x 1 x 1
f352 = K[:, 1, 1].unsqueeze(1).unsqueeze(2).unsqueeze(3)
c352 = K[:, 1, 2].unsqueeze(1).unsqueeze(2).unsqueeze(3)
f1216 = K[:, 0, 0].unsqueeze(1).unsqueeze(2).unsqueeze(3)
c1216 = K[:, 0, 2].unsqueeze(1).unsqueeze(2).unsqueeze(3)
# geometric info
if self.convolutional_layer_encoding == "std":
geo_s1 = None
geo_s2 = None
geo_s3 = None
geo_s4 = None
elif self.convolutional_layer_encoding == "uv":
geo_s1 = torch.cat((vnorm, unorm), dim=1)
geo_s2 = torch.cat((vnorm_s2, unorm_s2), dim=1)
geo_s3 = torch.cat((vnorm_s3, unorm_s3), dim=1)
geo_s4 = torch.cat((vnorm_s4, unorm_s4), dim=1)
elif self.convolutional_layer_encoding == "z":
geo_s1 = d
geo_s2 = d_s2
geo_s3 = d_s3
geo_s4 = d_s4
elif self.convolutional_layer_encoding == "xyz":
geo_s1 = self.geofeature(d, vnorm, unorm, H, W, c352, c1216, f352, f1216)
geo_s2 = self.geofeature(d_s2, vnorm_s2, unorm_s2, H / 2, W / 2, c352, c1216, f352, f1216)
geo_s3 = self.geofeature(d_s3, vnorm_s3, unorm_s3, H / 4, W / 4, c352, c1216, f352, f1216)
geo_s4 = self.geofeature(d_s4, vnorm_s4, unorm_s4, H / 8, W / 8, c352, c1216, f352, f1216)
# -----------------------------------------------------------------------------------------
# 128*160 -> 256*320 -> 512*640
rgb_feature = self.rgb_conv_init(torch.cat((rgb, d), dim=1)) # b 8 h w
rgb_feature1 = self.rgb_encoder_layer1(rgb_feature, geo_s1, geo_s2) # b 16 h/2 w/2
rgb_feature2 = self.rgb_encoder_layer2(rgb_feature1, geo_s2, geo_s2) # b 32 h/2 w/2
rgb_feature3 = self.rgb_encoder_layer3(rgb_feature2, geo_s2, geo_s3) # b 64 h/4 w/4
rgb_feature4 = self.rgb_encoder_layer4(rgb_feature3, geo_s3, geo_s3) # b 128 h/4 w/4
rgb_feature5 = self.rgb_encoder_layer5(rgb_feature4, geo_s3, geo_s4) # b 256 h/8 w/8
rgb_feature_decoder4 = self.rgb_decoder_layer4(rgb_feature5)
rgb_feature4_plus = rgb_feature_decoder4 + rgb_feature4 # b 128 h/4 w/4
rgb_feature_decoder2 = self.rgb_decoder_layer2(rgb_feature4_plus)
rgb_feature2_plus = rgb_feature_decoder2 + rgb_feature2 # b 32 h/2 w/2
rgb_feature_decoder0 = self.rgb_decoder_layer0(rgb_feature2_plus)
rgb_feature0_plus = rgb_feature_decoder0 + rgb_feature1 # b 16 h/2 w/2
rgb_feature_decoder = self.rgb_decoder_layer(rgb_feature0_plus)
rgb_feature_plus = rgb_feature_decoder + rgb_feature # b 8 h w
rgb_output = self.rgb_decoder_output(rgb_feature_plus) # b 2 h w
rgb_depth = rgb_output[:, 0:1, :, :]
rgb_conf = rgb_output[:, 1:2, :, :]
# -----------------------------------------------------------------------------------------
sparsed_feature = self.depth_conv_init(torch.cat((d, rgb_depth), dim=1)) # b 8 h w
sparsed_feature1 = self.depth_layer1(sparsed_feature, geo_s1, geo_s2) # b 16 h/2 w/2
sparsed_feature2 = self.depth_layer2(sparsed_feature1, geo_s2, geo_s2) # b 32 h/2 w/2
sparsed_feature2_plus = torch.cat([rgb_feature2_plus, sparsed_feature2], 1)
sparsed_feature3 = self.depth_layer3(sparsed_feature2_plus, geo_s2, geo_s3) # b 64 h/4 w/4
sparsed_feature4 = self.depth_layer4(sparsed_feature3, geo_s3, geo_s3) # b 128 h/4 w/4
sparsed_feature4_plus = torch.cat([rgb_feature4_plus, sparsed_feature4], 1)
sparsed_feature5 = self.depth_layer5(sparsed_feature4_plus, geo_s3, geo_s4) # b 256 h/8 w/8
# -----------------------------------------------------------------------------------------
fusion3 = rgb_feature5 + sparsed_feature5
decoder_feature3 = self.decoder_layer3(fusion3) # b 128 h/4 w/4
fusion4 = sparsed_feature4 + decoder_feature3
decoder_feature4 = self.decoder_layer4(fusion4) # b 64 h/4 w/4
if stage_idx >= 1:
decoder_feature5 = self.decoder_layer5(decoder_feature4)
fusion5 = sparsed_feature2 + decoder_feature5 # b 32 h/2 w/2
if stage_idx == 1:
rgbdepth_feature = self.rgbdepth_decoder_stage1(fusion5)
if self.add_origin_feat_flag:
final_feature = self.final_decoder_stage1(rgbdepth_feature + origin_feat)
else:
final_feature = self.final_decoder_stage1(rgbdepth_feature)
if stage_idx >= 2:
decoder_feature6 = self.decoder_layer6(decoder_feature5)
fusion6 = sparsed_feature1 + decoder_feature6 # b 16 h/2 w/2
if stage_idx == 2:
rgbdepth_feature = self.rgbdepth_decoder_stage2(fusion6)
if self.add_origin_feat_flag:
final_feature = self.final_decoder_stage2(rgbdepth_feature + origin_feat)
else:
final_feature = self.final_decoder_stage2(rgbdepth_feature)
if stage_idx >= 3:
decoder_feature7 = self.decoder_layer7(decoder_feature6)
fusion7 = sparsed_feature + decoder_feature7 # b 8 h w
if stage_idx == 3:
rgbdepth_feature = self.rgbdepth_decoder_stage3(fusion7)
if self.add_origin_feat_flag:
final_feature = self.final_decoder_stage3(rgbdepth_feature + origin_feat)
else:
final_feature = self.final_decoder_stage3(rgbdepth_feature)
return final_feature
class GeoRegNet2d(nn.Module):
def __init__(self, input_channel=128, base_channel=32, convolutional_layer_encoding="std"):
super(GeoRegNet2d, self).__init__()
self.convolutional_layer_encoding = convolutional_layer_encoding # std / uv / z / xyz
self.mask_type = "basic" # basic / mean
if self.convolutional_layer_encoding == "std":
self.geoplanes = 0
elif self.convolutional_layer_encoding == "z":
self.geoplanes = 1
self.conv_init = ConvBnReLU3D(input_channel, out_channels=8, kernel_size=(1,3,3), pad=(0,1,1))
self.encoder_layer1 = Reg_BasicBlockGeo(inplanes=8, planes=16, kernel_size=(1,3,3), stride=(1,2,2), padding=(0,1,1), geoplanes=self.geoplanes)
self.encoder_layer2 = Reg_BasicBlockGeo(inplanes=16, planes=32, kernel_size=(1,3,3), stride=1, padding=(0,1,1), geoplanes=self.geoplanes)
self.encoder_layer3 = Reg_BasicBlockGeo(inplanes=32, planes=64, kernel_size=(1,3,3), stride=(1,2,2), padding=(0,1,1), geoplanes=self.geoplanes)
self.encoder_layer4 = Reg_BasicBlockGeo(inplanes=64, planes=128, kernel_size=(1,3,3), stride=1, padding=(0,1,1), geoplanes=self.geoplanes)
self.encoder_layer5 = Reg_BasicBlockGeo(inplanes=128, planes=256, kernel_size=(1,3,3), stride=(1,2,2), padding=(0,1,1), geoplanes=self.geoplanes)
self.decoder_layer4 = reg_deconvbnrelu(in_channels=256, out_channels=128, kernel_size=(1,5,5), stride=(1,2,2), padding=(0,2,2), output_padding=(0,1,1))
self.decoder_layer3 = reg_deconvbnrelu(in_channels=128, out_channels=64, kernel_size=(1,3,3), stride=1, padding=(0,1,1), output_padding=0)
self.decoder_layer2 = reg_deconvbnrelu(in_channels=64, out_channels=32, kernel_size=(1,5,5), stride=(1,2,2), padding=(0,2,2), output_padding=(0,1,1))
self.decoder_layer1 = reg_deconvbnrelu(in_channels=32, out_channels=16, kernel_size=(1,3,3), stride=1, padding=(0,1,1), output_padding=0)
self.decoder_layer = reg_deconvbnrelu(in_channels=16, out_channels=8, kernel_size=(1,5,5), stride=(1,2,2), padding=(0,2,2), output_padding=(0,1,1))
self.prob = reg_deconvbnrelu(in_channels=8, out_channels=1, kernel_size=(1,3,3), stride=1, padding=(0,1,1), output_padding=0)
self.depthpooling = nn.MaxPool3d((2,1,1),(2,1,1))
self.basicpooling = nn.MaxPool3d((1,2,2), (1,2,2))
weights_init(self)
def forward(self, x, stage_idx, geo_reg_data=None):
B, C, D, W, H = x.shape
if stage_idx >= 1 and self.convolutional_layer_encoding == "z":
prob_volume = geo_reg_data["prob_volume_last"].unsqueeze(1) # B 1 D H W
else:
assert self.convolutional_layer_encoding == "std"
# geometric info
if self.convolutional_layer_encoding == "std":
geo_s1 = None
geo_s2 = None
geo_s3 = None
geo_s4 = None
elif self.convolutional_layer_encoding == "z":
if stage_idx == 2:
geo_s1 = self.depthpooling(prob_volume)
else:
geo_s1 = prob_volume # B 1 D H W
geo_s2 = self.basicpooling(geo_s1)
geo_s3 = self.basicpooling(geo_s2)
feature = self.conv_init(x) # B 8 D H W
feature1 = self.encoder_layer1(feature, geo_s1, geo_s1) # B 16 D H/2 W/2
feature2 = self.encoder_layer2(feature1, geo_s2, geo_s2) # B 32 D H/2 W/2
feature3 = self.encoder_layer3(feature2, geo_s2, geo_s2) # B 64 D H/4 W/4
feature4 = self.encoder_layer4(feature3, geo_s3, geo_s3) # B 128 D H/4 W/4
feature5 = self.encoder_layer5(feature4, geo_s3, geo_s3) # B 256 D H/8 W/8
feature_decoder4 = self.decoder_layer4(feature5)
feature4_plus = feature_decoder4 + feature4 # B 128 D H/4 W/4
feature_decoder3 = self.decoder_layer3(feature4_plus)
feature3_plus = feature_decoder3 + feature3 # B 64 D H/4 W/4
feature_decoder2 = self.decoder_layer2(feature3_plus)
feature2_plus = feature_decoder2 + feature2 # B 32 D H/2 W/2
feature_decoder1 = self.decoder_layer1(feature2_plus)
feature1_plus = feature_decoder1 + feature1 # B 16 D H/2 W/2
feature_decoder = self.decoder_layer(feature1_plus)
feature_plus = feature_decoder + feature # B 8 D H W
x = self.prob(feature_plus)
return x.squeeze(1)
# --------------------------------------------------------------
class BasicBlockGeo(nn.Module):
expansion = 1
__constants__ = ['downsample']
def __init__(self, inplanes, planes, stride=1, downsample=None, groups=1,
base_width=64, dilation=1, norm_layer=None, geoplanes=3):
super(BasicBlockGeo, self).__init__()
if norm_layer is None:
norm_layer = nn.BatchNorm2d
if groups != 1 or base_width != 64:
raise ValueError('BasicBlock only supports groups=1 and base_width=64')
if dilation > 1:
raise NotImplementedError("Dilation > 1 not supported in BasicBlock")
self.conv1 = conv3x3(inplanes + geoplanes, planes, stride)
self.bn1 = norm_layer(planes)
self.relu = nn.ReLU(inplace=True)
self.conv2 = conv3x3(planes+geoplanes, planes)
self.bn2 = norm_layer(planes)
if stride != 1 or inplanes != planes:
downsample = nn.Sequential(
conv1x1(inplanes+geoplanes, planes, stride),
norm_layer(planes),
)
self.downsample = downsample
self.stride = stride
def forward(self, x, g1=None, g2=None):
identity = x
if g1 is not None:
x = torch.cat((x, g1), 1)
out = self.conv1(x)
out = self.bn1(out)
out = self.relu(out)
if g2 is not None:
out = torch.cat((g2,out), 1)
out = self.conv2(out)
out = self.bn2(out)
if self.downsample is not None:
identity = self.downsample(x)
out += identity
out = self.relu(out)
return out
class GeometryFeature(nn.Module):
def __init__(self):
super(GeometryFeature, self).__init__()
def forward(self, z, vnorm, unorm, h, w, ch, cw, fh, fw):
x = z*(0.5*h*(vnorm+1)-ch)/fh
y = z*(0.5*w*(unorm+1)-cw)/fw
return torch.cat((x, y, z),1)
class SparseDownSampleClose(nn.Module):
def __init__(self, stride):
super(SparseDownSampleClose, self).__init__()
self.pooling = nn.MaxPool2d(stride, stride)
self.large_number = 600
def forward(self, d, mask):
encode_d = - (1-mask)*self.large_number - d
d = - self.pooling(encode_d)
mask_result = self.pooling(mask)
d_result = d - (1-mask_result)*self.large_number
return d_result, mask_result
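# Editor's note: an illustrative sketch with toy values (not used by the network).
# It shows that SparseDownSampleClose keeps the closest (smallest) *valid* depth
# in each pooling window and propagates the validity mask alongside it.
def _demo_sparse_downsample_close():
    pool = SparseDownSampleClose(stride=2)
    d = torch.tensor([[[[0.2, 0.0], [0.9, 0.4]]]])      # 1 x 1 x 2 x 2 normalized depth
    mask = torch.tensor([[[[1.0, 0.0], [1.0, 1.0]]]])   # 0 marks an invalid pixel
    d_down, mask_down = pool(d, mask)
    return d_down, mask_down                            # -> 0.2 (closest valid depth), mask 1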
def convbnrelu(in_channels, out_channels, kernel_size=3,stride=1, padding=1):
return nn.Sequential(
nn.Conv2d(in_channels, out_channels, kernel_size=kernel_size, stride=stride, padding=padding, bias=False),
nn.BatchNorm2d(out_channels),
nn.ReLU(inplace=True)
)
def deconvbnrelu(in_channels, out_channels, kernel_size=5, stride=2, padding=2, output_padding=1):
return nn.Sequential(
nn.ConvTranspose2d(in_channels, out_channels, kernel_size=kernel_size, stride=stride, padding=padding, output_padding=output_padding, bias=False),
nn.BatchNorm2d(out_channels),
nn.ReLU(inplace=True)
)
def weights_init(m):
"""Initialize filters with Gaussian random weights"""
if isinstance(m, nn.Conv2d):
n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
m.weight.data.normal_(0, math.sqrt(2. / n))
if m.bias is not None:
m.bias.data.zero_()
elif isinstance(m, nn.ConvTranspose2d):
n = m.kernel_size[0] * m.kernel_size[1] * m.in_channels
m.weight.data.normal_(0, math.sqrt(2. / n))
if m.bias is not None:
m.bias.data.zero_()
elif isinstance(m, nn.BatchNorm2d):
m.weight.data.fill_(1)
m.bias.data.zero_()
def conv3x3(in_planes, out_planes, stride=1, groups=1, dilation=1, bias=False, padding=1):
"""3x3 convolution with padding"""
if padding >= 1:
padding = dilation
return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride,
padding=padding, groups=groups, bias=bias, dilation=dilation)
def conv1x1(in_planes, out_planes, stride=1, groups=1, bias=False):
"""1x1 convolution"""
return nn.Conv2d(in_planes, out_planes, kernel_size=1, stride=stride, groups=groups, bias=bias)
class AddCoordsNp():
"""Add coords to a tensor"""
def __init__(self, x_dim=64, y_dim=64, with_r=False):
self.x_dim = x_dim
self.y_dim = y_dim
self.with_r = with_r
def call(self):
"""
input_tensor: (batch, x_dim, y_dim, c)
"""
xx_ones = np.ones([self.x_dim], dtype=np.int32)
xx_ones = np.expand_dims(xx_ones, 1)
xx_range = np.expand_dims(np.arange(self.y_dim), 0)
xx_channel = np.matmul(xx_ones, xx_range)
xx_channel = np.expand_dims(xx_channel, -1)
yy_ones = np.ones([self.y_dim], dtype=np.int32)
yy_ones = np.expand_dims(yy_ones, 0)
yy_range = np.expand_dims(np.arange(self.x_dim), 1)
yy_channel = np.matmul(yy_range, yy_ones)
yy_channel = np.expand_dims(yy_channel, -1)
xx_channel = xx_channel.astype('float32') / (self.y_dim - 1)
yy_channel = yy_channel.astype('float32') / (self.x_dim - 1)
xx_channel = xx_channel*2 - 1
yy_channel = yy_channel*2 - 1
ret = np.concatenate([xx_channel, yy_channel], axis=-1)
if self.with_r:
rr = np.sqrt( np.square(xx_channel-0.5) + np.square(yy_channel-0.5))
ret = np.concatenate([ret, rr], axis=-1)
return ret
# --------------------------------------------------------------
class Reg_BasicBlockGeo(nn.Module):
def __init__(self, inplanes, planes, kernel_size, stride, padding, downsample=None, groups=1,
base_width=64, dilation=1, norm_layer=nn.BatchNorm3d, geoplanes=3):
super(Reg_BasicBlockGeo, self).__init__()
self.conv1 = regconv3D(inplanes + geoplanes, planes, kernel_size=(1,3,3), stride=1, padding=(0,1,1))
self.bn1 = norm_layer(planes)
self.relu = nn.ReLU(inplace=True)
self.conv2 = regconv3D(planes+geoplanes, planes, kernel_size, stride, padding)
self.bn2 = norm_layer(planes)
if stride != 1 or inplanes != planes:
downsample = nn.Sequential(
regconv1x1(inplanes+geoplanes, planes, kernel_size, stride, padding),
norm_layer(planes),
)
self.downsample = downsample
self.stride = stride
def forward(self, x, g1=None, g2=None):
identity = x
if g1 is not None:
x = torch.cat((x, g1), 1)
out = self.conv1(x)
out = self.bn1(out)
out = self.relu(out)
if g2 is not None:
out = torch.cat((g2,out), 1)
out = self.conv2(out)
out = self.bn2(out)
if self.downsample is not None:
identity = self.downsample(x)
out += identity
out = self.relu(out)
return out
def regconv3D(in_planes, out_planes, kernel_size, stride, padding, groups=1, dilation=1, bias=False):
return nn.Conv3d(in_planes, out_planes, kernel_size=kernel_size, stride=stride,
padding=padding, groups=groups, bias=bias, dilation=dilation)
def regconv1x1(in_planes, out_planes, kernel_size, stride, padding, groups=1, bias=False):
return nn.Conv3d(in_planes, out_planes, kernel_size=kernel_size, stride=stride, padding=padding, groups=groups, bias=bias)
def reg_deconvbnrelu(in_channels, out_channels, kernel_size, stride, padding, output_padding):
return nn.Sequential(
nn.ConvTranspose3d(in_channels, out_channels, kernel_size=kernel_size, stride=stride, padding=padding, output_padding=output_padding, bias=False),
nn.BatchNorm3d(out_channels),
nn.ReLU(inplace=True)
)
================================================
FILE: models/geomvsnet.py
================================================
# -*- coding: utf-8 -*-
# @Description: Main network architecture for GeoMVSNet.
# @Author: Zhe Zhang (doublez@stu.pku.edu.cn)
# @Affiliation: Peking University (PKU)
# @LastEditDate: 2023-09-07
import math
import torch
import torch.nn as nn
import torch.nn.functional as F
from models.submodules import homo_warping, init_inverse_range, schedule_inverse_range, FPN, Reg2d
from models.geometry import GeoFeatureFusion, GeoRegNet2d
from models.filter import frequency_domain_filter
class GeoMVSNet(nn.Module):
def __init__(self, levels, hypo_plane_num_stages, depth_interal_ratio_stages,
feat_base_channel, reg_base_channel, group_cor_dim_stages):
super(GeoMVSNet, self).__init__()
self.levels = levels
self.hypo_plane_num_stages = hypo_plane_num_stages
self.depth_interal_ratio_stages = depth_interal_ratio_stages
self.StageNet = StageNet()
# feature settings
self.FeatureNet = FPN(base_channels=feat_base_channel)
self.coarest_separate_flag = True
if self.coarest_separate_flag:
self.CoarestFeatureNet = FPN(base_channels=feat_base_channel)
self.GeoFeatureFusionNet = GeoFeatureFusion(
convolutional_layer_encoding="z", mask_type="basic", add_origin_feat_flag=True)
# cost regularization settings
self.RegNet_stages = nn.ModuleList()
self.group_cor_dim_stages = group_cor_dim_stages
self.geo_reg_flag = True
self.geo_reg_encodings = ['std', 'z', 'z', 'z'] # must use std in idx-0
for stage_idx in range(self.levels):
in_dim = group_cor_dim_stages[stage_idx]
if self.geo_reg_flag:
self.RegNet_stages.append(GeoRegNet2d(input_channel=in_dim, base_channel=reg_base_channel, convolutional_layer_encoding=self.geo_reg_encodings[stage_idx]))
else:
self.RegNet_stages.append(Reg2d(input_channel=in_dim, base_channel=reg_base_channel))
# frequency domain filter settings
self.curriculum_learning_rho_ratios = [9, 4, 2, 1]
def forward(self, imgs, proj_matrices, intrinsics_matrices, depth_values, filename=None):
features = []
if self.coarest_separate_flag:
coarsest_features = []
for nview_idx in range(len(imgs)):
img = imgs[nview_idx]
features.append(self.FeatureNet(img)) # B C H W
if self.coarest_separate_flag:
coarsest_features.append(self.CoarestFeatureNet(img))
# coarse-to-fine
outputs = {}
for stage_idx in range(self.levels):
stage_name = "stage{}".format(stage_idx + 1)
B, C, H, W = features[0][stage_name].shape
proj_matrices_stage = proj_matrices[stage_name]
intrinsics_matrices_stage = intrinsics_matrices[stage_name]
# @Note features
if stage_idx == 0:
if self.coarest_separate_flag:
features_stage = [feat[stage_name] for feat in coarsest_features]
else:
features_stage = [feat[stage_name] for feat in features]
elif stage_idx >= 1:
features_stage = [feat[stage_name] for feat in features]
ref_img_stage = F.interpolate(imgs[0], size=None, scale_factor=1./2**(3-stage_idx), mode="bilinear", align_corners=False)
depth_last = F.interpolate(depth_last.unsqueeze(1), size=None, scale_factor=2, mode="bilinear", align_corners=False)
confidence_last = F.interpolate(confidence_last.unsqueeze(1), size=None, scale_factor=2, mode="bilinear", align_corners=False)
# reference feature
features_stage[0] = self.GeoFeatureFusionNet(
ref_img_stage, depth_last, confidence_last, depth_values,
stage_idx, features_stage[0], intrinsics_matrices_stage
)
# @Note depth hypos
if stage_idx == 0:
depth_hypo = init_inverse_range(depth_values, self.hypo_plane_num_stages[stage_idx], img[0].device, img[0].dtype, H, W)
else:
inverse_min_depth, inverse_max_depth = outputs_stage['inverse_min_depth'].detach(), outputs_stage['inverse_max_depth'].detach()
depth_hypo = schedule_inverse_range(inverse_min_depth, inverse_max_depth, self.hypo_plane_num_stages[stage_idx], H, W) # B D H W
# @Note cost regularization
geo_reg_data = {}
if self.geo_reg_flag:
geo_reg_data['depth_values'] = depth_values
if stage_idx >= 1 and self.geo_reg_encodings[stage_idx] == 'z':
prob_volume_last = F.interpolate(prob_volume_last, size=None, scale_factor=2, mode="bilinear", align_corners=False)
geo_reg_data["prob_volume_last"] = prob_volume_last
outputs_stage = self.StageNet(
stage_idx, features_stage, proj_matrices_stage, depth_hypo=depth_hypo,
regnet=self.RegNet_stages[stage_idx], group_cor_dim=self.group_cor_dim_stages[stage_idx],
depth_interal_ratio=self.depth_interal_ratio_stages[stage_idx],
geo_reg_data=geo_reg_data
)
# @Note frequency domain filter
depth_est = outputs_stage['depth']
depth_est_filtered = frequency_domain_filter(depth_est, rho_ratio=self.curriculum_learning_rho_ratios[stage_idx])
outputs_stage['depth_filtered'] = depth_est_filtered
depth_last = depth_est_filtered
confidence_last = outputs_stage['photometric_confidence']
prob_volume_last = outputs_stage['prob_volume']
outputs[stage_name] = outputs_stage
outputs.update(outputs_stage)
return outputs
class StageNet(nn.Module):
def __init__(self, attn_temp=2):
super(StageNet, self).__init__()
self.attn_temp = attn_temp
def forward(self, stage_idx, features, proj_matrices, depth_hypo, regnet,
group_cor_dim, depth_interal_ratio, geo_reg_data=None):
# @Note step1: feature extraction
proj_matrices = torch.unbind(proj_matrices, 1)
ref_feature, src_features = features[0], features[1:]
ref_proj, src_projs = proj_matrices[0], proj_matrices[1:]
B, D, H, W = depth_hypo.shape
C = ref_feature.shape[1]
# @Note step2: cost aggregation
ref_volume = ref_feature.unsqueeze(2).repeat(1, 1, D, 1, 1)
cor_weight_sum = 1e-8
cor_feats = 0
for src_idx, (src_fea, src_proj) in enumerate(zip(src_features, src_projs)):
save_fn = None
src_proj_new = src_proj[:, 0].clone()
src_proj_new[:, :3, :4] = torch.matmul(src_proj[:, 1, :3, :3], src_proj[:, 0, :3, :4])
ref_proj_new = ref_proj[:, 0].clone()
ref_proj_new[:, :3, :4] = torch.matmul(ref_proj[:, 1, :3, :3], ref_proj[:, 0, :3, :4])
warped_src = homo_warping(src_fea, src_proj_new, ref_proj_new, depth_hypo) # B C D H W
warped_src = warped_src.reshape(B, group_cor_dim, C//group_cor_dim, D, H, W)
ref_volume = ref_volume.reshape(B, group_cor_dim, C//group_cor_dim, D, H, W)
cor_feat = (warped_src * ref_volume).mean(2) # B G D H W
del warped_src, src_proj, src_fea
cor_weight = torch.softmax(cor_feat.sum(1) / self.attn_temp, 1) / math.sqrt(C) # B D H W
cor_weight_sum += cor_weight # B D H W
cor_feats += cor_weight.unsqueeze(1) * cor_feat # B C D H W
del cor_weight, cor_feat
cost_volume = cor_feats / cor_weight_sum.unsqueeze(1) # B C D H W
del cor_weight_sum, src_features
# @Note step3: cost regularization
if geo_reg_data == {}:
# basic
cost_reg = regnet(cost_volume)
else:
# probability volume geometry embedding
cost_reg = regnet(cost_volume, stage_idx, geo_reg_data)
del cost_volume
prob_volume = F.softmax(cost_reg, dim=1) # B D H W
# @Note step4: depth regression
prob_max_indices = prob_volume.max(1, keepdim=True)[1] # B 1 H W
depth = torch.gather(depth_hypo, 1, prob_max_indices).squeeze(1) # B H W
with torch.no_grad():
photometric_confidence = prob_volume.max(1)[0] # B H W
photometric_confidence = F.interpolate(photometric_confidence.unsqueeze(1), scale_factor=1, mode='bilinear', align_corners=True).squeeze(1)
last_depth_itv = 1./depth_hypo[:,2,:,:] - 1./depth_hypo[:,1,:,:]
inverse_min_depth = 1/depth + depth_interal_ratio * last_depth_itv # B H W
inverse_max_depth = 1/depth - depth_interal_ratio * last_depth_itv # B H W
output_stage = {
"depth": depth,
"photometric_confidence": photometric_confidence,
"depth_hypo": depth_hypo,
"prob_volume": prob_volume,
"inverse_min_depth": inverse_min_depth,
"inverse_max_depth": inverse_max_depth,
}
return output_stage
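# Editor's note: a hedged construction sketch only (no forward pass). The
# per-stage lists mirror the defaults in models/utils/opts.py; this is not an
# official entry point, and the real model is wired up in train.py / test.py.
def _demo_build_geomvsnet():
    return GeoMVSNet(
        levels=4,
        hypo_plane_num_stages=[8, 8, 4, 4],
        depth_interal_ratio_stages=[0.5, 0.5, 0.5, 1],
        feat_base_channel=8,
        reg_base_channel=8,
        group_cor_dim_stages=[8, 8, 4, 4],
    )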
================================================
FILE: models/loss.py
================================================
# -*- coding: utf-8 -*-
# @Description: Loss Functions (Sec 3.4 in the paper).
# @Author: Zhe Zhang (doublez@stu.pku.edu.cn)
# @Affiliation: Peking University (PKU)
# @LastEditDate: 2023-09-07
import torch
def geomvsnet_loss(inputs, depth_gt_ms, mask_ms, **kwargs):
stage_lw = kwargs.get("stage_lw", [1, 1, 1, 1])
depth_values = kwargs.get("depth_values")
depth_min, depth_max = depth_values[:,0], depth_values[:,-1]
total_loss = torch.tensor(0.0, dtype=torch.float32, device=mask_ms["stage1"].device, requires_grad=False)
pw_loss_stages = []
dds_loss_stages = []
for stage_idx, (stage_inputs, stage_key) in enumerate([(inputs[k], k) for k in inputs.keys() if "stage" in k]):
depth = stage_inputs['depth_filtered']
prob_volume = stage_inputs['prob_volume']
depth_value = stage_inputs['depth_hypo']
depth_gt = depth_gt_ms[stage_key]
mask = mask_ms[stage_key] > 0.5
# pw loss
pw_loss = pixel_wise_loss(prob_volume, depth_gt, mask, depth_value)
pw_loss_stages.append(pw_loss)
# dds loss
dds_loss = depth_distribution_similarity_loss(depth, depth_gt, mask, depth_min, depth_max)
dds_loss_stages.append(dds_loss)
# total loss
lam1, lam2 = 0.8, 0.2
total_loss = total_loss + stage_lw[stage_idx] * (lam1 * pw_loss + lam2 * dds_loss)
depth_pred = stage_inputs['depth']
depth_gt = depth_gt_ms[stage_key]
epe = cal_metrics(depth_pred, depth_gt, mask, depth_min, depth_max)
return total_loss, epe, pw_loss_stages, dds_loss_stages
def pixel_wise_loss(prob_volume, depth_gt, mask, depth_value):
mask_true = mask
valid_pixel_num = torch.sum(mask_true, dim=[1,2])+1e-12
shape = depth_gt.shape
depth_num = depth_value.shape[1]
depth_value_mat = depth_value
gt_index_image = torch.argmin(torch.abs(depth_value_mat-depth_gt.unsqueeze(1)), dim=1)
gt_index_image = torch.mul(mask_true, gt_index_image.type(torch.float))
gt_index_image = torch.round(gt_index_image).type(torch.long).unsqueeze(1)
gt_index_volume = torch.zeros(shape[0], depth_num, shape[1], shape[2]).type(mask_true.type()).scatter_(1, gt_index_image, 1)
cross_entropy_image = -torch.sum(gt_index_volume * torch.log(prob_volume+1e-12), dim=1).squeeze(1)
masked_cross_entropy_image = torch.mul(mask_true, cross_entropy_image)
masked_cross_entropy = torch.sum(masked_cross_entropy_image, dim=[1, 2])
masked_cross_entropy = torch.mean(masked_cross_entropy / valid_pixel_num)
pw_loss = masked_cross_entropy
return pw_loss
def depth_distribution_similarity_loss(depth, depth_gt, mask, depth_min, depth_max):
depth_norm = depth * 128 / (depth_max - depth_min)[:,None,None]
depth_gt_norm = depth_gt * 128 / (depth_max - depth_min)[:,None,None]
M_bins = 48
kl_min = torch.min(torch.min(depth_gt), depth.mean()-3.*depth.std())
kl_max = torch.max(torch.max(depth_gt), depth.mean()+3.*depth.std())
bins = torch.linspace(kl_min, kl_max, steps=M_bins)
kl_divs = []
for i in range(len(bins) - 1):
bin_mask = (depth_gt >= bins[i]) & (depth_gt < bins[i+1])
merged_mask = mask & bin_mask
if merged_mask.sum() > 0:
p = depth_norm[merged_mask]
q = depth_gt_norm[merged_mask]
kl_div = torch.nn.functional.kl_div(torch.log(p)-torch.log(q), p, reduction='batchmean')
kl_div = torch.log(kl_div)
kl_divs.append(kl_div)
dds_loss = sum(kl_divs)
return dds_loss
def cal_metrics(depth_pred, depth_gt, mask, depth_min, depth_max):
depth_pred_norm = depth_pred * 128 / (depth_max - depth_min)[:,None,None]
depth_gt_norm = depth_gt * 128 / (depth_max - depth_min)[:,None,None]
abs_err = torch.abs(depth_pred_norm[mask] - depth_gt_norm[mask])
epe = abs_err.mean()
err1= (abs_err<=1).float().mean()*100
err3 = (abs_err<=3).float().mean()*100
return epe # err1, err3
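# Editor's note: a toy sanity-check sketch, not used anywhere in the code base.
# The depth values and the 425-937 range are arbitrary assumptions; they only
# illustrate the x128 / (depth range) normalization applied by cal_metrics.
def _demo_cal_metrics():
    depth_pred = torch.tensor([[[500.0, 510.0]]])
    depth_gt = torch.tensor([[[505.0, 505.0]]])
    mask = torch.ones_like(depth_gt, dtype=torch.bool)
    depth_min, depth_max = torch.tensor([425.0]), torch.tensor([937.0])
    return cal_metrics(depth_pred, depth_gt, mask, depth_min, depth_max)   # EPE = 1.25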
================================================
FILE: models/submodules.py
================================================
# -*- coding: utf-8 -*-
# @Description: Some sub-modules for the network.
# @Author: Zhe Zhang (doublez@stu.pku.edu.cn)
# @Affiliation: Peking University (PKU)
# @LastEditDate: 2023-09-07
import torch
import torch.nn as nn
import torch.nn.functional as F
class FPN(nn.Module):
"""FPN aligncorners downsample 4x"""
def __init__(self, base_channels, gn=False):
super(FPN, self).__init__()
self.base_channels = base_channels
self.conv0 = nn.Sequential(
Conv2d(3, base_channels, 3, 1, padding=1, gn=gn),
Conv2d(base_channels, base_channels, 3, 1, padding=1, gn=gn),
)
self.conv1 = nn.Sequential(
Conv2d(base_channels, base_channels * 2, 5, stride=2, padding=2, gn=gn),
Conv2d(base_channels * 2, base_channels * 2, 3, 1, padding=1, gn=gn),
Conv2d(base_channels * 2, base_channels * 2, 3, 1, padding=1, gn=gn),
)
self.conv2 = nn.Sequential(
Conv2d(base_channels * 2, base_channels * 4, 5, stride=2, padding=2, gn=gn),
Conv2d(base_channels * 4, base_channels * 4, 3, 1, padding=1, gn=gn),
Conv2d(base_channels * 4, base_channels * 4, 3, 1, padding=1, gn=gn),
)
self.conv3 = nn.Sequential(
Conv2d(base_channels * 4, base_channels * 8, 5, stride=2, padding=2, gn=gn),
Conv2d(base_channels * 8, base_channels * 8, 3, 1, padding=1, gn=gn),
Conv2d(base_channels * 8, base_channels * 8, 3, 1, padding=1, gn=gn),
)
self.out_channels = [8 * base_channels]
final_chs = base_channels * 8
self.inner1 = nn.Conv2d(base_channels * 4, final_chs, 1, bias=True)
self.inner2 = nn.Conv2d(base_channels * 2, final_chs, 1, bias=True)
self.inner3 = nn.Conv2d(base_channels * 1, final_chs, 1, bias=True)
self.out1 = nn.Conv2d(final_chs, base_channels * 8, 1, bias=False)
self.out2 = nn.Conv2d(final_chs, base_channels * 4, 3, padding=1, bias=False)
self.out3 = nn.Conv2d(final_chs, base_channels * 2, 3, padding=1, bias=False)
self.out4 = nn.Conv2d(final_chs, base_channels, 3, padding=1, bias=False)
self.out_channels.append(base_channels * 4)
self.out_channels.append(base_channels * 2)
self.out_channels.append(base_channels)
def forward(self, x):
conv0 = self.conv0(x)
conv1 = self.conv1(conv0)
conv2 = self.conv2(conv1)
conv3 = self.conv3(conv2)
intra_feat = conv3
outputs = {}
out1 = self.out1(intra_feat)
intra_feat = F.interpolate(intra_feat, scale_factor=2, mode="bilinear", align_corners=True) + self.inner1(conv2)
out2 = self.out2(intra_feat)
intra_feat = F.interpolate(intra_feat, scale_factor=2, mode="bilinear", align_corners=True) + self.inner2(conv1)
out3 = self.out3(intra_feat)
intra_feat = F.interpolate(intra_feat, scale_factor=2, mode="bilinear", align_corners=True) + self.inner3(conv0)
out4 = self.out4(intra_feat)
outputs["stage1"] = out1
outputs["stage2"] = out2
outputs["stage3"] = out3
outputs["stage4"] = out4
return outputs
class Reg2d(nn.Module):
def __init__(self, input_channel=128, base_channel=32):
super(Reg2d, self).__init__()
self.conv0 = ConvBnReLU3D(input_channel, base_channel, kernel_size=(1,3,3), pad=(0,1,1))
self.conv1 = ConvBnReLU3D(base_channel, base_channel*2, kernel_size=(1,3,3), stride=(1,2,2), pad=(0,1,1))
self.conv2 = ConvBnReLU3D(base_channel*2, base_channel*2)
self.conv3 = ConvBnReLU3D(base_channel*2, base_channel*4, kernel_size=(1,3,3), stride=(1,2,2), pad=(0,1,1))
self.conv4 = ConvBnReLU3D(base_channel*4, base_channel*4)
self.conv5 = ConvBnReLU3D(base_channel*4, base_channel*8, kernel_size=(1,3,3), stride=(1,2,2), pad=(0,1,1))
self.conv6 = ConvBnReLU3D(base_channel*8, base_channel*8)
self.conv7 = nn.Sequential(
nn.ConvTranspose3d(base_channel*8, base_channel*4, kernel_size=(1,3,3), padding=(0,1,1), output_padding=(0,1,1), stride=(1,2,2), bias=False),
nn.BatchNorm3d(base_channel*4),
nn.ReLU(inplace=True))
self.conv9 = nn.Sequential(
nn.ConvTranspose3d(base_channel*4, base_channel*2, kernel_size=(1,3,3), padding=(0,1,1), output_padding=(0,1,1), stride=(1,2,2), bias=False),
nn.BatchNorm3d(base_channel*2),
nn.ReLU(inplace=True))
self.conv11 = nn.Sequential(
nn.ConvTranspose3d(base_channel*2, base_channel, kernel_size=(1,3,3), padding=(0,1,1), output_padding=(0,1,1), stride=(1,2,2), bias=False),
nn.BatchNorm3d(base_channel),
nn.ReLU(inplace=True))
self.prob = nn.Conv3d(8, 1, 1, stride=1, padding=0)
def forward(self, x):
conv0 = self.conv0(x)
conv2 = self.conv2(self.conv1(conv0))
conv4 = self.conv4(self.conv3(conv2))
x = self.conv6(self.conv5(conv4))
x = conv4 + self.conv7(x)
x = conv2 + self.conv9(x)
x = conv0 + self.conv11(x)
x = self.prob(x)
return x.squeeze(1)
def homo_warping(src_fea, src_proj, ref_proj, depth_values):
# src_fea: [B, C, H, W]
# src_proj: [B, 4, 4]
# ref_proj: [B, 4, 4]
# depth_values: [B, Ndepth] o [B, Ndepth, H, W]
# out: [B, C, Ndepth, H, W]
C = src_fea.shape[1]
Hs,Ws = src_fea.shape[-2:]
B,num_depth,Hr,Wr = depth_values.shape
with torch.no_grad():
proj = torch.matmul(src_proj, torch.inverse(ref_proj))
rot = proj[:, :3, :3] # [B,3,3]
trans = proj[:, :3, 3:4] # [B,3,1]
y, x = torch.meshgrid([torch.arange(0, Hr, dtype=torch.float32, device=src_fea.device),
torch.arange(0, Wr, dtype=torch.float32, device=src_fea.device)])
y = y.reshape(Hr*Wr)
x = x.reshape(Hr*Wr)
xyz = torch.stack((x, y, torch.ones_like(x))) # [3, H*W]
xyz = torch.unsqueeze(xyz, 0).repeat(B, 1, 1) # [B, 3, H*W]
rot_xyz = torch.matmul(rot, xyz) # [B, 3, H*W]
rot_depth_xyz = rot_xyz.unsqueeze(2).repeat(1, 1, num_depth, 1) * depth_values.reshape(B, 1, num_depth, -1) # [B, 3, Ndepth, H*W]
proj_xyz = rot_depth_xyz + trans.reshape(B, 3, 1, 1) # [B, 3, Ndepth, H*W]
# FIXME divide 0
temp = proj_xyz[:, 2:3, :, :]
temp[temp==0] = 1e-9
proj_xy = proj_xyz[:, :2, :, :] / temp # [B, 2, Ndepth, H*W]
# proj_xy = proj_xyz[:, :2, :, :] / proj_xyz[:, 2:3, :, :] # [B, 2, Ndepth, H*W]
proj_x_normalized = proj_xy[:, 0, :, :] / ((Ws - 1) / 2) - 1
proj_y_normalized = proj_xy[:, 1, :, :] / ((Hs - 1) / 2) - 1
proj_xy = torch.stack((proj_x_normalized, proj_y_normalized), dim=3) # [B, Ndepth, H*W, 2]
grid = proj_xy
if len(src_fea.shape)==4:
warped_src_fea = F.grid_sample(src_fea, grid.reshape(B, num_depth * Hr, Wr, 2), mode='bilinear', padding_mode='zeros', align_corners=True)
warped_src_fea = warped_src_fea.reshape(B, C, num_depth, Hr, Wr)
elif len(src_fea.shape)==5:
warped_src_fea = []
for d in range(src_fea.shape[2]):
warped_src_fea.append(F.grid_sample(src_fea[:,:,d], grid.reshape(B, num_depth, Hr, Wr, 2)[:,d], mode='bilinear', padding_mode='zeros', align_corners=True))
warped_src_fea = torch.stack(warped_src_fea, dim=2)
return warped_src_fea
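# Editor's note: a minimal sketch with assumed toy shapes. With identity
# projection matrices for both views, the warped volume should simply replicate
# the source features at every depth hypothesis (up to bilinear interpolation).
def _demo_homo_warping():
    src_fea = torch.rand(1, 4, 8, 8)                     # B x C x H x W source features
    eye = torch.eye(4).unsqueeze(0)                      # identity camera projections
    depth_hypos = torch.full((1, 2, 8, 8), 5.0)          # B x Ndepth x H x W hypotheses
    warped = homo_warping(src_fea, eye, eye, depth_hypos)
    return warped.shape                                  # torch.Size([1, 4, 2, 8, 8])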
def init_inverse_range(cur_depth, ndepths, device, dtype, H, W):
inverse_depth_min = 1. / cur_depth[:, 0] # (B,)
inverse_depth_max = 1. / cur_depth[:, -1]
itv = torch.arange(0, ndepths, device=device, dtype=dtype, requires_grad=False).reshape(1, -1,1,1).repeat(1, 1, H, W) / (ndepths - 1) # 1 D H W
inverse_depth_hypo = inverse_depth_max[:,None, None, None] + (inverse_depth_min - inverse_depth_max)[:,None, None, None] * itv
return 1./inverse_depth_hypo
def schedule_inverse_range(inverse_min_depth, inverse_max_depth, ndepths, H, W):
# inverse_min_depth: (B, H, W)
# inverse_max_depth: (B, H, W)
itv = torch.arange(0, ndepths, device=inverse_min_depth.device, dtype=inverse_min_depth.dtype, requires_grad=False).reshape(1, -1,1,1).repeat(1, 1, H//2, W//2) / (ndepths - 1) # 1 D H W
inverse_depth_hypo = inverse_max_depth[:,None, :, :] + (inverse_min_depth - inverse_max_depth)[:,None, :, :] * itv # B D H W
inverse_depth_hypo = F.interpolate(inverse_depth_hypo.unsqueeze(1), [ndepths, H, W], mode='trilinear', align_corners=True).squeeze(1)
return 1./inverse_depth_hypo
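# Editor's note: a small illustrative sketch (the depth bounds are assumptions).
# The hypotheses are spaced uniformly in inverse depth between the given bounds,
# so planes are denser near the camera and sparser far away.
def _demo_init_inverse_range():
    depth_values = torch.tensor([[425.0, 935.0]])        # per-sample [min, max] depth
    hypos = init_inverse_range(depth_values, 8, depth_values.device, depth_values.dtype, 4, 4)
    return hypos[0, :, 0, 0]                             # 8 depths from 935 down to 425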
# --------------------------------------------------------------
def init_bn(module):
if module.weight is not None:
nn.init.ones_(module.weight)
if module.bias is not None:
nn.init.zeros_(module.bias)
return
def init_uniform(module, init_method):
if module.weight is not None:
if init_method == "kaiming":
nn.init.kaiming_uniform_(module.weight)
elif init_method == "xavier":
nn.init.xavier_uniform_(module.weight)
return
class ConvBnReLU3D(nn.Module):
def __init__(self, in_channels, out_channels, kernel_size=3, stride=1, pad=1):
super(ConvBnReLU3D, self).__init__()
self.conv = nn.Conv3d(in_channels, out_channels, kernel_size, stride=stride, padding=pad, bias=False)
self.bn = nn.BatchNorm3d(out_channels)
def forward(self, x):
return F.relu(self.bn(self.conv(x)), inplace=True)
class Conv2d(nn.Module):
def __init__(self, in_channels, out_channels, kernel_size, stride=1,
relu=True, bn_momentum=0.1, init_method="xavier", gn=False, group_channel=8, **kwargs):
super(Conv2d, self).__init__()
bn = not gn
self.conv = nn.Conv2d(in_channels, out_channels, kernel_size, stride=stride,
bias=(not bn), **kwargs)
self.kernel_size = kernel_size
self.stride = stride
self.bn = nn.BatchNorm2d(out_channels, momentum=bn_momentum) if bn else None
self.gn = nn.GroupNorm(int(max(1, out_channels / group_channel)), out_channels) if gn else None
self.relu = relu
def forward(self, x):
x = self.conv(x)
if self.bn is not None:
x = self.bn(x)
else:
x = self.gn(x)
if self.relu:
x = F.relu(x, inplace=True)
return x
def init_weights(self, init_method):
init_uniform(self.conv, init_method)
if self.bn is not None:
init_bn(self.bn)
================================================
FILE: models/utils/__init__.py
================================================
from models.utils.utils import *
================================================
FILE: models/utils/opts.py
================================================
# -*- coding: utf-8 -*-
# @Description: Options settings & configurations for GeoMVSNet.
# @Author: Zhe Zhang (doublez@stu.pku.edu.cn)
# @Affiliation: Peking University (PKU)
# @LastEditDate: 2023-09-07
import argparse
def get_opts():
parser = argparse.ArgumentParser(description="args")
# global settings
parser.add_argument('--mode', default='train', help='train or test', choices=['train', 'test', 'val'])
parser.add_argument('--which_dataset', default='dtu', choices=['dtu', 'tnt', 'blendedmvs'], help='which dataset to use')
parser.add_argument('--n_views', type=int, default=5, help='number of views')
parser.add_argument('--levels', type=int, default=4, help='number of stages')
parser.add_argument('--hypo_plane_num_stages', type=str, default="8,8,4,4", help='number of hypothesis planes for each stage')
parser.add_argument('--depth_interal_ratio_stages', type=str, default="0.5,0.5,0.5,1", help='depth interval ratios for each stage')
parser.add_argument("--feat_base_channel", type=int, default=8, help='channel num for base feature')
parser.add_argument("--reg_base_channel", type=int, default=8, help='channel num for regularization')
parser.add_argument('--group_cor_dim_stages', type=str, default="8,8,4,4", help='group correlation dim')
parser.add_argument('--batch_size', type=int, default=1, help='batch size for training')
parser.add_argument('--data_scale', type=str, choices=['mid', 'raw'], help='use mid or raw resolution')
parser.add_argument('--trainpath', help='data path for training')
parser.add_argument('--testpath', help='data path for testing')
parser.add_argument('--trainlist', help='data list for training')
parser.add_argument('--testlist', help='data list for testing')
# training config
parser.add_argument('--stage_lw', type=str, default="1,1,1,1", help='loss weight for different stages')
parser.add_argument('--epochs', type=int, default=10, help='number of epochs to train')
parser.add_argument('--lr_scheduler', type=str, default='MS', help='scheduler for learning rate')
parser.add_argument('--lr', type=float, default=0.001, help='learning rate')
parser.add_argument('--lrepochs', type=str, default="1,3,5,7,9,11,13,15:1.5", help='epoch ids to downscale lr and the downscale rate')
parser.add_argument('--wd', type=float, default=0.0, help='weight decay')
parser.add_argument('--summary_freq', type=int, default=100, help='print and summary frequency')
parser.add_argument('--save_freq', type=int, default=1, help='save checkpoint frequency')
parser.add_argument('--eval_freq', type=int, default=1, help='eval frequency')
parser.add_argument('--robust_train', action='store_true',help='robust training')
# testing config
parser.add_argument('--split', type=str, choices=['intermediate', 'advanced'], help='intermediate|advanced for tanksandtemples')
parser.add_argument('--img_mode', type=str, default='resize', choices=['resize', 'crop'], help='image resolution matching strategy for TNT dataset')
parser.add_argument('--cam_mode', type=str, default='origin', choices=['origin', 'short_range'], help='camera parameter strategy for TNT dataset')
parser.add_argument('--loadckpt', default=None, help='load a specific checkpoint')
parser.add_argument('--logdir', default='./checkpoints/debug', help='the directory to save checkpoints/logs')
parser.add_argument('--nolog', action='store_true', help='do not log into .log file')
parser.add_argument('--notensorboard', action='store_true', help='do not log into tensorboard')
parser.add_argument('--save_conf_all_stages', action='store_true', help='save confidence maps for all stages')
parser.add_argument('--outdir', default='./outputs', help='output dir')
parser.add_argument('--resume', action='store_true', help='continue to train the model')
# pytorch config
parser.add_argument('--device', default='cuda', help='device to use')
parser.add_argument('--seed', type=int, default=1, metavar='S', help='random seed')
parser.add_argument('--pin_m', action='store_true', help='data loader pin memory')
parser.add_argument("--local_rank", type=int, default=0)
return parser.parse_args()
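# Editor's note: a hedged sketch of one way the comma-separated per-stage
# arguments above could be split into Python lists; the actual parsing lives in
# train.py / test.py and is not reproduced here.
def _demo_split_stage_arg(raw="8,8,4,4"):
    return [int(n) for n in raw.split(",")]   # -> [8, 8, 4, 4]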
================================================
FILE: models/utils/utils.py
================================================
# -*- coding: utf-8 -*-
# @Description: Some useful utils.
# @Author: Zhe Zhang (doublez@stu.pku.edu.cn)
# @Affiliation: Peking University (PKU)
# @LastEditDate: 2023-09-07
import random
import numpy as np
import torch
import torchvision.utils as vutils
# torch.no_grad wrapper for functions
def make_nograd_func(func):
def wrapper(*f_args, **f_kwargs):
with torch.no_grad():
ret = func(*f_args, **f_kwargs)
return ret
return wrapper
# convert a function into recursive style to handle nested dict/list/tuple variables
def make_recursive_func(func):
def wrapper(vars):
if isinstance(vars, list):
return [wrapper(x) for x in vars]
elif isinstance(vars, tuple):
return tuple([wrapper(x) for x in vars])
elif isinstance(vars, dict):
return {k: wrapper(v) for k, v in vars.items()}
else:
return func(vars)
return wrapper
@make_recursive_func
def tensor2float(vars):
if isinstance(vars, float):
return vars
elif isinstance(vars, torch.Tensor):
return vars.data.item()
else:
raise NotImplementedError("invalid input type {} for tensor2float".format(type(vars)))
@make_recursive_func
def tensor2numpy(vars):
if isinstance(vars, np.ndarray):
return vars
elif isinstance(vars, torch.Tensor):
return vars.detach().cpu().numpy().copy()
else:
raise NotImplementedError("invalid input type {} for tensor2numpy".format(type(vars)))
@make_recursive_func
def tocuda(vars):
if isinstance(vars, torch.Tensor):
return vars.to(torch.device("cuda"))
elif isinstance(vars, str):
return vars
else:
raise NotImplementedError("invalid input type {} for tocuda".format(type(vars)))
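# Editor's note: a toy example (values are arbitrary) showing how the recursive
# wrappers above map a conversion over nested dict/list containers, which is how
# scalar outputs are flattened before being written to tensorboard.
def _demo_tensor2float():
    nested = {"loss": torch.tensor(0.5), "stage_losses": [torch.tensor(1.0), torch.tensor(2.0)]}
    return tensor2float(nested)   # {"loss": 0.5, "stage_losses": [1.0, 2.0]}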
def tb_save_scalars(logger, mode, scalar_dict, global_step):
scalar_dict = tensor2float(scalar_dict)
for key, value in scalar_dict.items():
if not isinstance(value, (list, tuple)):
name = '{}/{}'.format(mode, key)
logger.add_scalar(name, value, global_step)
else:
for idx in range(len(value)):
name = '{}/{}_{}'.format(mode, key, idx)
logger.add_scalar(name, value[idx], global_step)
def tb_save_images(logger, mode, images_dict, global_step):
images_dict = tensor2numpy(images_dict)
def preprocess(name, img):
if not (len(img.shape) == 3 or len(img.shape) == 4):
raise NotImplementedError("invalid img shape {}:{} in save_images".format(name, img.shape))
if len(img.shape) == 3:
img = img[:, np.newaxis, :, :]
img = torch.from_numpy(img[:1])
return vutils.make_grid(img, padding=0, nrow=1, normalize=True, scale_each=True)
for key, value in images_dict.items():
if not isinstance(value, (list, tuple)):
name = '{}/{}'.format(mode, key)
logger.add_image(name, preprocess(name, value), global_step)
else:
for idx in range(len(value)):
name = '{}/{}_{}'.format(mode, key, idx)
logger.add_image(name, preprocess(name, value[idx]), global_step)
class DictAverageMeter(object):
def __init__(self):
self.data = {}
self.count = 0
def update(self, new_input):
self.count += 1
if len(self.data) == 0:
for k, v in new_input.items():
if not isinstance(v, float):
raise NotImplementedError("invalid data {}: {}".format(k, type(v)))
self.data[k] = v
else:
for k, v in new_input.items():
if not isinstance(v, float):
raise NotImplementedError("invalid data {}: {}".format(k, type(v)))
self.data[k] += v
def mean(self):
return {k: v / self.count for k, v in self.data.items()}
# a wrapper to compute metrics for each image individually
def compute_metrics_for_each_image(metric_func):
def wrapper(depth_est, depth_gt, mask, *args):
batch_size = depth_gt.shape[0]
results = []
# compute result one by one
for idx in range(batch_size):
ret = metric_func(depth_est[idx], depth_gt[idx], mask[idx], *args)
results.append(ret)
return torch.stack(results).mean()
return wrapper
@make_nograd_func
@compute_metrics_for_each_image
def Thres_metrics(depth_est, depth_gt, mask, thres):
assert isinstance(thres, (int, float))
depth_est, depth_gt = depth_est[mask], depth_gt[mask]
errors = torch.abs(depth_est - depth_gt)
err_mask = errors > thres
return torch.mean(err_mask.float())
# NOTE: please do not use this to build up training loss
@make_nograd_func
@compute_metrics_for_each_image
def AbsDepthError_metrics(depth_est, depth_gt, mask, thres=None):
depth_est, depth_gt = depth_est[mask], depth_gt[mask]
error = (depth_est - depth_gt).abs()
if thres is not None:
error = error[(error >= float(thres[0])) & (error <= float(thres[1]))]
if error.shape[0] == 0:
return torch.tensor(0, device=error.device, dtype=error.dtype)
return torch.mean(error)
import torch.distributed as dist
def synchronize():
"""
Helper function to synchronize (barrier) among all processes when
using distributed training
"""
if not dist.is_available():
return
if not dist.is_initialized():
return
world_size = dist.get_world_size()
if world_size == 1:
return
dist.barrier()
def get_world_size():
if not dist.is_available():
return 1
if not dist.is_initialized():
return 1
return dist.get_world_size()
def reduce_scalar_outputs(scalar_outputs):
world_size = get_world_size()
if world_size < 2:
return scalar_outputs
with torch.no_grad():
names = []
scalars = []
for k in sorted(scalar_outputs.keys()):
names.append(k)
scalars.append(scalar_outputs[k])
scalars = torch.stack(scalars, dim=0)
dist.reduce(scalars, dst=0)
if dist.get_rank() == 0:
# only main process gets accumulated, so only divide by
# world_size in this case
scalars /= world_size
reduced_scalars = {k: v for k, v in zip(names, scalars)}
return reduced_scalars
import torch
from bisect import bisect_right
class WarmupMultiStepLR(torch.optim.lr_scheduler._LRScheduler):
def __init__(
self,
optimizer,
milestones,
gamma=0.1,
warmup_factor=1.0 / 3,
warmup_iters=500,
warmup_method="linear",
last_epoch=-1,
):
if not list(milestones) == sorted(milestones):
raise ValueError(
"Milestones should be a list of increasing integers. Got {}".format(milestones)
)
if warmup_method not in ("constant", "linear"):
raise ValueError(
"Only 'constant' or 'linear' warmup_method accepted, "
"got {}".format(warmup_method)
)
self.milestones = milestones
self.gamma = gamma
self.warmup_factor = warmup_factor
self.warmup_iters = warmup_iters
self.warmup_method = warmup_method
super(WarmupMultiStepLR, self).__init__(optimizer, last_epoch)
def get_lr(self):
warmup_factor = 1
if self.last_epoch < self.warmup_iters:
if self.warmup_method == "constant":
warmup_factor = self.warmup_factor
elif self.warmup_method == "linear":
alpha = float(self.last_epoch) / self.warmup_iters
warmup_factor = self.warmup_factor * (1 - alpha) + alpha
return [
base_lr
* warmup_factor
* self.gamma ** bisect_right(self.milestones, self.last_epoch)
for base_lr in self.base_lrs
]
def set_random_seed(seed):
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)
torch.cuda.manual_seed_all(seed)
================================================
FILE: outputs/visual.ipynb
================================================
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"- @Description: Juputer notebook for visualizing depth maps.\n",
"- @Author: Zhe Zhang (doublez@stu.pku.edu.cn)\n",
"- @Affiliation: Peking University (PKU)\n",
"- @LastEditDate: 2023-09-07"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"ExecutionIndicator": {
"show": true
},
"tags": []
},
"outputs": [],
"source": [
"import sys, os\n",
"sys.path.append('../')\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"import re\n",
"\n",
"\n",
"def read_pfm(filename):\n",
" file = open(filename, 'rb')\n",
" color = None\n",
" width = None\n",
" height = None\n",
" scale = None\n",
" endian = None\n",
"\n",
" header = file.readline().decode('utf-8').rstrip()\n",
" if header == 'PF':\n",
" color = True\n",
" elif header == 'Pf':\n",
" color = False\n",
" else:\n",
" raise Exception('Not a PFM file.')\n",
"\n",
" dim_match = re.match(r'^(\\d+)\\s(\\d+)\\s$', file.readline().decode('utf-8'))\n",
" if dim_match:\n",
" width, height = map(int, dim_match.groups())\n",
" else:\n",
" raise Exception('Malformed PFM header.')\n",
"\n",
" scale = float(file.readline().rstrip())\n",
" if scale < 0: # little-endian\n",
" endian = '<'\n",
" scale = -scale\n",
" else:\n",
" endian = '>' # big-endian\n",
"\n",
" data = np.fromfile(file, endian + 'f')\n",
" shape = (height, width, 3) if color else (height, width)\n",
"\n",
" data = np.reshape(data, shape)\n",
" data = np.flipud(data)\n",
" file.close()\n",
" return data, scale\n",
"\n",
"\n",
"def read_depth(filename):\n",
" depth = read_pfm(filename)[0]\n",
" return np.array(depth, dtype=np.float32)\n",
"\n",
"\n",
"assert False"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## DTU"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"ExecutionIndicator": {
"show": true
},
"tags": []
},
"outputs": [],
"source": [
"exp_name = 'dtu/geomvsnet'\n",
"depth_name = \"00000009.pfm\"\n",
"\n",
"scans = os.listdir(os.path.join(exp_name))\n",
"scans = list(filter(lambda x: x.startswith(\"scan\"), scans))\n",
"scans.sort(key=lambda x: int(x[4:]))\n",
"for scan in scans:\n",
" depth_filename = os.path.join(exp_name, scan, \"depth_est\", depth_name)\n",
" if not os.path.exists(depth_filename): continue\n",
" depth = read_depth(depth_filename)\n",
"\n",
" confidence_filename = os.path.join(exp_name, scan, \"confidence\", depth_name)\n",
" confidence = read_depth(confidence_filename)\n",
"\n",
" print(scan, depth_name)\n",
"\n",
" plt.figure(figsize=(12, 12))\n",
" plt.subplot(1, 2, 1)\n",
" plt.xticks([]), plt.yticks([]), plt.axis('off')\n",
" plt.imshow(depth, 'viridis', vmin=500, vmax=830)\n",
"\n",
" plt.subplot(1, 2, 2)\n",
" plt.xticks([]), plt.yticks([]), plt.axis('off')\n",
" plt.imshow(confidence, 'viridis')\n",
" plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## TNT"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"exp_name = './tnt/blend/geomvsnet/'\n",
"depth_name = \"00000009.pfm\"\n",
"\n",
"with open(\"../datasets/lists/tnt/intermediate.txt\") as f:\n",
" scans_i = [line.rstrip() for line in f.readlines()]\n",
"\n",
"with open(\"../datasets/lists/tnt/advanced.txt\") as f:\n",
" scans_a = [line.rstrip() for line in f.readlines()]\n",
"\n",
"scans = scans_i + scans_a\n",
"\n",
"for scan in scans:\n",
"\n",
" depth_filename = os.path.join(exp_name, scan, \"depth_est\", depth_name)\n",
" if not os.path.exists(depth_filename): continue\n",
" depth = read_depth(depth_filename)\n",
"\n",
" print(scan, depth_name, depth.shape)\n",
"\n",
" plt.figure(figsize=(12, 12))\n",
" plt.xticks([]), plt.yticks([]), plt.axis('off')\n",
" plt.imshow(depth, 'viridis', vmin=0, vmax=10)\n",
"\n",
" plt.show()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.12"
},
"vscode": {
"interpreter": {
"hash": "d253918f84404206ad3cf9c22ee3709ef6e34cbea610b0ac9787033d60da5e03"
}
}
},
"nbformat": 4,
"nbformat_minor": 4
}
================================================
FILE: requirements.txt
================================================
torch==1.10.0
torchvision
opencv-python
numpy==1.18.1
pillow
scipy
tensorboardX
plyfile
open3d
jupyter
notebook
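# Not pip-installable but needed by some scripts: MATLAB for the DTU quantitative
# evaluation (scripts/dtu/matlab_quan_dtu.sh), and Python 2 plus a compiled fusibile
# binary for the optional gipuma fusion (scripts/dtu/fusion_dtu.sh).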
================================================
FILE: scripts/blend/train_blend.sh
================================================
#!/usr/bin/env bash
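# BlendedMVS training; the resulting checkpoints under ./checkpoints/blend/geomvsnet
# are the ones loaded by scripts/tnt/test_tnt.sh for the Tanks and Temples benchmark.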
source scripts/data_path.sh
THISNAME="geomvsnet"
LOG_DIR="./checkpoints/blend/"$THISNAME
if [ ! -d $LOG_DIR ]; then
mkdir -p $LOG_DIR
fi
CUDA_VISIBLE_DEVICES=0,1,2,3 python3 -m torch.distributed.launch --nproc_per_node=4 train.py ${@} \
--which_dataset="blendedmvs" --epochs=16 --logdir=$LOG_DIR \
--trainpath=$BLENDEDMVS_ROOT --testpath=$BLENDEDMVS_ROOT \
--trainlist="datasets/lists/blendedmvs/low_res_all.txt" --testlist="datasets/lists/blendedmvs/val.txt" \
\
--n_views="7" --batch_size=2 --lr=0.001 --robust_train \
--lr_scheduler="onecycle"
================================================
FILE: scripts/data_path.sh
================================================
#!/usr/bin/env bash
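# Replace each "[/path/to/]" placeholder below with the absolute path to the
# corresponding dataset root before running any train / test / fusion script.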
# DTU
DTU_TRAIN_ROOT="[/path/to/]dtu"
DTU_TEST_ROOT="[/path/to/]dtu-test"
DTU_QUANTITATIVE_ROOT="[/path/to/]dtu-evaluation"
# Tanks and Temples
TNT_ROOT="[/path/to/]tnt"
# BlendedMVS
BLENDEDMVS_ROOT="[/path/to/]blendmvs"
================================================
FILE: scripts/dtu/fusion_dtu.sh
================================================
#!/usr/bin/env bash
source scripts/data_path.sh
THISNAME="geomvsnet"
FUSION_METHOD="open3d"
LOG_DIR="./checkpoints/dtu/"$THISNAME
DTU_OUT_DIR="./outputs/dtu/"$THISNAME
if [ $FUSION_METHOD = "pcd" ] ; then
python3 fusions/dtu/pcd.py ${@} \
--testpath=$DTU_TEST_ROOT --testlist="datasets/lists/dtu/test.txt" \
--outdir=$DTU_OUT_DIR --logdir=$LOG_DIR --nolog \
--num_worker=1 \
\
--thres_view=4 --conf=0.5 \
\
--plydir=$DTU_OUT_DIR"/pcd_fusion_plys/"
elif [ $FUSION_METHOD = "gipuma" ] ; then
# source [/path/to/]anaconda3/etc/profile.d/conda.sh
# conda activate fusibile
CUDA_VISIBLE_DEVICES=0 python2 fusions/dtu/gipuma.py \
--root_dir=$DTU_TEST_ROOT --list_file="datasets/lists/dtu/test.txt" \
--fusibile_exe_path="fusions/fusibile" --out_folder="fusibile_fused" \
--depth_folder=$DTU_OUT_DIR \
--downsample_factor=1 \
\
--prob_threshold=0.5 --disp_threshold=0.25 --num_consistent=3 \
\
--plydir=$DTU_OUT_DIR"/gipuma_fusion_plys/"
elif [ $FUSION_METHOD = "open3d" ] ; then
CUDA_VISIBLE_DEVICES=0 python fusions/dtu/_open3d.py --device="cuda" \
--root_path=$DTU_TEST_ROOT \
--depth_path=$DTU_OUT_DIR \
--data_list="datasets/lists/dtu/test.txt" \
\
--prob_thresh=0.3 --dist_thresh=0.2 --num_consist=4 \
\
--ply_path=$DTU_OUT_DIR"/open3d_fusion_plys/"
fi
================================================
FILE: scripts/dtu/matlab_quan_dtu.sh
================================================
#!/usr/bin/env bash
source scripts/data_path.sh
OUTNAME="geomvsnet"
FUSIONMETHOD="open3d"
# Evaluation
echo "<<<<<<<<<< start parallel evaluation"
METHOD='mvsnet'
PLYPATH='../../../outputs/dtu/'$OUTNAME'/'$FUSIONMETHOD'_fusion_plys/'
RESULTPATH='../../../outputs/dtu/'$OUTNAME'/'$FUSIONMETHOD'_quantitative/'
LOGPATH='outputs/dtu/'$OUTNAME'/'$FUSIONMETHOD'_quantitative/'$OUTNAME'.log'
mkdir -p 'outputs/dtu/'$OUTNAME'/'$FUSIONMETHOD'_quantitative/'
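# Note: PLYPATH / RESULTPATH are resolved after "cd datasets/evaluations/dtu_parallel"
# inside MATLAB (hence the ../../../ prefix), while LOGPATH and the mkdir above are
# relative to the repository root.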
set_array=(1 4 9 10 11 12 13 15 23 24 29 32 33 34 48 49 62 75 77 110 114 118)
num_at_once=2 # number of scans evaluated concurrently per batch (22 scans in total); e.g. 1 2 4 5 7 11 22
times=`expr $((${#set_array[*]} / $num_at_once))`
remain=`expr $((${#set_array[*]} - $num_at_once * $times))`
this_group_num=0
pos=0
for ((t=0; t<$times; t++))
do
if [ "$t" -ge `expr $(($times-$remain))` ] ; then
this_group_num=`expr $(($num_at_once + 1))`
else
this_group_num=$num_at_once
fi
for set in "${set_array[@]:pos:this_group_num}"
do
matlab -nodesktop -nosplash -r "cd datasets/evaluations/dtu_parallel; dataPath='$DTU_QUANTITATIVE_ROOT'; plyPath='$PLYPATH'; resultsPath='$RESULTPATH'; method_string='$METHOD'; thisset='$set'; BaseEvalMain_web" &
done
wait
pos=`expr $(($pos + $this_group_num))`
done
wait
SET=[1,4,9,10,11,12,13,15,23,24,29,32,33,34,48,49,62,75,77,110,114,118]
matlab -nodesktop -nosplash -r "cd datasets/evaluations/dtu_parallel; resultsPath='$RESULTPATH'; method_string='$METHOD'; set='$SET'; ComputeStat_web" > $LOGPATH
================================================
FILE: scripts/dtu/test_dtu.sh
================================================
#!/usr/bin/env bash
source scripts/data_path.sh
THISNAME="geomvsnet"
BESTEPOCH="geomvsnet_release"
LOG_DIR="./checkpoints/dtu/"$THISNAME
DTU_CKPT_FILE=$LOG_DIR"/model_"$BESTEPOCH".ckpt"
DTU_OUT_DIR="./outputs/dtu/"$THISNAME
CUDA_VISIBLE_DEVICES=0 python3 test.py ${@} \
--which_dataset="dtu" --loadckpt=$DTU_CKPT_FILE --batch_size=1 \
--outdir=$DTU_OUT_DIR --logdir=$LOG_DIR --nolog \
--testpath=$DTU_TEST_ROOT --testlist="datasets/lists/dtu/test.txt" \
\
--data_scale="raw" --n_views="5"
================================================
FILE: scripts/dtu/train_dtu.sh
================================================
#!/usr/bin/env bash
source scripts/data_path.sh
THISNAME="geomvsnet"
LOG_DIR="./checkpoints/dtu/"$THISNAME
if [ ! -d $LOG_DIR ]; then
mkdir -p $LOG_DIR
fi
CUDA_VISIBLE_DEVICES=0,1,2,3 python3 -m torch.distributed.launch --nproc_per_node=4 train.py ${@} \
--which_dataset="dtu" --epochs=16 --logdir=$LOG_DIR \
--trainpath=$DTU_TRAIN_ROOT --testpath=$DTU_TRAIN_ROOT \
--trainlist="datasets/lists/dtu/train.txt" --testlist="datasets/lists/dtu/test.txt" \
\
--data_scale="mid" --n_views="5" --batch_size=4 --lr=0.002 --robust_train \
--lrepochs="1,3,5,7,9,11,13,15:1.5"
================================================
FILE: scripts/dtu/train_dtu_raw.sh
================================================
#!/usr/bin/env bash
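# Same recipe as scripts/dtu/train_dtu.sh but on full-resolution ("raw") DTU images,
# hence the smaller batch size and learning rate.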
source scripts/data_path.sh
THISNAME="geomvsnet_raw"
LOG_DIR="./checkpoints/dtu/"$THISNAME
if [ ! -d $LOG_DIR ]; then
mkdir -p $LOG_DIR
fi
CUDA_VISIBLE_DEVICES=0,1,2,3 python3 -m torch.distributed.launch --nproc_per_node=4 train.py ${@} \
--which_dataset="dtu" --epochs=16 --logdir=$LOG_DIR \
--trainpath=$DTU_TRAIN_ROOT --testpath=$DTU_TRAIN_ROOT \
--trainlist="datasets/lists/dtu/train.txt" --testlist="datasets/lists/dtu/test.txt" \
\
--data_scale="raw" --n_views="5" --batch_size=1 --lr=0.0005 --robust_train \
--lrepochs="1,3,5,7,9,11,13,15:1.5"
================================================
FILE: scripts/tnt/fusion_tnt.sh
================================================
#!/usr/bin/env bash
source scripts/data_path.sh
THISNAME="blend/geomvsnet"
LOG_DIR="./checkpoints/tnt/"$THISNAME
TNT_OUT_DIR="./outputs/tnt/"$THISNAME
# Intermediate
python3 fusions/tnt/dypcd.py ${@} \
--root_dir=$TNT_ROOT --list_file="datasets/lists/tnt/intermediate.txt" --split="intermediate" \
--out_dir=$TNT_OUT_DIR --ply_path=$TNT_OUT_DIR"/dypcd_fusion_plys" \
--img_mode="resize" --cam_mode="origin" --single_processor
# Advanced
python3 fusions/tnt/dypcd.py ${@} \
--root_dir=$TNT_ROOT --list_file="datasets/lists/tnt/advanced.txt" --split="advanced" \
--out_dir=$TNT_OUT_DIR --ply_path=$TNT_OUT_DIR"/dypcd_fusion_plys" \
--img_mode="resize" --cam_mode="origin" --single_processor
================================================
FILE: scripts/tnt/test_tnt.sh
================================================
#!/usr/bin/env bash
source scripts/data_path.sh
THISNAME="blend/geomvsnet"
BESTEPOCH="15"
LOG_DIR="./checkpoints/"$THISNAME
CKPT_FILE=$LOG_DIR"/model_"$BESTEPOCH".ckpt"
TNT_OUT_DIR="./outputs/tnt/"$THISNAME
# Intermediate
CUDA_VISIBLE_DEVICES=0 python3 test.py ${@} \
--which_dataset="tnt" --loadckpt=$CKPT_FILE --batch_size=1 \
--outdir=$TNT_OUT_DIR --logdir=$LOG_DIR --nolog \
--testpath=$TNT_ROOT --testlist="datasets/lists/tnt/intermediate.txt" --split="intermediate" \
\
--n_views="11" --img_mode="resize" --cam_mode="origin"
# Advanced
CUDA_VISIBLE_DEVICES=0 python3 test.py ${@} \
--which_dataset="tnt" --loadckpt=$CKPT_FILE --batch_size=1 \
--outdir=$TNT_OUT_DIR --logdir=$LOG_DIR --nolog \
--testpath=$TNT_ROOT --testlist="datasets/lists/tnt/advanced.txt" --split="advanced" \
\
--n_views="11" --img_mode="resize" --cam_mode="origin"
================================================
FILE: test.py
================================================
# -*- coding: utf-8 -*-
# @Description: Main process of network testing.
# @Author: Zhe Zhang (doublez@stu.pku.edu.cn)
# @Affiliation: Peking University (PKU)
# @LastEditDate: 2023-09-07
import os, time, sys, gc, cv2, logging
import numpy as np
import torch
import torch.nn as nn
import torch.nn.parallel
import torch.backends.cudnn as cudnn
from torch.utils.data import DataLoader
from datasets.data_io import *
from datasets.dtu import DTUDataset
from datasets.tnt import TNTDataset
from models.geomvsnet import GeoMVSNet
from models.utils import *
from models.utils.opts import get_opts
cudnn.benchmark = True
args = get_opts()
def test():
total_time = 0
with torch.no_grad():
for batch_idx, sample in enumerate(TestImgLoader):
sample_cuda = tocuda(sample)
start_time = time.time()
# @Note GeoMVSNet main
outputs = model(
sample_cuda["imgs"],
sample_cuda["proj_matrices"], sample_cuda["intrinsics_matrices"],
sample_cuda["depth_values"],
sample["filename"]
)
end_time = time.time()
total_time += end_time - start_time
outputs = tensor2numpy(outputs)
del sample_cuda
filenames = sample["filename"]
cams = sample["proj_matrices"]["stage{}".format(args.levels)].numpy()
imgs = sample["imgs"]
logger.info('Iter {}/{}, Time:{:.3f} Res:{}'.format(batch_idx, len(TestImgLoader), end_time - start_time, imgs[0].shape))
for filename, cam, img, depth_est, photometric_confidence in zip(filenames, cams, imgs, outputs["depth"], outputs["photometric_confidence"]):
img = img[0].numpy() # ref view
cam = cam[0] # ref cam
depth_filename = os.path.join(args.outdir, filename.format('depth_est', '.pfm'))
confidence_filename = os.path.join(args.outdir, filename.format('confidence', '.pfm'))
cam_filename = os.path.join(args.outdir, filename.format('cams', '_cam.txt'))
img_filename = os.path.join(args.outdir, filename.format('images', '.jpg'))
os.makedirs(depth_filename.rsplit('/', 1)[0], exist_ok=True)
os.makedirs(confidence_filename.rsplit('/', 1)[0], exist_ok=True)
if args.which_dataset == 'dtu':
os.makedirs(cam_filename.rsplit('/', 1)[0], exist_ok=True)
os.makedirs(img_filename.rsplit('/', 1)[0], exist_ok=True)
# save depth maps
save_pfm(depth_filename, depth_est)
# save confidence maps
confidence_list = [outputs['stage{}'.format(i)]['photometric_confidence'].squeeze(0) for i in range(1,5)]
photometric_confidence = confidence_list[-1]
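                # by default only the finest-stage confidence map is written;
                # --save_conf_all_stages additionally writes one map per stage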
if not args.save_conf_all_stages:
save_pfm(confidence_filename, photometric_confidence)
else:
for stage_idx, photometric_confidence in enumerate(confidence_list):
if stage_idx != args.levels - 1:
confidence_filename = os.path.join(args.outdir, filename.format('confidence', "_stage"+str(stage_idx)+'.pfm'))
else:
confidence_filename = os.path.join(args.outdir, filename.format('confidence', '.pfm'))
save_pfm(confidence_filename, photometric_confidence)
# save cams, img
if args.which_dataset == 'dtu':
write_cam(cam_filename, cam)
img = np.clip(np.transpose(img, (1, 2, 0)) * 255, 0, 255).astype(np.uint8)
img_bgr = cv2.cvtColor(img, cv2.COLOR_RGB2BGR)
cv2.imwrite(img_filename, img_bgr)
torch.cuda.empty_cache()
gc.collect()
return total_time, len(TestImgLoader)
def initLogger():
logger = logging.getLogger()
logger.setLevel(logging.INFO)
curTime = time.strftime('%Y%m%d-%H%M', time.localtime(time.time()))
if args.which_dataset == 'tnt':
logfile = os.path.join(args.logdir, 'TNT-test-' + curTime + '.log')
else:
logfile = os.path.join(args.logdir, 'test-' + curTime + '.log')
formatter = logging.Formatter("%(asctime)s - %(filename)s[line:%(lineno)d] - %(levelname)s: %(message)s")
if not args.nolog:
fileHandler = logging.FileHandler(logfile, mode='a')
fileHandler.setFormatter(formatter)
logger.addHandler(fileHandler)
consoleHandler = logging.StreamHandler(sys.stdout)
consoleHandler.setFormatter(formatter)
logger.addHandler(consoleHandler)
logger.info("Logger initialized.")
logger.info("Writing logs to file: {}".format(logfile))
logger.info("Current time: {}".format(curTime))
settings_str = "All settings:\n"
for k,v in vars(args).items():
settings_str += '{0}: {1}\n'.format(k,v)
logger.info(settings_str)
return logger
if __name__ == '__main__':
logger = initLogger()
# dataset, dataloader
if args.which_dataset == 'dtu':
test_dataset = DTUDataset(args.testpath, args.testlist, "test", args.n_views, max_wh=(1600, 1200))
elif args.which_dataset == 'tnt':
test_dataset = TNTDataset(args.testpath, args.testlist, split=args.split, n_views=args.n_views, img_wh=(-1, 1024), cam_mode=args.cam_mode, img_mode=args.img_mode)
TestImgLoader = DataLoader(test_dataset, args.batch_size, shuffle=False, num_workers=4, drop_last=False)
# @Note GeoMVSNet model
model = GeoMVSNet(
levels=args.levels,
hypo_plane_num_stages=[int(n) for n in args.hypo_plane_num_stages.split(",")],
depth_interal_ratio_stages=[float(ir) for ir in args.depth_interal_ratio_stages.split(",")],
feat_base_channel=args.feat_base_channel,
reg_base_channel=args.reg_base_channel,
group_cor_dim_stages=[int(n) for n in args.group_cor_dim_stages.split(",")],
)
logger.info("loading model {}".format(args.loadckpt))
state_dict = torch.load(args.loadckpt, map_location=torch.device("cpu"))
model.load_state_dict(state_dict['model'], strict=False)
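    # strict=False: missing/unexpected keys in the checkpoint are ignored instead of raising an error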
model.cuda()
model.eval()
test()
================================================
FILE: train.py
================================================
# -*- coding: utf-8 -*-
# @Description: Main process of network training & evaluation.
# @Author: Zhe Zhang (doublez@stu.pku.edu.cn)
# @Affiliation: Peking University (PKU)
# @LastEditDate: 2023-09-07
import os, sys, time, gc, datetime, logging, json
import torch
import torch.nn as nn
import torch.nn.parallel
import torch.backends.cudnn as cudnn
import torch.optim as optim
import torch.distributed as dist
from torch.utils.data import DataLoader
from tensorboardX import SummaryWriter
from datasets.dtu import DTUDataset
from datasets.blendedmvs import BlendedMVSDataset
from models.geomvsnet import GeoMVSNet
from models.loss import geomvsnet_loss
from models.utils import *
from models.utils.opts import get_opts
cudnn.benchmark = True
num_gpus = int(os.environ["WORLD_SIZE"]) if "WORLD_SIZE" in os.environ else 1
is_distributed = num_gpus > 1
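# WORLD_SIZE is set by torch.distributed.launch (see the multi-GPU training scripts),
# so single-GPU runs fall back to the non-distributed code path below.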
args = get_opts()
def train(model, model_loss, optimizer, TrainImgLoader, TestImgLoader, start_epoch, args):
if args.lr_scheduler == 'MS':
milestones = [len(TrainImgLoader) * int(epoch_idx) for epoch_idx in args.lrepochs.split(':')[0].split(',')]
lr_gamma = 1 / float(args.lrepochs.split(':')[1])
lr_scheduler = WarmupMultiStepLR(optimizer, milestones, gamma=lr_gamma, warmup_factor=1.0/3, warmup_iters=500, last_epoch=len(TrainImgLoader) * start_epoch - 1)
elif args.lr_scheduler == 'cos':
lr_scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=int(args.epochs*len(TrainImgLoader)), eta_min=0)
elif args.lr_scheduler == 'onecycle':
lr_scheduler = torch.optim.lr_scheduler.OneCycleLR(optimizer, max_lr=args.lr, total_steps=int(args.epochs*len(TrainImgLoader)))
elif args.lr_scheduler == 'lambda':
lr_scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lambda epoch: 0.9 ** ((epoch-1) / len(TrainImgLoader)), last_epoch=len(TrainImgLoader)*start_epoch-1)
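    # 'MS' (WarmupMultiStepLR) consumes --lrepochs, e.g. "1,3,5,7,9,11,13,15:1.5" in the
    # DTU scripts; the BlendedMVS script selects --lr_scheduler=onecycle instead.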
for epoch_idx in range(start_epoch, args.epochs):
logger.info('Epoch {}:'.format(epoch_idx))
global_step = len(TrainImgLoader) * epoch_idx
# training
for batch_idx, sample in enumerate(TrainImgLoader):
start_time = time.time()
global_step = len(TrainImgLoader) * epoch_idx + batch_idx
do_summary = global_step % args.summary_freq == 0
loss, scalar_outputs, image_outputs = train_sample(model, model_loss, optimizer, sample, args)
lr_scheduler.step()
if (not is_distributed) or (dist.get_rank() == 0):
if do_summary:
if not args.notensorboard:
tb_save_scalars(tb_writer, 'train', scalar_outputs, global_step)
tb_save_images(tb_writer, 'train', image_outputs, global_step)
logger.info("Epoch {}/{}, Iter {}/{}, 2mm_err={:.3f} | lr={:.6f}, train_loss={:.3f}, abs_err={:.3f}, pw_loss={:.3f}, dds_loss={:.3f}, time={:.3f}".format(
epoch_idx, args.epochs, batch_idx, len(TrainImgLoader),
scalar_outputs["thres2mm_error"],
optimizer.param_groups[0]["lr"],
loss,
scalar_outputs["abs_depth_error"],
scalar_outputs["s3_pw_loss"],
scalar_outputs["s3_dds_loss"],
time.time() - start_time))
del scalar_outputs, image_outputs
# save checkpoint
if (not is_distributed) or (dist.get_rank() == 0):
if ((epoch_idx + 1) % args.save_freq == 0) or (epoch_idx == args.epochs-1):
torch.save({
'epoch': epoch_idx,
'model': model.module.state_dict(),
'optimizer': optimizer.state_dict()},
"{}/model_{:0>2}.ckpt".format(args.logdir, epoch_idx))
gc.collect()
# testing
if (epoch_idx % args.eval_freq == 0) or (epoch_idx == args.epochs - 1):
avg_test_scalars = DictAverageMeter()
for batch_idx, sample in enumerate(TestImgLoader):
start_time = time.time()
global_step = len(TrainImgLoader) * epoch_idx + batch_idx
do_summary = global_step % args.summary_freq == 0
loss, scalar_outputs, image_outputs = test_sample_depth(model, model_loss, sample, args)
if (not is_distributed) or (dist.get_rank() == 0):
if do_summary:
if not args.notensorboard:
tb_save_scalars(tb_writer, 'test', scalar_outputs, global_step)
tb_save_images(tb_writer, 'test', image_outputs, global_step)
logger.info(
"Epoch {}/{}, Iter {}/{}, 2mm_err={:.3f} | lr={:.6f}, test_loss={:.3f}, abs_err={:.3f}, pw_loss={:.3f}, dds_loss={:.3f}, time={:.3f}".format(
epoch_idx, args.epochs, batch_idx, len(TestImgLoader),
scalar_outputs["thres2mm_error"],
optimizer.param_groups[0]["lr"],
loss,
scalar_outputs["abs_depth_error"],
scalar_outputs["s3_pw_loss"],
scalar_outputs["s3_dds_loss"],
time.time() - start_time))
avg_test_scalars.update(scalar_outputs)
del scalar_outputs, image_outputs
if (not is_distributed) or (dist.get_rank() == 0):
if not args.notensorboard:
tb_save_scalars(tb_writer, 'fulltest', avg_test_scalars.mean(), global_step)
logger.info("avg_test_scalars: " + json.dumps(avg_test_scalars.mean()))
gc.collect()
def train_sample(model, model_loss, optimizer, sample, args):
model.train()
optimizer.zero_grad()
sample_cuda = tocuda(sample)
depth_gt_ms, mask_ms = sample_cuda["depth"], sample_cuda["mask"]
depth_gt, mask = depth_gt_ms["stage{}".format(args.levels)], mask_ms["stage{}".format(args.levels)]
# @Note GeoMVSNet main
outputs = model(
sample_cuda["imgs"],
sample_cuda["proj_matrices"], sample_cuda["intrinsics_matrices"],
sample_cuda["depth_values"]
)
depth_est = outputs["depth"]
loss, epe, pw_loss_stages, dds_loss_stages = model_loss(
outputs, depth_gt_ms, mask_ms,
stage_lw=[float(e) for e in args.stage_lw.split(",") if e], depth_values=sample_cuda["depth_values"]
)
loss.backward()
optimizer.step()
scalar_outputs = {
"loss": loss,
"epe": epe,
"s0_pw_loss": pw_loss_stages[0],
"s1_pw_loss": pw_loss_stages[1],
"s2_pw_loss": pw_loss_stages[2],
"s3_pw_loss": pw_loss_stages[3],
"s0_dds_loss": dds_loss_stages[0],
"s1_dds_loss": dds_loss_stages[1],
"s2_dds_loss": dds_loss_stages[2],
"s3_dds_loss": dds_loss_stages[3],
"abs_depth_error": AbsDepthError_metrics(depth_est, depth_gt, mask > 0.5),
"thres2mm_error": Thres_metrics(depth_est, depth_gt, mask > 0.5, 2),
"thres4mm_error": Thres_metrics(depth_est, depth_gt, mask > 0.5, 4),
"thres8mm_error": Thres_metrics(depth_est, depth_gt, mask > 0.5, 8),
}
image_outputs = {
"depth_est": depth_est * mask,
"depth_est_nomask": depth_est,
"depth_gt": sample["depth"]["stage1"],
"ref_img": sample["imgs"][0],
"mask": sample["mask"]["stage1"],
"errormap": (depth_est - depth_gt).abs() * mask,
}
if is_distributed:
scalar_outputs = reduce_scalar_outputs(scalar_outputs)
return tensor2float(scalar_outputs["loss"]), tensor2float(scalar_outputs), tensor2numpy(image_outputs)
@make_nograd_func
def test_sample_depth(model, model_loss, sample, args):
if is_distributed:
model_eval = model.module
else:
model_eval = model
model_eval.eval()
sample_cuda = tocuda(sample)
depth_gt_ms, mask_ms = sample_cuda["depth"], sample_cuda["mask"]
depth_gt, mask = depth_gt_ms["stage{}".format(args.levels)], mask_ms["stage{}".format(args.levels)]
outputs = model_eval(
sample_cuda["imgs"],
sample_cuda["proj_matrices"], sample_cuda["intrinsics_matrices"],
sample_cuda["depth_values"]
)
depth_est = outputs["depth"]
loss, epe, pw_loss_stages, dds_loss_stages = model_loss(
outputs, depth_gt_ms, mask_ms,
stage_lw=[float(e) for e in args.stage_lw.split(",") if e], depth_values=sample_cuda["depth_values"]
)
scalar_outputs = {
"loss": loss,
"epe": epe,
"s0_pw_loss": pw_loss_stages[0],
"s1_pw_loss": pw_loss_stages[1],
"s2_pw_loss": pw_loss_stages[2],
"s3_pw_loss": pw_loss_stages[3],
"s0_dds_loss": dds_loss_stages[0],
"s1_dds_loss": dds_loss_stages[1],
"s2_dds_loss": dds_loss_stages[2],
"s3_dds_loss": dds_loss_stages[3],
"abs_depth_error": AbsDepthError_metrics(depth_est, depth_gt, mask > 0.5),
"thres2mm_error": Thres_metrics(depth_est, depth_gt, mask > 0.5, 2),
"thres4mm_error": Thres_metrics(depth_est, depth_gt, mask > 0.5, 4),
"thres8mm_error": Thres_metrics(depth_est, depth_gt, mask > 0.5, 8),
}
image_outputs = {
"depth_est": depth_est * mask,
"depth_est_nomask": depth_est,
"depth_gt": sample["depth"]["stage1"],
"ref_img": sample["imgs"][0],
"mask": sample["mask"]["stage1"],
"errormap": (depth_est - depth_gt).abs() * mask
}
if is_distributed:
scalar_outputs = reduce_scalar_outputs(scalar_outputs)
return tensor2float(scalar_outputs["loss"]), tensor2float(scalar_outputs), tensor2numpy(image_outputs)
def initLogger():
logger = logging.getLogger()
logger.setLevel(logging.INFO)
curTime = time.strftime('%Y%m%d-%H%M', time.localtime(time.time()))
logfile = os.path.join(args.logdir, 'train-' + curTime + '.log')
formatter = logging.Formatter("%(asctime)s - %(filename)s[line:%(lineno)d] - %(levelname)s: %(message)s")
fileHandler = logging.FileHandler(logfile, mode='a')
fileHandler.setFormatter(formatter)
logger.addHandler(fileHandler)
consoleHandler = logging.StreamHandler(sys.stdout)
consoleHandler.setFormatter(formatter)
logger.addHandler(consoleHandler)
logger.info("Logger initialized.")
logger.info("Writing logs to file: {}".format(logfile))
logger.info("Current time: {}".format(curTime))
settings_str = "All settings:\n"
for k,v in vars(args).items():
settings_str += '{0}: {1}\n'.format(k,v)
logger.info(settings_str)
return logger
if __name__ == '__main__':
logger = initLogger()
if args.resume:
assert args.mode == "train"
assert args.loadckpt is None
if is_distributed:
torch.cuda.set_device(args.local_rank)
torch.distributed.init_process_group(backend="nccl", init_method="env://")
synchronize()
set_random_seed(args.seed)
device = torch.device(args.device)
# tensorboard
if (not is_distributed) or (dist.get_rank() == 0):
if not os.path.isdir(args.logdir):
os.makedirs(args.logdir)
current_time_str = str(datetime.datetime.now().strftime('%Y%m%d_%H%M%S'))
logger.info("current time " + current_time_str)
logger.info("creating new summary file")
if not args.notensorboard:
tb_writer = SummaryWriter(args.logdir)
# @Note GeoMVSNet model
model = GeoMVSNet(
levels=args.levels,
hypo_plane_num_stages=[int(n) for n in args.hypo_plane_num_stages.split(",")],
depth_interal_ratio_stages=[float(ir) for ir in args.depth_interal_ratio_stages.split(",")],
feat_base_channel=args.feat_base_channel,
reg_base_channel=args.reg_base_channel,
group_cor_dim_stages=[int(n) for n in args.group_cor_dim_stages.split(",")],
)
model.to(device)
model_loss = geomvsnet_loss
# optimizer
optimizer = optim.Adam(filter(lambda p: p.requires_grad, model.parameters()), lr=args.lr, betas=(0.9, 0.999), weight_decay=args.wd)
# load parameters
start_epoch = 0
if args.resume:
saved_models = [fn for fn in os.listdir(args.logdir) if fn.endswith(".ckpt")]
saved_models = sorted(saved_models, key=lambda x: int(x.split('_')[-1].split('.')[0]))
loadckpt = os.path.join(args.logdir, saved_models[-1])
logger.info("resuming: " + loadckpt)
state_dict = torch.load(loadckpt, map_location=torch.device("cpu"))
model.load_state_dict(state_dict['model'])
optimizer.load_state_dict(state_dict['optimizer'])
start_epoch = state_dict['epoch'] + 1
# distributed
if (not is_distributed) or (dist.get_rank() == 0):
logger.info("start at epoch {}".format(start_epoch))
logger.info('Number of model parameters: {}'.format(sum([p.data.nelement() for p in model.parameters()])))
if is_distributed:
if dist.get_rank() == 0:
logger.info("Let's use {} GPUs in distributed mode!".format(torch.cuda.device_count()))
model = torch.nn.parallel.DistributedDataParallel(
model, device_ids=[args.local_rank], output_device=args.local_rank,
find_unused_parameters=True,
)
else:
if torch.cuda.is_available():
logger.info("Let's use {} GPUs in parallel mode.".format(torch.cuda.device_count()))
model = nn.DataParallel(model)
# dataset, dataloader
if args.which_dataset == "dtu":
train_dataset = DTUDataset(args.trainpath, args.trainlist, "train", args.n_views, data_scale=args.data_scale, robust_train=args.robust_train)
test_dataset = DTUDataset(args.testpath, args.testlist, "val", args.n_views, data_scale=args.data_scale)
elif args.which_dataset == "blendedmvs":
train_dataset = BlendedMVSDataset(args.trainpath, args.trainlist, "train", args.n_views, img_wh=(768, 576), robust_train=args.robust_train, augment=False)
test_dataset = BlendedMVSDataset(args.testpath, args.testlist, "val", args.n_views, img_wh=(768, 576))
if is_distributed:
train_sampler = torch.utils.data.DistributedSampler(train_dataset, num_replicas=dist.get_world_size(), rank=dist.get_rank())
test_sampler = torch.utils.data.DistributedSampler(test_dataset, num_replicas=dist.get_world_size(), rank=dist.get_rank())
TrainImgLoader = DataLoader(train_dataset, args.batch_size, sampler=train_sampler, num_workers=8, drop_last=True, pin_memory=args.pin_m)
TestImgLoader = DataLoader(test_dataset, args.batch_size, sampler=test_sampler, num_workers=8, drop_last=False, pin_memory=args.pin_m)
else:
TrainImgLoader = DataLoader(train_dataset, args.batch_size, shuffle=True, num_workers=8, drop_last=True, pin_memory=args.pin_m)
TestImgLoader = DataLoader(test_dataset, args.batch_size, shuffle=False, num_workers=8, drop_last=False, pin_memory=args.pin_m)
train(model, model_loss, optimizer, TrainImgLoader, TestImgLoader, start_epoch, args)