Repository: doubleZ0108/GeoMVSNet Branch: master Commit: 09167fc95f04 Files: 48 Total size: 230.0 KB Directory structure: gitextract_c1590sn0/ ├── .gitignore ├── LICENSE ├── README.md ├── datasets/ │ ├── __init__.py │ ├── blendedmvs.py │ ├── data_io.py │ ├── dtu.py │ ├── evaluations/ │ │ └── dtu_parallel/ │ │ ├── BaseEval2Obj_web.m │ │ ├── BaseEvalMain_web.m │ │ ├── ComputeStat_web.m │ │ ├── MaxDistCP.m │ │ ├── PointCompareMain.m │ │ ├── plyread.m │ │ └── reducePts_haa.m │ ├── lists/ │ │ ├── blendedmvs/ │ │ │ ├── low_res_all.txt │ │ │ └── val.txt │ │ ├── dtu/ │ │ │ ├── test.txt │ │ │ ├── train.txt │ │ │ └── val.txt │ │ └── tnt/ │ │ ├── advanced.txt │ │ └── intermediate.txt │ └── tnt.py ├── fusions/ │ ├── dtu/ │ │ ├── _open3d.py │ │ ├── gipuma.py │ │ └── pcd.py │ └── tnt/ │ └── dypcd.py ├── models/ │ ├── __init__.py │ ├── filter.py │ ├── geometry.py │ ├── geomvsnet.py │ ├── loss.py │ ├── submodules.py │ └── utils/ │ ├── __init__.py │ ├── opts.py │ └── utils.py ├── outputs/ │ └── visual.ipynb ├── requirements.txt ├── scripts/ │ ├── blend/ │ │ └── train_blend.sh │ ├── data_path.sh │ ├── dtu/ │ │ ├── fusion_dtu.sh │ │ ├── matlab_quan_dtu.sh │ │ ├── test_dtu.sh │ │ ├── train_dtu.sh │ │ └── train_dtu_raw.sh │ └── tnt/ │ ├── fusion_tnt.sh │ └── test_tnt.sh ├── test.py └── train.py ================================================ FILE CONTENTS ================================================ ================================================ FILE: .gitignore ================================================ .DS_Store __pycache__ ================================================ FILE: LICENSE ================================================ Apache License Version 2.0, January 2004 http://www.apache.org/licenses/ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 1. Definitions. "License" shall mean the terms and conditions for use, reproduction, and distribution as defined by Sections 1 through 9 of this document. "Licensor" shall mean the copyright owner or entity authorized by the copyright owner that is granting the License. "Legal Entity" shall mean the union of the acting entity and all other entities that control, are controlled by, or are under common control with that entity. For the purposes of this definition, "control" means (i) the power, direct or indirect, to cause the direction or management of such entity, whether by contract or otherwise, or (ii) ownership of fifty percent (50%) or more of the outstanding shares, or (iii) beneficial ownership of such entity. "You" (or "Your") shall mean an individual or Legal Entity exercising permissions granted by this License. "Source" form shall mean the preferred form for making modifications, including but not limited to software source code, documentation source, and configuration files. "Object" form shall mean any form resulting from mechanical transformation or translation of a Source form, including but not limited to compiled object code, generated documentation, and conversions to other media types. "Work" shall mean the work of authorship, whether in Source or Object form, made available under the License, as indicated by a copyright notice that is included in or attached to the work (an example is provided in the Appendix below). "Derivative Works" shall mean any work, whether in Source or Object form, that is based on (or derived from) the Work and for which the editorial revisions, annotations, elaborations, or other modifications represent, as a whole, an original work of authorship. 
For the purposes of this License, Derivative Works shall not include works that remain separable from, or merely link (or bind by name) to the interfaces of, the Work and Derivative Works thereof. "Contribution" shall mean any work of authorship, including the original version of the Work and any modifications or additions to that Work or Derivative Works thereof, that is intentionally submitted to Licensor for inclusion in the Work by the copyright owner or by an individual or Legal Entity authorized to submit on behalf of the copyright owner. For the purposes of this definition, "submitted" means any form of electronic, verbal, or written communication sent to the Licensor or its representatives, including but not limited to communication on electronic mailing lists, source code control systems, and issue tracking systems that are managed by, or on behalf of, the Licensor for the purpose of discussing and improving the Work, but excluding communication that is conspicuously marked or otherwise designated in writing by the copyright owner as "Not a Contribution." "Contributor" shall mean Licensor and any individual or Legal Entity on behalf of whom a Contribution has been received by Licensor and subsequently incorporated within the Work. 2. Grant of Copyright License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare Derivative Works of, publicly display, publicly perform, sublicense, and distribute the Work and such Derivative Works in Source or Object form. 3. Grant of Patent License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable (except as stated in this section) patent license to make, have made, use, offer to sell, sell, import, and otherwise transfer the Work, where such license applies only to those patent claims licensable by such Contributor that are necessarily infringed by their Contribution(s) alone or by combination of their Contribution(s) with the Work to which such Contribution(s) was submitted. If You institute patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Work or a Contribution incorporated within the Work constitutes direct or contributory patent infringement, then any patent licenses granted to You under this License for that Work shall terminate as of the date such litigation is filed. 4. Redistribution. 
You may reproduce and distribute copies of the Work or Derivative Works thereof in any medium, with or without modifications, and in Source or Object form, provided that You meet the following conditions: (a) You must give any other recipients of the Work or Derivative Works a copy of this License; and (b) You must cause any modified files to carry prominent notices stating that You changed the files; and (c) You must retain, in the Source form of any Derivative Works that You distribute, all copyright, patent, trademark, and attribution notices from the Source form of the Work, excluding those notices that do not pertain to any part of the Derivative Works; and (d) If the Work includes a "NOTICE" text file as part of its distribution, then any Derivative Works that You distribute must include a readable copy of the attribution notices contained within such NOTICE file, excluding those notices that do not pertain to any part of the Derivative Works, in at least one of the following places: within a NOTICE text file distributed as part of the Derivative Works; within the Source form or documentation, if provided along with the Derivative Works; or, within a display generated by the Derivative Works, if and wherever such third-party notices normally appear. The contents of the NOTICE file are for informational purposes only and do not modify the License. You may add Your own attribution notices within Derivative Works that You distribute, alongside or as an addendum to the NOTICE text from the Work, provided that such additional attribution notices cannot be construed as modifying the License. You may add Your own copyright statement to Your modifications and may provide additional or different license terms and conditions for use, reproduction, or distribution of Your modifications, or for any such Derivative Works as a whole, provided Your use, reproduction, and distribution of the Work otherwise complies with the conditions stated in this License. 5. Submission of Contributions. Unless You explicitly state otherwise, any Contribution intentionally submitted for inclusion in the Work by You to the Licensor shall be under the terms and conditions of this License, without any additional terms or conditions. Notwithstanding the above, nothing herein shall supersede or modify the terms of any separate license agreement you may have executed with Licensor regarding such Contributions. 6. Trademarks. This License does not grant permission to use the trade names, trademarks, service marks, or product names of the Licensor, except as required for reasonable and customary use in describing the origin of the Work and reproducing the content of the NOTICE file. 7. Disclaimer of Warranty. Unless required by applicable law or agreed to in writing, Licensor provides the Work (and each Contributor provides its Contributions) on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You are solely responsible for determining the appropriateness of using or redistributing the Work and assume any risks associated with Your exercise of permissions under this License. 8. Limitation of Liability. 
In no event and under no legal theory, whether in tort (including negligence), contract, or otherwise, unless required by applicable law (such as deliberate and grossly negligent acts) or agreed to in writing, shall any Contributor be liable to You for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising as a result of this License or out of the use or inability to use the Work (including but not limited to damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or losses), even if such Contributor has been advised of the possibility of such damages. 9. Accepting Warranty or Additional Liability. While redistributing the Work or Derivative Works thereof, You may choose to offer, and charge a fee for, acceptance of support, warranty, indemnity, or other liability obligations and/or rights consistent with this License. However, in accepting such obligations, You may act only on Your own behalf and on Your sole responsibility, not on behalf of any other Contributor, and only if You agree to indemnify, defend, and hold each Contributor harmless for any liability incurred by, or claims asserted against, such Contributor by reason of your accepting any such warranty or additional liability. END OF TERMS AND CONDITIONS APPENDIX: How to apply the Apache License to your work. To apply the Apache License to your work, attach the following boilerplate notice, with the fields enclosed by brackets "[]" replaced with your own identifying information. (Don't include the brackets!) The text should be enclosed in the appropriate comment syntax for the file format. We also recommend that a file or class name and description of purpose be included on the same "printed page" as the copyright notice for easier identification within third-party archives. Copyright [yyyy] [name of copyright owner] Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. ================================================ FILE: README.md ================================================

# GeoMVSNet: Learning Multi-View Stereo With Geometry Perception (CVPR 2023)

Zhe Zhang, Rui Peng, Yuxi Hu, Ronggang Wang*

     

## 🔨 Setup

### 1.1 Requirements

Use the following commands to build the `conda` environment.

```bash
conda create -n geomvsnet python=3.8
conda activate geomvsnet
pip install -r requirements.txt
```

### 1.2 Datasets

Download the following datasets and modify the corresponding local paths in `scripts/data_path.sh`.

#### DTU Dataset

**Training data**. We use the same DTU training data as MVSNet and CasMVSNet; please refer to [DTU training data](https://drive.google.com/file/d/1eDjh-_bxKKnEuz5h-HXS7EDJn59clx6V/view) and [Depth raw](https://virutalbuy-public.oss-cn-hangzhou.aliyuncs.com/share/cascade-stereo/CasMVSNet/dtu_data/dtu_train_hr/Depths_raw.zip) for download. Optionally, download the [Rectified raw](http://roboimagedata2.compute.dtu.dk/data/MVS/Rectified.zip) data if you want to train the model at raw image resolution. Unzip and organize them as:

```
dtu/
├── Cameras
├── Depths
├── Depths_raw
├── Rectified
└── Rectified_raw (optional)
```

**Testing data**. For convenience, we use the [DTU testing data](https://drive.google.com/file/d/1rX0EXlUL4prRxrRu2DgLJv2j7-tpUD4D/view?usp=sharing) processed by CVP-MVSNet. Also unzip and organize it as:

```
dtu-test/
├── Cameras
├── Depths
└── Rectified
```

> Please note that the images and lighting here are consistent with the original dataset.

#### BlendedMVS Dataset

Download the low-resolution version of the [BlendedMVS dataset](https://drive.google.com/file/d/1ilxls-VJNvJnB7IaFj7P0ehMPr7ikRCb/view) and unzip it as:

```
blendedmvs/
└── dataset_low_res
    ├── ...
    └── 5c34529873a8df509ae57b58
```

#### Tanks and Temples Dataset

Download the intermediate and advanced subsets of the [Tanks and Temples dataset](https://drive.google.com/file/d/1YArOJaX9WVLJh4757uE8AEREYkgszrCo/view) and unzip them. If you want to use the short-range version of the camera parameters for the `Intermediate` subset, unzip `short_range_caemeras_for_mvsnet.zip` and move `cam_[]` to the corresponding scenarios.

```
tnt/
├── advanced
│   ├── ...
│   └── Temple
│       ├── cams
│       ├── images
│       ├── pair.txt
│       └── Temple.log
└── intermediate
    ├── ...
    └── Train
        ├── cams
        ├── cams_train
        ├── images
        ├── pair.txt
        └── Train.log
```

## 🚂 Training

You can train GeoMVSNet from scratch on the DTU dataset and the BlendedMVS dataset. After suitable configuration and training, you will get the checkpoints in `checkpoints/[Dataset]/[THISNAME]`, and the following outputs lie in that folder:

- `events.out.tfevents*`: use `tensorboard` to monitor the training process.
- `model_[epoch].ckpt`: we save a checkpoint every `--save_freq`.
- `train-[TIME].log`: the detailed training log; refer to the appropriate indicators to judge the quality of training.

### 2.1 DTU

To train GeoMVSNet on the DTU dataset, refer to `scripts/dtu/train_dtu.sh`, specify `THISNAME`, `CUDA_VISIBLE_DEVICES`, `batch_size`, etc. to meet your demand, and run:

```bash
bash scripts/dtu/train_dtu.sh
```

The default training strategy we provide is the *distributed* training mode. If you want to use the *general* training mode, you can refer to the following code.
**general training script**

```bash
CUDA_VISIBLE_DEVICES=0,1,2,3 python3 train.py ${@} \
    --which_dataset="dtu" --epochs=16 --logdir=$LOG_DIR \
    --trainpath=$DTU_TRAIN_ROOT --testpath=$DTU_TRAIN_ROOT \
    --trainlist="datasets/lists/dtu/train.txt" --testlist="datasets/lists/dtu/test.txt" \
    \
    --data_scale="mid" --n_views="5" --batch_size=16 --lr=0.025 --robust_train \
    --lrepochs="1,3,5,7,9,11,13,15:1.5"
```
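The `batch_size=16` / `lr=0.025` pair in the script above is a convenient reference point. If your GPUs force a different effective batch size, a common heuristic (not an official recipe from this repository) is to scale the learning rate linearly; `scaled_lr` below is a hypothetical helper illustrating the idea.

```python
# Hypothetical helper illustrating the linear scaling rule; the reference
# values come from the general training script above.
BASE_BATCH_SIZE, BASE_LR = 16, 0.025

def scaled_lr(batch_size: int) -> float:
    """Scale the learning rate proportionally to the effective batch size."""
    return BASE_LR * batch_size / BASE_BATCH_SIZE

print(scaled_lr(8))  # 0.0125 -> pass via --batch_size=8 --lr=0.0125
```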
> Note that the two training strategies require different `batch_size` and `lr` settings to achieve the best results.

### 2.2 BlendedMVS

To train GeoMVSNet on the BlendedMVS dataset, refer to `scripts/blend/train_blend.sh`, likewise specify `THISNAME`, `CUDA_VISIBLE_DEVICES`, `batch_size`, etc. to meet your demand, and run:

```bash
bash scripts/blend/train_blend.sh
```

By default, we use `7` viewpoints as input for BlendedMVS training. As in 2.1, you can choose either the *distributed* training mode or the *general* one.

## ⚗️ Testing

### 3.1 DTU

For DTU testing, we use the model trained on the DTU training dataset. You can simply download our [DTU pretrained model](https://drive.google.com/file/d/147_UbjE87E-HB9sZ5yLDbckynH825nJd/view?usp=sharing), put it into `checkpoints/dtu/geomvsnet/`, and perform *depth map estimation, point cloud fusion, and result evaluation* according to the following steps.

1. Run `bash scripts/dtu/test_dtu.sh` for depth map estimation. The results will be stored in `outputs/dtu/[THISNAME]/`, with each scan folder holding `depth_est`, `confidence`, etc.
   - Use `outputs/visual.ipynb` for depth map visualization.
2. Run `bash scripts/dtu/fusion_dtu.sh` for point cloud fusion. We provide 3 different fusion methods and recommend the `open3d` option by default. After fusion, the experiment output folder contains `[FUSION_METHOD]_fusion_plys` with the point cloud of each testing scan.
   (Optional) If you want to use the `gipuma` fusion method:

   1. Clone the [edited fusibile repo](https://github.com/YoYo000/fusibile).
   2. Refer to the [fusibile configuration blog (Chinese)](https://zhuanlan.zhihu.com/p/460212787) for building details.
   3. Create a new python2.7 conda env:

      ```bash
      conda create -n fusibile python=2.7
      conda install scipy matplotlib
      conda install tensorflow==1.14.0
      conda install -c https://conda.anaconda.org/menpo opencv
      ```

   4. Use the `fusibile` conda environment for the `gipuma` fusion method.
3. Download the [ObsMask](http://roboimagedata2.compute.dtu.dk/data/MVS/SampleSet.zip) and [Points](http://roboimagedata2.compute.dtu.dk/data/MVS/Points.zip) of the DTU GT point clouds from the official website and organize them as:

   ```
   dtu-evaluation/
   ├── ObsMask
   └── Points
   ```

4. Set up `Matlab` in command-line mode and run `bash scripts/dtu/matlab_quan_dtu.sh`. You can adjust the `num_at_once` config according to your machine's CPU and memory ceiling. After quantitative evaluation, you will get `[FUSION_METHOD]_quantitative/` and `[THISNAME].log`, which store the quantitative results.

### 3.2 Tanks and Temples

For testing on the [Tanks and Temples benchmark](https://www.tanksandtemples.org/leaderboard/), you can use any of the following configurations:

- Only train on the DTU training dataset.
- Only train on the BlendedMVS dataset.
- Pretrain on the DTU training dataset and fine-tune on the BlendedMVS dataset. (Recommended)

After your own training, follow these steps:

1. Run `bash scripts/tnt/test_tnt.sh` for depth map estimation. The results will be stored in `outputs/[TRAINING_DATASET]/[THISNAME]/`.
   - Use `outputs/visual.ipynb` for depth map visualization.
2. Run `bash scripts/tnt/fusion_tnt.sh` for point cloud fusion. We provide the popular dynamic fusion strategy, and you can tune the fusion thresholds in `fusions/tnt/dypcd.py`.
3. Follow the *Upload Instructions* on the [T&T official website](https://www.tanksandtemples.org/submit/) to make online submissions.

### 3.3 Custom Data (TODO)

GeoMVSNet can reconstruct from custom data. At present, you can refer to [MVSNet](https://github.com/YoYo000/MVSNet#file-formats) to organize your data, and follow the same steps as above for *depth estimation* and *point cloud fusion*.

## 💡 Results

Our results on the DTU and Tanks and Temples datasets are listed in the tables below.

| DTU Dataset | Acc. ↓ | Comp. ↓ | Overall ↓ |
| ----------- | ------ | ------- | --------- |
| GeoMVSNet   | 0.3309 | 0.2593  | 0.2951    |

| T&T (Intermediate) | Mean ↑ | Family | Francis | Horse | Lighthouse | M60   | Panther | Playground | Train |
| ------------------ | ------ | ------ | ------- | ----- | ---------- | ----- | ------- | ---------- | ----- |
| GeoMVSNet          | 65.89  | 81.64  | 67.53   | 55.78 | 68.02      | 65.49 | 67.19   | 63.27      | 58.22 |

| T&T (Advanced) | Mean ↑ | Auditorium | Ballroom | Courtroom | Museum | Palace | Temple |
| -------------- | ------ | ---------- | -------- | --------- | ------ | ------ | ------ |
| GeoMVSNet      | 41.52  | 30.23      | 46.53    | 39.98     | 53.05  | 35.98  | 43.34  |

You can also download our [Point Cloud](https://disk.pku.edu.cn:443/link/69D473126C509C8DCBCC7E233FAAEEAA) and [Estimated Depth](https://disk.pku.edu.cn:443/link/4217EB2F063D2B10EDC711F54A12B5F7) for academic usage.
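The estimated depth maps above, like the per-scan results in `outputs/dtu/[THISNAME]/`, are stored as `.pfm` files and can be inspected with `read_pfm` from `datasets/data_io.py`. A minimal sketch, assuming `matplotlib` is available and using a hypothetical file path:

```python
# Minimal sketch for inspecting an estimated depth map from the repo root;
# the path below is a placeholder, not a file shipped with the repository.
import matplotlib.pyplot as plt
from datasets.data_io import read_pfm

depth, _scale = read_pfm("outputs/dtu/geomvsnet/scan1/depth_est/00000000.pfm")  # hypothetical path
print("depth map shape:", depth.shape, "range:", depth.min(), "-", depth.max())

plt.imshow(depth, cmap="viridis")
plt.colorbar(label="depth")
plt.savefig("depth_est_preview.png", dpi=150)
```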
## 🌟 About Reproducing Paper Results

In our experiments, we found that reproducing MVS networks is relatively difficult. Therefore, we summarize some of the problems we encountered below, hoping they will be helpful to you.

**Q1. GPU architecture matters.** There are two commonly used NVIDIA GPU series: GeForce RTX (e.g. 4090, 3090 Ti, 2080 Ti) and Tesla (e.g. V100, T4). We find that there is generally no performance degradation when training and testing on the same series of GPUs. Across series, however (for example, training on a V100 and testing on a 3090 Ti), the depth maps look visually identical, but the individual pixel values are not exactly the same. We conjecture that the two series or architectures differ in numerical computation and processing precision.

> Our pretrained model is trained on NVIDIA V100 GPUs.

**Q2. PyTorch version matters.** Different CUDA versions restrict which PyTorch versions you can install, and different torch versions affect the accuracy of network training and testing. One reason we identified is that the implementation and default parameters of `F.grid_sample()` vary across PyTorch versions (a minimal illustration follows this Q&A list).

**Q3. Training hyperparameters matter.** In the era of neural networks, hyperparameters really matter. We tuned some network hyperparameters, but your configuration may differ. Most fundamentally, because GPU memory differs across machines, you need to adjust `batch_size` and `lr` together, and the learning-rate schedule also matters.

**Q4. Testing epoch matters.** By default, our model trains for 16 epochs. How do you select the best checkpoint for testing? One solution is to use [PyTorch-lightning](https://lightning.ai/docs/pytorch/latest/starter/introduction.html). For simplicity, you can decide which checkpoint to use based on the `.log` file we provide.

**Q5. Fusion hyperparameters matter.** For both the DTU and T&T datasets, the point cloud fusion hyperparameters greatly affect the final performance. We provide different fusion strategies and easy access to adjust their parameters. You may need to learn the temperament of your model.

**Qx. Others.** You can [raise an issue](https://github.com/doubleZ0108/GeoMVSNet/issues/new/choose) if you meet other problems.
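As a concrete illustration of Q2: the default of `align_corners` in `F.grid_sample()` changed in PyTorch 1.3.0, and bilinear sampling of this kind is what MVS-style homography warping typically relies on. The toy sketch below (not code from this repository, and not a statement about which setting GeoMVSNet uses) simply shows that the two settings return different values, which is why pinning the flag and the PyTorch version matters for reproduction.

```python
# Minimal sketch: make bilinear sampling deterministic across PyTorch versions
# by passing align_corners explicitly (its default behavior changed in 1.3.0).
# The tensors here are toy data.
import torch
import torch.nn.functional as F

feat = torch.arange(16, dtype=torch.float32).reshape(1, 1, 4, 4)   # (B, C, H, W)
# normalized sampling grid in [-1, 1], shape (B, H_out, W_out, 2)
grid = torch.tensor([[[[-0.5, -0.5], [0.5, 0.5]]]], dtype=torch.float32)

warped_true = F.grid_sample(feat, grid, mode="bilinear",
                            padding_mode="zeros", align_corners=True)
warped_false = F.grid_sample(feat, grid, mode="bilinear",
                             padding_mode="zeros", align_corners=False)
print(warped_true.flatten(), warped_false.flatten())   # the two settings differ
```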

## ⚖️ Citation

```
@InProceedings{zhe2023geomvsnet,
  title={GeoMVSNet: Learning Multi-View Stereo With Geometry Perception},
  author={Zhang, Zhe and Peng, Rui and Hu, Yuxi and Wang, Ronggang},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={21508--21518},
  year={2023}
}
```

## 💌 Acknowledgements

This repository is partly based on [MVSNet](https://github.com/YoYo000/MVSNet), [MVSNet-pytorch](https://github.com/xy-guo/MVSNet_pytorch), [CVP-MVSNet](https://github.com/JiayuYANG/CVP-MVSNet), [cascade-stereo](https://github.com/alibaba/cascade-stereo), and [MVSTER](https://github.com/JeffWang987/MVSTER). We appreciate their contributions to the MVS community.


================================================
FILE: datasets/__init__.py
================================================


================================================
FILE: datasets/blendedmvs.py
================================================
# -*- coding: utf-8 -*-
# @Description: Data preprocessing and organization for BlendedMVS dataset.
# @Author: Zhe Zhang (doublez@stu.pku.edu.cn)
# @Affiliation: Peking University (PKU)
# @LastEditDate: 2023-09-07

import os
import cv2
import random
import numpy as np
from PIL import Image
from torch.utils.data import Dataset
from torchvision import transforms as T

from datasets.data_io import *


def motion_blur(img: np.ndarray, max_kernel_size=3):
    # Either vertical, horizontal or diagonal blur
    mode = np.random.choice(['h', 'v', 'diag_down', 'diag_up'])
    ksize = np.random.randint(0, (max_kernel_size + 1) / 2) * 2 + 1  # make sure is odd
    center = int((ksize - 1) / 2)
    kernel = np.zeros((ksize, ksize))
    if mode == 'h':
        kernel[center, :] = 1.
    elif mode == 'v':
        kernel[:, center] = 1.
    elif mode == 'diag_down':
        kernel = np.eye(ksize)
    elif mode == 'diag_up':
        kernel = np.flip(np.eye(ksize), 0)
    var = ksize * ksize / 16.
    grid = np.repeat(np.arange(ksize)[:, np.newaxis], ksize, axis=-1)
    gaussian = np.exp(-(np.square(grid - center) + np.square(grid.T - center)) / (2. * var))
    kernel *= gaussian
    kernel /= np.sum(kernel)
    img = cv2.filter2D(img, -1, kernel)
    return img


class BlendedMVSDataset(Dataset):
    def __init__(self, root_dir, list_file, split, n_views, **kwargs):
        super(BlendedMVSDataset, self).__init__()
        self.levels = 4
        self.root_dir = root_dir
        self.list_file = list_file
        self.split = split
        self.n_views = n_views
        assert self.split in ['train', 'val', 'all']

        self.scale_factors = {}
        self.scale_factor = 0
        self.img_wh = kwargs.get("img_wh", (768, 576))
        assert self.img_wh[0]%32==0 and self.img_wh[1]%32==0, \
            'img_wh must both be multiples of 2^5!'
self.robust_train = kwargs.get("robust_train", True) self.augment = kwargs.get("augment", True) if self.augment: self.color_augment = T.ColorJitter(brightness=0.25, contrast=(0.3, 1.5)) self.metas = self.build_metas() def build_metas(self): metas = [] with open(self.list_file) as f: self.scans = [line.rstrip() for line in f.readlines()] for scan in self.scans: with open(os.path.join(self.root_dir, scan, "cams/pair.txt")) as f: num_viewpoint = int(f.readline()) for _ in range(num_viewpoint): ref_view = int(f.readline().rstrip()) src_views = [int(x) for x in f.readline().rstrip().split()[1::2]] if len(src_views) >= self.n_views-1: metas += [(scan, ref_view, src_views)] return metas def read_cam_file(self, scan, filename): with open(filename) as f: lines = f.readlines() lines = [line.rstrip() for line in lines] # extrinsics: line [1,5), 4x4 matrix extrinsics = np.fromstring(' '.join(lines[1:5]), dtype=np.float32, sep=' ').reshape((4, 4)) # intrinsics: line [7-10), 3x3 matrix intrinsics = np.fromstring(' '.join(lines[7:10]), dtype=np.float32, sep=' ').reshape((3, 3)) depth_min = float(lines[11].split()[0]) depth_max = float(lines[11].split()[-1]) if scan not in self.scale_factors: self.scale_factors[scan] = 100.0 / depth_min depth_min *= self.scale_factors[scan] depth_max *= self.scale_factors[scan] extrinsics[:3, 3] *= self.scale_factors[scan] return intrinsics, extrinsics, depth_min, depth_max def read_depth_mask(self, scan, filename, depth_min, depth_max, scale): depth = np.array(read_pfm(filename)[0], dtype=np.float32) depth = (depth * self.scale_factors[scan]) * scale mask = (depth>=depth_min) & (depth<=depth_max) assert mask.sum() > 0 mask = mask.astype(np.float32) if self.img_wh is not None: depth = cv2.resize(depth, self.img_wh, interpolation=cv2.INTER_NEAREST) h, w = depth.shape depth_ms = {} mask_ms = {} for i in range(self.levels): depth_cur = cv2.resize(depth, (w//(2**i), h//(2**i)), interpolation=cv2.INTER_NEAREST) mask_cur = cv2.resize(mask, (w//(2**i), h//(2**i)), interpolation=cv2.INTER_NEAREST) depth_ms[f"stage{self.levels-i}"] = depth_cur mask_ms[f"stage{self.levels-i}"] = mask_cur return depth_ms, mask_ms def read_img(self, filename): img = Image.open(filename) if self.augment: img = self.color_augment(img) img = motion_blur(np.array(img, dtype=np.float32)) np_img = np.array(img, dtype=np.float32) / 255. 
return np_img def __len__(self): return len(self.metas) def __getitem__(self, idx): meta = self.metas[idx] scan, ref_view, src_views = meta if self.robust_train: num_src_views = len(src_views) index = random.sample(range(num_src_views), self.n_views - 1) view_ids = [ref_view] + [src_views[i] for i in index] scale_ratio = random.uniform(0.8, 1.25) else: view_ids = [ref_view] + src_views[:self.n_views - 1] scale_ratio = 1 imgs = [] mask = None depth = None depth_min = None depth_max = None proj={} proj_matrices_0 = [] proj_matrices_1 = [] proj_matrices_2 = [] proj_matrices_3 = [] for i, vid in enumerate(view_ids): img_filename = os.path.join(self.root_dir, '{}/blended_images/{:0>8}.jpg'.format(scan, vid)) depth_filename = os.path.join(self.root_dir, '{}/rendered_depth_maps/{:0>8}.pfm'.format(scan, vid)) proj_mat_filename = os.path.join(self.root_dir, '{}/cams/{:0>8}_cam.txt'.format(scan, vid)) img = self.read_img(img_filename) imgs.append(img.transpose(2,0,1)) intrinsics, extrinsics, depth_min_, depth_max_ = self.read_cam_file(scan, proj_mat_filename) proj_mat_0 = np.zeros(shape=(2, 4, 4), dtype=np.float32) proj_mat_1 = np.zeros(shape=(2, 4, 4), dtype=np.float32) proj_mat_2 = np.zeros(shape=(2, 4, 4), dtype=np.float32) proj_mat_3 = np.zeros(shape=(2, 4, 4), dtype=np.float32) extrinsics[:3, 3] *= scale_ratio intrinsics[:2,:] *= 0.125 proj_mat_0[0,:4,:4] = extrinsics.copy() proj_mat_0[1,:3,:3] = intrinsics.copy() int_mat_0 = intrinsics.copy() intrinsics[:2,:] *= 2 proj_mat_1[0,:4,:4] = extrinsics.copy() proj_mat_1[1,:3,:3] = intrinsics.copy() int_mat_1 = intrinsics.copy() intrinsics[:2,:] *= 2 proj_mat_2[0,:4,:4] = extrinsics.copy() proj_mat_2[1,:3,:3] = intrinsics.copy() int_mat_2 = intrinsics.copy() intrinsics[:2,:] *= 2 proj_mat_3[0,:4,:4] = extrinsics.copy() proj_mat_3[1,:3,:3] = intrinsics.copy() int_mat_3 = intrinsics.copy() proj_matrices_0.append(proj_mat_0) proj_matrices_1.append(proj_mat_1) proj_matrices_2.append(proj_mat_2) proj_matrices_3.append(proj_mat_3) # reference view if i == 0: depth_min = depth_min_ * scale_ratio depth_max = depth_max_ * scale_ratio depth, mask = self.read_depth_mask(scan, depth_filename, depth_min, depth_max, scale_ratio) for l in range(self.levels): mask[f'stage{l+1}'] = mask[f'stage{l+1}'] depth[f'stage{l+1}'] = depth[f'stage{l+1}'] proj['stage1'] = np.stack(proj_matrices_0) proj['stage2'] = np.stack(proj_matrices_1) proj['stage3'] = np.stack(proj_matrices_2) proj['stage4'] = np.stack(proj_matrices_3) intrinsics_matrices = { "stage1": int_mat_0, "stage2": int_mat_1, "stage3": int_mat_2, "stage4": int_mat_3 } sample = { "imgs": imgs, "proj_matrices": proj, "intrinsics_matrices": intrinsics_matrices, "depth": depth, "depth_values": np.array([depth_min, depth_max], dtype=np.float32), "mask": mask } return sample ================================================ FILE: datasets/data_io.py ================================================ # -*- coding: utf-8 -*- # @Description: I/O functions for depth maps and camera files. 
# @Author: Zhe Zhang (doublez@stu.pku.edu.cn) # @Affiliation: Peking University (PKU) # @LastEditDate: 2023-09-07 import sys, re import numpy as np def read_pfm(filename): file = open(filename, 'rb') color = None width = None height = None scale = None endian = None header = file.readline().decode('utf-8').rstrip() if header == 'PF': color = True elif header == 'Pf': color = False else: raise Exception('Not a PFM file.') dim_match = re.match(r'^(\d+)\s(\d+)\s$', file.readline().decode('utf-8')) if dim_match: width, height = map(int, dim_match.groups()) else: raise Exception('Malformed PFM header.') scale = float(file.readline().rstrip()) if scale < 0: # little-endian endian = '<' scale = -scale else: endian = '>' # big-endian data = np.fromfile(file, endian + 'f') shape = (height, width, 3) if color else (height, width) data = np.reshape(data, shape) data = np.flipud(data) file.close() return data, scale def save_pfm(filename, image, scale=1): file = open(filename, "wb") color = None image = np.flipud(image) if image.dtype.name != 'float32': raise Exception('Image dtype must be float32.') if len(image.shape) == 3 and image.shape[2] == 3: # color image color = True elif len(image.shape) == 2 or len(image.shape) == 3 and image.shape[2] == 1: # greyscale color = False else: raise Exception('Image must have H x W x 3, H x W x 1 or H x W dimensions.') file.write('PF\n'.encode('utf-8') if color else 'Pf\n'.encode('utf-8')) file.write('{} {}\n'.format(image.shape[1], image.shape[0]).encode('utf-8')) endian = image.dtype.byteorder if endian == '<' or endian == '=' and sys.byteorder == 'little': scale = -scale file.write(('%f\n' % scale).encode('utf-8')) image.tofile(file) file.close() def write_cam(file, cam): f = open(file, "w") f.write('extrinsic\n') for i in range(0, 4): for j in range(0, 4): f.write(str(cam[0][i][j]) + ' ') f.write('\n') f.write('\n') f.write('intrinsic\n') for i in range(0, 3): for j in range(0, 3): f.write(str(cam[1][i][j]) + ' ') f.write('\n') f.write('\n' + str(cam[1][3][0]) + ' ' + str(cam[1][3][1]) + ' ' + str(cam[1][3][2]) + ' ' + str(cam[1][3][3]) + '\n') f.close() ================================================ FILE: datasets/dtu.py ================================================ # -*- coding: utf-8 -*- # @Description: Data preprocessing and organization for DTU dataset. 
# @Author: Zhe Zhang (doublez@stu.pku.edu.cn) # @Affiliation: Peking University (PKU) # @LastEditDate: 2023-09-07 import os import cv2 import random import numpy as np from PIL import Image from torchvision import transforms from torch.utils.data import Dataset from datasets.data_io import * class DTUDataset(Dataset): def __init__(self, root_dir, list_file, mode, n_views, **kwargs): super(DTUDataset, self).__init__() self.root_dir = root_dir self.list_file = list_file self.mode = mode self.n_views = n_views assert self.mode in ["train", "val", "test"] self.total_depths = 192 self.interval_scale = 1.06 self.data_scale = kwargs.get("data_scale", "mid") # mid / raw self.robust_train = kwargs.get("robust_train", False) # True / False self.color_augment = transforms.ColorJitter(brightness=0.5, contrast=0.5) if self.mode == "test": self.max_wh = kwargs.get("max_wh", (1600, 1200)) self.metas = self.build_metas() def build_metas(self): metas = [] with open(os.path.join(self.list_file)) as f: scans = [line.rstrip() for line in f.readlines()] pair_file = "Cameras/pair.txt" for scan in scans: with open(os.path.join(self.root_dir, pair_file)) as f: num_viewpoint = int(f.readline()) # viewpoints (49) for _ in range(num_viewpoint): ref_view = int(f.readline().rstrip()) src_views = [int(x) for x in f.readline().rstrip().split()[1::2]] if self.mode == "train": # light conditions 0-6 for light_idx in range(7): metas.append((scan, light_idx, ref_view, src_views)) elif self.mode in ["test", "val"]: if len(src_views) < self.n_views: print("{} < num_views:{}".format(len(src_views), self.n_views)) src_views += [src_views[0]] * (self.n_views - len(src_views)) metas.append((scan, 3, ref_view, src_views)) print("DTU Dataset in", self.mode, "mode metas:", len(metas)) return metas def __len__(self): return len(self.metas) def read_cam_file(self, filename): with open(filename) as f: lines = f.readlines() lines = [line.rstrip() for line in lines] # extrinsics: line [1,5), 4x4 matrix extrinsics = np.fromstring(' '.join(lines[1:5]), dtype=np.float32, sep=' ').reshape((4, 4)) # intrinsics: line [7-10), 3x3 matrix intrinsics = np.fromstring(' '.join(lines[7:10]), dtype=np.float32, sep=' ').reshape((3, 3)) if self.mode == "test": intrinsics[:2, :] /= 4.0 # depth_min & depth_interval: line 11 depth_min = float(lines[11].split()[0]) depth_interval = float(lines[11].split()[1]) if len(lines[11].split()) >= 3: num_depth = lines[11].split()[2] depth_max = depth_min + int(float(num_depth)) * depth_interval depth_interval = (depth_max - depth_min) / self.total_depths depth_interval *= self.interval_scale return intrinsics, extrinsics, depth_min, depth_interval def read_img(self, filename): img = Image.open(filename) if self.mode == "train" and self.robust_train: img = self.color_augment(img) # scale 0~255 to 0~1 np_img = np.array(img, dtype=np.float32) / 255. 
return np_img def crop_img(self, img): raw_h, raw_w = img.shape[:2] start_h = (raw_h-1024)//2 start_w = (raw_w-1280)//2 return img[start_h:start_h+1024, start_w:start_w+1280, :] # (1024, 1280) def prepare_img(self, hr_img): h, w = hr_img.shape if self.data_scale == "mid": hr_img_ds = cv2.resize(hr_img, (w//2, h//2), interpolation=cv2.INTER_NEAREST) h, w = hr_img_ds.shape target_h, target_w = 512, 640 start_h, start_w = (h - target_h)//2, (w - target_w)//2 hr_img_crop = hr_img_ds[start_h: start_h + target_h, start_w: start_w + target_w] elif self.data_scale == "raw": hr_img_crop = hr_img[h//2-1024//2:h//2+1024//2, w//2-1280//2:w//2+1280//2] # (1024, 1280) return hr_img_crop def scale_mvs_input(self, img, intrinsics, max_w, max_h, base=64): h, w = img.shape[:2] if h > max_h or w > max_w: scale = 1.0 * max_h / h if scale * w > max_w: scale = 1.0 * max_w / w new_w, new_h = scale * w // base * base, scale * h // base * base else: new_w, new_h = 1.0 * w // base * base, 1.0 * h // base * base scale_w = 1.0 * new_w / w scale_h = 1.0 * new_h / h intrinsics[0, :] *= scale_w intrinsics[1, :] *= scale_h img = cv2.resize(img, (int(new_w), int(new_h))) return img, intrinsics def read_mask_hr(self, filename): img = Image.open(filename) np_img = np.array(img, dtype=np.float32) np_img = (np_img > 10).astype(np.float32) np_img = self.prepare_img(np_img) h, w = np_img.shape np_img_ms = { "stage1": cv2.resize(np_img, (w//8, h//8), interpolation=cv2.INTER_NEAREST), "stage2": cv2.resize(np_img, (w//4, h//4), interpolation=cv2.INTER_NEAREST), "stage3": cv2.resize(np_img, (w//2, h//2), interpolation=cv2.INTER_NEAREST), "stage4": np_img, } return np_img_ms def read_depth_hr(self, filename, scale): depth_hr = np.array(read_pfm(filename)[0], dtype=np.float32) * scale depth_lr = self.prepare_img(depth_hr) h, w = depth_lr.shape depth_lr_ms = { "stage1": cv2.resize(depth_lr, (w//8, h//8), interpolation=cv2.INTER_NEAREST), "stage2": cv2.resize(depth_lr, (w//4, h//4), interpolation=cv2.INTER_NEAREST), "stage3": cv2.resize(depth_lr, (w//2, h//2), interpolation=cv2.INTER_NEAREST), "stage4": depth_lr, } return depth_lr_ms def __getitem__(self, idx): scan, light_idx, ref_view, src_views = self.metas[idx] if self.mode == "train" and self.robust_train: num_src_views = len(src_views) index = random.sample(range(num_src_views), self.n_views-1) view_ids = [ref_view] + [src_views[i] for i in index] scale_ratio = random.uniform(0.8, 1.25) else: view_ids = [ref_view] + src_views[:self.n_views-1] scale_ratio = 1 imgs = [] mask = None depth_values = None proj_matrices = [] for i, vid in enumerate(view_ids): # @Note image & cam if self.mode in ["train", "val"]: if self.data_scale == "mid": img_filename = os.path.join(self.root_dir, 'Rectified/{}_train/rect_{:0>3}_{}_r5000.png'.format(scan, vid+1, light_idx)) elif self.data_scale == "raw": img_filename = os.path.join(self.root_dir, 'Rectified_raw/{}/rect_{:0>3}_{}_r5000.png'.format(scan, vid + 1, light_idx)) proj_mat_filename = os.path.join(self.root_dir, 'Cameras/train/{:0>8}_cam.txt').format(vid) elif self.mode == "test": img_filename = os.path.join(self.root_dir, 'Rectified/{}/rect_{:0>3}_3_r5000.png'.format(scan, vid+1)) proj_mat_filename = os.path.join(self.root_dir, 'Cameras/{:0>8}_cam.txt'.format(vid)) img = self.read_img(img_filename) intrinsics, extrinsics, depth_min, depth_interval = self.read_cam_file(proj_mat_filename) if self.mode in ["train", "val"]: if self.data_scale == "raw": img = self.crop_img(img) intrinsics[:2, :] *= 2.0 if self.mode == "train" and 
self.robust_train: extrinsics[:3,3] *= scale_ratio elif self.mode == "test": img, intrinsics = self.scale_mvs_input(img, intrinsics, self.max_wh[0], self.max_wh[1]) imgs.append(img.transpose(2,0,1)) # reference view if i == 0: # @Note depth values diff = 0.5 if self.mode in ["test", "val"] else 0 depth_max = depth_interval * (self.total_depths - diff) + depth_min depth_values = np.array([depth_min * scale_ratio, depth_max * scale_ratio], dtype=np.float32) # @Note depth & mask if self.mode in ["train", "val"]: depth_filename_hr = os.path.join(self.root_dir, 'Depths_raw/{}/depth_map_{:0>4}.pfm'.format(scan, vid)) depth = self.read_depth_hr(depth_filename_hr, scale_ratio) mask_filename_hr = os.path.join(self.root_dir, 'Depths_raw/{}/depth_visual_{:0>4}.png'.format(scan, vid)) mask = self.read_mask_hr(mask_filename_hr) proj_mat = np.zeros(shape=(2, 4, 4), dtype=np.float32) proj_mat[0, :4, :4] = extrinsics proj_mat[1, :3, :3] = intrinsics proj_matrices.append(proj_mat) proj_matrices = np.stack(proj_matrices) intrinsics = np.stack(intrinsics) stage1_pjmats = proj_matrices.copy() stage1_pjmats[:, 1, :2, :] = proj_matrices[:, 1, :2, :] / 2.0 stage1_ins = intrinsics.copy() stage1_ins[:2, :] = intrinsics[:2, :] / 2.0 stage3_pjmats = proj_matrices.copy() stage3_pjmats[:, 1, :2, :] = proj_matrices[:, 1, :2, :] * 2 stage3_ins = intrinsics.copy() stage3_ins[:2, :] = intrinsics[:2, :] * 2.0 stage4_pjmats = proj_matrices.copy() stage4_pjmats[:, 1, :2, :] = proj_matrices[:, 1, :2, :] * 4 stage4_ins = intrinsics.copy() stage4_ins[:2, :] = intrinsics[:2, :] * 4.0 proj_matrices = { "stage1": stage1_pjmats, "stage2": proj_matrices, "stage3": stage3_pjmats, "stage4": stage4_pjmats } intrinsics_matrices = { "stage1": stage1_ins, "stage2": intrinsics, "stage3": stage3_ins, "stage4": stage4_ins } sample = { "imgs": imgs, "proj_matrices": proj_matrices, "intrinsics_matrices": intrinsics_matrices, "depth_values": depth_values } if self.mode in ["train", "val"]: sample["depth"] = depth sample["mask"] = mask elif self.mode == "test": sample["filename"] = scan + '/{}/' + '{:0>8}'.format(view_ids[0]) + "{}" return sample ================================================ FILE: datasets/evaluations/dtu_parallel/BaseEval2Obj_web.m ================================================ function BaseEval2Obj_web(BaseEval,method_string,outputPath) if(nargin<3) outputPath='./'; end % tresshold for coloring alpha channel in the range of 0-10 mm dist_tresshold=10; cSet=BaseEval.cSet; Qdata=BaseEval.Qdata; alpha=min(BaseEval.Ddata,dist_tresshold)/dist_tresshold; fid=fopen([outputPath method_string '2Stl_' num2str(cSet) ' .obj'],'w+'); for cP=1:size(Qdata,2) if(BaseEval.DataInMask(cP)) C=[1 0 0]*alpha(cP)+[1 1 1]*(1-alpha(cP)); %coloring from red to white in the range of 0-10 mm (0 to dist_tresshold) else C=[0 1 0]*alpha(cP)+[0 0 1]*(1-alpha(cP)); %green to blue for points outside the mask (which are not included in the analysis) end fprintf(fid,'v %f %f %f %f %f %f\n',[Qdata(1,cP) Qdata(2,cP) Qdata(3,cP) C(1) C(2) C(3)]); end fclose(fid); disp('Data2Stl saved as obj') Qstl=BaseEval.Qstl; fid=fopen([outputPath 'Stl2' method_string '_' num2str(cSet) '.obj'],'w+'); alpha=min(BaseEval.Dstl,dist_tresshold)/dist_tresshold; for cP=1:size(Qstl,2) if(BaseEval.StlAbovePlane(cP)) C=[1 0 0]*alpha(cP)+[1 1 1]*(1-alpha(cP)); %coloring from red to white in the range of 0-10 mm (0 to dist_tresshold) else C=[0 1 0]*alpha(cP)+[0 0 1]*(1-alpha(cP)); %green to blue for points below plane (which are not included in the analysis) end fprintf(fid,'v %f %f %f 
%f %f %f\n',[Qstl(1,cP) Qstl(2,cP) Qstl(3,cP) C(1) C(2) C(3)]); end fclose(fid); disp('Stl2Data saved as obj') ================================================ FILE: datasets/evaluations/dtu_parallel/BaseEvalMain_web.m ================================================ format compact representation_string='Points'; %mvs representation 'Points' or 'Surfaces' switch representation_string case 'Points' eval_string='_Eval_'; %results naming settings_string=''; end dst=0.2; %Min dist between points when reducing % start this evaluation cSet = str2num(thisset) %input data name DataInName = [plyPath sprintf('%s%03d.ply', lower(method_string), cSet)] %results name EvalName=[resultsPath method_string eval_string num2str(cSet) '.mat'] %check if file is already computed if(~exist(EvalName,'file')) disp(DataInName); time=clock;time(4:5), drawnow tic Mesh = plyread(DataInName); Qdata=[Mesh.vertex.x Mesh.vertex.y Mesh.vertex.z]'; toc BaseEval=PointCompareMain(cSet,Qdata,dst,dataPath); disp('Saving results'), drawnow toc save(EvalName,'BaseEval'); toc % write obj-file of evaluation % BaseEval2Obj_web(BaseEval,method_string, resultsPath) % toc time=clock;time(4:5), drawnow BaseEval.MaxDist=20; %outlier threshold of 20 mm BaseEval.FilteredDstl=BaseEval.Dstl(BaseEval.StlAbovePlane); %use only points that are above the plane BaseEval.FilteredDstl=BaseEval.FilteredDstl(BaseEval.FilteredDstl=Low(1) & Qfrom(2,:)>=Low(2) & Qfrom(3,:)>=Low(3) &... Qfrom(1,:)=Low(1) & Qto(2,:)>=Low(2) & Qto(3,:)>=Low(3) &... Qto(1,:)3)] end ================================================ FILE: datasets/evaluations/dtu_parallel/PointCompareMain.m ================================================ function BaseEval=PointCompareMain(cSet,Qdata,dst,dataPath) % evaluation function the calculates the distantes from the reference data (stl) to the evalution points (Qdata) and the % distances from the evaluation points to the reference tic % reduce points 0.2 mm neighbourhood density Qdata=reducePts_haa(Qdata,dst); toc StlInName=[dataPath '/Points/stl/stl' sprintf('%03d',cSet) '_total.ply']; StlMesh = plyread(StlInName); %STL points already reduced 0.2 mm neighbourhood density Qstl=[StlMesh.vertex.x StlMesh.vertex.y StlMesh.vertex.z]'; %Load Mask (ObsMask) and Bounding box (BB) and Resolution (Res) Margin=10; MaskName=[dataPath '/ObsMask/ObsMask' num2str(cSet) '_' num2str(Margin) '.mat']; load(MaskName) MaxDist=60; disp('Computing Data 2 Stl distances') Ddata = MaxDistCP(Qstl,Qdata,BB,MaxDist); toc disp('Computing Stl 2 Data distances') Dstl=MaxDistCP(Qdata,Qstl,BB,MaxDist); disp('Distances computed') toc %use mask %From Get mask - inverted & modified. 
One=ones(1,size(Qdata,2)); Qv=(Qdata-BB(1,:)'*One)/Res+1; Qv=round(Qv); Midx1=find(Qv(1,:)>0 & Qv(1,:)<=size(ObsMask,1) & Qv(2,:)>0 & Qv(2,:)<=size(ObsMask,2) & Qv(3,:)>0 & Qv(3,:)<=size(ObsMask,3)); MidxA=sub2ind(size(ObsMask),Qv(1,Midx1),Qv(2,Midx1),Qv(3,Midx1)); Midx2=find(ObsMask(MidxA)); BaseEval.DataInMask(1:size(Qv,2))=false; BaseEval.DataInMask(Midx1(Midx2))=true; %If Data is within the mask BaseEval.cSet=cSet; BaseEval.Margin=Margin; %Margin of masks BaseEval.dst=dst; %Min dist between points when reducing BaseEval.Qdata=Qdata; %Input data points BaseEval.Ddata=Ddata; %distance from data to stl BaseEval.Qstl=Qstl; %Input stl points BaseEval.Dstl=Dstl; %Distance from the stl to data load([dataPath '/ObsMask/Plane' num2str(cSet)],'P') BaseEval.GroundPlane=P; % Plane used to destinguise which Stl points are 'used' BaseEval.StlAbovePlane=(P'*[Qstl;ones(1,size(Qstl,2))])>0; %Is stl above 'ground plane' BaseEval.Time=clock; %Time when computation is finished ================================================ FILE: datasets/evaluations/dtu_parallel/plyread.m ================================================ %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% function [Elements,varargout] = plyread(Path,Str) %PLYREAD Read a PLY 3D data file. % [DATA,COMMENTS] = PLYREAD(FILENAME) reads a version 1.0 PLY file % FILENAME and returns a structure DATA. The fields in this structure % are defined by the PLY header; each element type is a field and each % element property is a subfield. If the file contains any comments, % they are returned in a cell string array COMMENTS. % % [TRI,PTS] = PLYREAD(FILENAME,'tri') or % [TRI,PTS,DATA,COMMENTS] = PLYREAD(FILENAME,'tri') converts vertex % and face data into triangular connectivity and vertex arrays. The % mesh can then be displayed using the TRISURF command. % % Note: This function is slow for large mesh files (+50K faces), % especially when reading data with list type properties. 
% % Example: % [Tri,Pts] = PLYREAD('cow.ply','tri'); % trisurf(Tri,Pts(:,1),Pts(:,2),Pts(:,3)); % colormap(gray); axis equal; % % See also: PLYWRITE % Pascal Getreuer 2004 [fid,Msg] = fopen(Path,'rt'); % open file in read text mode if fid == -1, error(Msg); end Buf = fscanf(fid,'%s',1); if ~strcmp(Buf,'ply') fclose(fid); error('Not a PLY file.'); end %%% read header %%% Position = ftell(fid); Format = ''; NumComments = 0; Comments = {}; % for storing any file comments NumElements = 0; NumProperties = 0; Elements = []; % structure for holding the element data ElementCount = []; % number of each type of element in file PropertyTypes = []; % corresponding structure recording property types ElementNames = {}; % list of element names in the order they are stored in the file PropertyNames = []; % structure of lists of property names while 1 Buf = fgetl(fid); % read one line from file BufRem = Buf; Token = {}; Count = 0; while ~isempty(BufRem) % split line into tokens [tmp,BufRem] = strtok(BufRem); if ~isempty(tmp) Count = Count + 1; % count tokens Token{Count} = tmp; end end if Count % parse line switch lower(Token{1}) case 'format' % read data format if Count >= 2 Format = lower(Token{2}); if Count == 3 & ~strcmp(Token{3},'1.0') fclose(fid); error('Only PLY format version 1.0 supported.'); end end case 'comment' % read file comment NumComments = NumComments + 1; Comments{NumComments} = ''; for i = 2:Count Comments{NumComments} = [Comments{NumComments},Token{i},' ']; end case 'element' % element name if Count >= 3 if isfield(Elements,Token{2}) fclose(fid); error(['Duplicate element name, ''',Token{2},'''.']); end NumElements = NumElements + 1; NumProperties = 0; Elements = setfield(Elements,Token{2},[]); PropertyTypes = setfield(PropertyTypes,Token{2},[]); ElementNames{NumElements} = Token{2}; PropertyNames = setfield(PropertyNames,Token{2},{}); CurElement = Token{2}; ElementCount(NumElements) = str2double(Token{3}); if isnan(ElementCount(NumElements)) fclose(fid); error(['Bad element definition: ',Buf]); end else error(['Bad element definition: ',Buf]); end case 'property' % element property if ~isempty(CurElement) & Count >= 3 NumProperties = NumProperties + 1; eval(['tmp=isfield(Elements.',CurElement,',Token{Count});'],... 'fclose(fid);error([''Error reading property: '',Buf])'); if tmp error(['Duplicate property name, ''',CurElement,'.',Token{2},'''.']); end % add property subfield to Elements eval(['Elements.',CurElement,'.',Token{Count},'=[];'], ... 'fclose(fid);error([''Error reading property: '',Buf])'); % add property subfield to PropertyTypes and save type eval(['PropertyTypes.',CurElement,'.',Token{Count},'={Token{2:Count-1}};'], ... 'fclose(fid);error([''Error reading property: '',Buf])'); % record property name order eval(['PropertyNames.',CurElement,'{NumProperties}=Token{Count};'], ... 
'fclose(fid);error([''Error reading property: '',Buf])'); else fclose(fid); if isempty(CurElement) error(['Property definition without element definition: ',Buf]); else error(['Bad property definition: ',Buf]); end end case 'end_header' % end of header, break from while loop break; end end end %%% set reading for specified data format %%% if isempty(Format) warning('Data format unspecified, assuming ASCII.'); Format = 'ascii'; end switch Format case 'ascii' Format = 0; case 'binary_little_endian' Format = 1; case 'binary_big_endian' Format = 2; otherwise fclose(fid); error(['Data format ''',Format,''' not supported.']); end if ~Format Buf = fscanf(fid,'%f'); % read the rest of the file as ASCII data BufOff = 1; else % reopen the file in read binary mode fclose(fid); if Format == 1 fid = fopen(Path,'r','ieee-le.l64'); % little endian else fid = fopen(Path,'r','ieee-be.l64'); % big endian end % find the end of the header again (using ftell on the old handle doesn't give the correct position) BufSize = 8192; Buf = [blanks(10),char(fread(fid,BufSize,'uchar')')]; i = []; tmp = -11; while isempty(i) i = findstr(Buf,['end_header',13,10]); % look for end_header + CR/LF i = [i,findstr(Buf,['end_header',10])]; % look for end_header + LF if isempty(i) tmp = tmp + BufSize; Buf = [Buf(BufSize+1:BufSize+10),char(fread(fid,BufSize,'uchar')')]; end end % seek to just after the line feed fseek(fid,i + tmp + 11 + (Buf(i + 10) == 13),-1); end %%% read element data %%% % PLY and MATLAB data types (for fread) PlyTypeNames = {'char','uchar','short','ushort','int','uint','float','double', ... 'char8','uchar8','short16','ushort16','int32','uint32','float32','double64'}; MatlabTypeNames = {'schar','uchar','int16','uint16','int32','uint32','single','double'}; SizeOf = [1,1,2,2,4,4,4,8]; % size in bytes of each type for i = 1:NumElements % get current element property information eval(['CurPropertyNames=PropertyNames.',ElementNames{i},';']); eval(['CurPropertyTypes=PropertyTypes.',ElementNames{i},';']); NumProperties = size(CurPropertyNames,2); % fprintf('Reading %s...\n',ElementNames{i}); if ~Format %%% read ASCII data %%% for j = 1:NumProperties Token = getfield(CurPropertyTypes,CurPropertyNames{j}); if strcmpi(Token{1},'list') Type(j) = 1; else Type(j) = 0; end end % parse buffer if ~any(Type) % no list types Data = reshape(Buf(BufOff:BufOff+ElementCount(i)*NumProperties-1),NumProperties,ElementCount(i))'; BufOff = BufOff + ElementCount(i)*NumProperties; else ListData = cell(NumProperties,1); for k = 1:NumProperties ListData{k} = cell(ElementCount(i),1); end % list type for j = 1:ElementCount(i) for k = 1:NumProperties if ~Type(k) Data(j,k) = Buf(BufOff); BufOff = BufOff + 1; else tmp = Buf(BufOff); ListData{k}{j} = Buf(BufOff+(1:tmp))'; BufOff = BufOff + tmp + 1; end end end end else %%% read binary data %%% % translate PLY data type names to MATLAB data type names ListFlag = 0; % = 1 if there is a list type SameFlag = 1; % = 1 if all types are the same for j = 1:NumProperties Token = getfield(CurPropertyTypes,CurPropertyNames{j}); if ~strcmp(Token{1},'list') % non-list type tmp = rem(strmatch(Token{1},PlyTypeNames,'exact')-1,8)+1; if ~isempty(tmp) TypeSize(j) = SizeOf(tmp); Type{j} = MatlabTypeNames{tmp}; TypeSize2(j) = 0; Type2{j} = ''; SameFlag = SameFlag & strcmp(Type{1},Type{j}); else fclose(fid); error(['Unknown property data type, ''',Token{1},''', in ', ... 
ElementNames{i},'.',CurPropertyNames{j},'.']); end else % list type if length(Token) == 3 ListFlag = 1; SameFlag = 0; tmp = rem(strmatch(Token{2},PlyTypeNames,'exact')-1,8)+1; tmp2 = rem(strmatch(Token{3},PlyTypeNames,'exact')-1,8)+1; if ~isempty(tmp) & ~isempty(tmp2) TypeSize(j) = SizeOf(tmp); Type{j} = MatlabTypeNames{tmp}; TypeSize2(j) = SizeOf(tmp2); Type2{j} = MatlabTypeNames{tmp2}; else fclose(fid); error(['Unknown property data type, ''list ',Token{2},' ',Token{3},''', in ', ... ElementNames{i},'.',CurPropertyNames{j},'.']); end else fclose(fid); error(['Invalid list syntax in ',ElementNames{i},'.',CurPropertyNames{j},'.']); end end end % read file if ~ListFlag if SameFlag % no list types, all the same type (fast) Data = fread(fid,[NumProperties,ElementCount(i)],Type{1})'; else % no list types, mixed type Data = zeros(ElementCount(i),NumProperties); for j = 1:ElementCount(i) for k = 1:NumProperties Data(j,k) = fread(fid,1,Type{k}); end end end else ListData = cell(NumProperties,1); for k = 1:NumProperties ListData{k} = cell(ElementCount(i),1); end if NumProperties == 1 BufSize = 512; SkipNum = 4; j = 0; % list type, one property (fast if lists are usually the same length) while j < ElementCount(i) Position = ftell(fid); % read in BufSize count values, assuming all counts = SkipNum [Buf,BufSize] = fread(fid,BufSize,Type{1},SkipNum*TypeSize2(1)); Miss = find(Buf ~= SkipNum); % find first count that is not SkipNum fseek(fid,Position + TypeSize(1),-1); % seek back to after first count if isempty(Miss) % all counts are SkipNum Buf = fread(fid,[SkipNum,BufSize],[int2str(SkipNum),'*',Type2{1}],TypeSize(1))'; fseek(fid,-TypeSize(1),0); % undo last skip for k = 1:BufSize ListData{1}{j+k} = Buf(k,:); end j = j + BufSize; BufSize = floor(1.5*BufSize); else if Miss(1) > 1 % some counts are SkipNum Buf2 = fread(fid,[SkipNum,Miss(1)-1],[int2str(SkipNum),'*',Type2{1}],TypeSize(1))'; for k = 1:Miss(1)-1 ListData{1}{j+k} = Buf2(k,:); end j = j + k; end % read in the list with the missed count SkipNum = Buf(Miss(1)); j = j + 1; ListData{1}{j} = fread(fid,[1,SkipNum],Type2{1}); BufSize = ceil(0.6*BufSize); end end else % list type(s), multiple properties (slow) Data = zeros(ElementCount(i),NumProperties); for j = 1:ElementCount(i) for k = 1:NumProperties if isempty(Type2{k}) Data(j,k) = fread(fid,1,Type{k}); else tmp = fread(fid,1,Type{k}); ListData{k}{j} = fread(fid,[1,tmp],Type2{k}); end end end end end end % put data into Elements structure for k = 1:NumProperties if (~Format & ~Type(k)) | (Format & isempty(Type2{k})) eval(['Elements.',ElementNames{i},'.',CurPropertyNames{k},'=Data(:,k);']); else eval(['Elements.',ElementNames{i},'.',CurPropertyNames{k},'=ListData{k};']); end end end clear Data ListData; fclose(fid); if (nargin > 1 & strcmpi(Str,'Tri')) | nargout > 2 % find vertex element field Name = {'vertex','Vertex','point','Point','pts','Pts'}; Names = []; for i = 1:length(Name) if any(strcmp(ElementNames,Name{i})) Names = getfield(PropertyNames,Name{i}); Name = Name{i}; break; end end if any(strcmp(Names,'x')) & any(strcmp(Names,'y')) & any(strcmp(Names,'z')) eval(['varargout{1}=[Elements.',Name,'.x,Elements.',Name,'.y,Elements.',Name,'.z];']); else varargout{1} = zeros(1,3); end varargout{2} = Elements; varargout{3} = Comments; Elements = []; % find face element field Name = {'face','Face','poly','Poly','tri','Tri'}; Names = []; for i = 1:length(Name) if any(strcmp(ElementNames,Name{i})) Names = getfield(PropertyNames,Name{i}); Name = Name{i}; break; end end if ~isempty(Names) % find vertex 
indices property subfield PropertyName = {'vertex_indices','vertex_indexes','vertex_index','indices','indexes'}; for i = 1:length(PropertyName) if any(strcmp(Names,PropertyName{i})) PropertyName = PropertyName{i}; break; end end if ~iscell(PropertyName) % convert face index lists to triangular connectivity eval(['FaceIndices=varargout{2}.',Name,'.',PropertyName,';']); N = length(FaceIndices); Elements = zeros(N*2,3); Extra = 0; for k = 1:N Elements(k,:) = FaceIndices{k}(1:3); for j = 4:length(FaceIndices{k}) Extra = Extra + 1; Elements(N + Extra,:) = [Elements(k,[1,j-1]),FaceIndices{k}(j)]; end end Elements = Elements(1:N+Extra,:) + 1; end end else varargout{1} = Comments; end ================================================ FILE: datasets/evaluations/dtu_parallel/reducePts_haa.m ================================================ function [ptsOut,indexSet] = reducePts_haa(pts, dst) %Reduces a point set, pts, in a stochastic manner, such that the minimum sdistance % between points is 'dst'. Writen by abd, edited by haa, then by raje nPoints=size(pts,2); indexSet=true(nPoints,1); RandOrd=randperm(nPoints); %tic NS = KDTreeSearcher(pts'); %toc % search the KNTree for close neighbours in a chunk-wise fashion to save memory if point cloud is really big Chunks=1:min(4e6,nPoints-1):nPoints; Chunks(end)=nPoints; for cChunk=1:(length(Chunks)-1) Range=Chunks(cChunk):Chunks(cChunk+1); idx = rangesearch(NS,pts(:,RandOrd(Range))',dst); for i = 1:size(idx,1) id =RandOrd(i-1+Chunks(cChunk)); if (indexSet(id)) indexSet(idx{i}) = 0; indexSet(id) = 1; end end end ptsOut = pts(:,indexSet); disp(['downsample factor: ' num2str(nPoints/sum(indexSet))]); ================================================ FILE: datasets/lists/blendedmvs/low_res_all.txt ================================================ 5c1f33f1d33e1f2e4aa6dda4 5bfe5ae0fe0ea555e6a969ca 5bff3c5cfe0ea555e6bcbf3a 58eaf1513353456af3a1682a 5bfc9d5aec61ca1dd69132a2 5bf18642c50e6f7f8bdbd492 5bf26cbbd43923194854b270 5bf17c0fd439231948355385 5be3ae47f44e235bdbbc9771 5be3a5fb8cfdd56947f6b67c 5bbb6eb2ea1cfa39f1af7e0c 5ba75d79d76ffa2c86cf2f05 5bb7a08aea1cfa39f1a947ab 5b864d850d072a699b32f4ae 5b6eff8b67b396324c5b2672 5b6e716d67b396324c2d77cb 5b69cc0cb44b61786eb959bf 5b62647143840965efc0dbde 5b60fa0c764f146feef84df0 5b558a928bbfb62204e77ba2 5b271079e0878c3816dacca4 5b08286b2775267d5b0634ba 5afacb69ab00705d0cefdd5b 5af28cea59bc705737003253 5af02e904c8216544b4ab5a2 5aa515e613d42d091d29d300 5c34529873a8df509ae57b58 5c34300a73a8df509add216d 5c1af2e2bee9a723c963d019 5c1892f726173c3a09ea9aeb 5c0d13b795da9479e12e2ee9 5c062d84a96e33018ff6f0a6 5bfd0f32ec61ca1dd69dc77b 5bf21799d43923194842c001 5bf3a82cd439231948877aed 5bf03590d4392319481971dc 5beb6e66abd34c35e18e66b9 5be883a4f98cee15019d5b83 5be47bf9b18881428d8fbc1d 5bcf979a6d5f586b95c258cd 5bce7ac9ca24970bce4934b6 5bb8a49aea1cfa39f1aa7f75 5b78e57afc8fcf6781d0c3ba 5b21e18c58e2823a67a10dd8 5b22269758e2823a67a3bd03 5b192eb2170cf166458ff886 5ae2e9c5fe405c5076abc6b2 5adc6bd52430a05ecb2ffb85 5ab8b8e029f5351f7f2ccf59 5abc2506b53b042ead637d86 5ab85f1dac4291329b17cb50 5a969eea91dfc339a9a3ad2c 5a8aa0fab18050187cbe060e 5a7d3db14989e929563eb153 5a69c47d0d5d0a7f3b2e9752 5a618c72784780334bc1972d 5a6464143d809f1d8208c43c 5a588a8193ac3d233f77fbca 5a57542f333d180827dfc132 5a572fd9fc597b0478a81d14 5a563183425d0f5186314855 5a4a38dad38c8a075495b5d2 5a48d4b2c7dab83a7d7b9851 5a489fb1c7dab83a7d7b1070 5a48ba95c7dab83a7d7b44ed 5a3ca9cb270f0e3f14d0eddb 5a3cb4e4270f0e3f14d12f43 5a3f4aba5889373fbbc5d3b5 5a0271884e62597cdee0d0eb 59e864b2a9e91f2c5529325f 
599aa591d5b41f366fed0d58 59350ca084b7f26bf5ce6eb8 59338e76772c3e6384afbb15 5c20ca3a0843bc542d94e3e2 5c1dbf200843bc542d8ef8c4 5c1b1500bee9a723c96c3e78 5bea87f4abd34c35e1860ab5 5c2b3ed5e611832e8aed46bf 57f8d9bbe73f6760f10e916a 5bf7d63575c26f32dbf7413b 5be4ab93870d330ff2dce134 5bd43b4ba6b28b1ee86b92dd 5bccd6beca24970bce448134 5bc5f0e896b66a2cd8f9bd36 5b908d3dc6ab78485f3d24a9 5b2c67b5e0878c381608b8d8 5b4933abf2b5f44e95de482a 5b3b353d8d46a939f93524b9 5acf8ca0f3d8a750097e4b15 5ab8713ba3799a1d138bd69a 5aa235f64a17b335eeaf9609 5aa0f9d7a9efce63548c69a1 5a8315f624b8e938486e0bd8 5a48c4e9c7dab83a7d7b5cc7 59ecfd02e225f6492d20fcc9 59f87d0bfa6280566fb38c9a 59f363a8b45be22330016cad 59f70ab1e5c5d366af29bf3e 59e75a2ca9e91f2c5526005d 5947719bf1b45630bd096665 5947b62af1b45630bd0c2a02 59056e6760bb961de55f3501 58f7f7299f5b5647873cb110 58cf4771d0f5fb221defe6da 58d36897f387231e6c929903 58c4bb4f4a69c55606122be4 5b7a3890fc8fcf6781e2593a 5c189f2326173c3a09ed7ef3 5b950c71608de421b1e7318f 5a6400933d809f1d8200af15 59d2657f82ca7774b1ec081d 5ba19a8a360c7c30c1c169df 59817e4a1bd4b175e7038d19 ================================================ FILE: datasets/lists/blendedmvs/val.txt ================================================ 5b7a3890fc8fcf6781e2593a 5c189f2326173c3a09ed7ef3 5b950c71608de421b1e7318f 5a6400933d809f1d8200af15 59d2657f82ca7774b1ec081d 5ba19a8a360c7c30c1c169df 59817e4a1bd4b175e7038d19 ================================================ FILE: datasets/lists/dtu/test.txt ================================================ scan1 scan4 scan9 scan10 scan11 scan12 scan13 scan15 scan23 scan24 scan29 scan32 scan33 scan34 scan48 scan49 scan62 scan75 scan77 scan110 scan114 scan118 ================================================ FILE: datasets/lists/dtu/train.txt ================================================ scan2 scan6 scan7 scan8 scan14 scan16 scan18 scan19 scan20 scan22 scan30 scan31 scan36 scan39 scan41 scan42 scan44 scan45 scan46 scan47 scan50 scan51 scan52 scan53 scan55 scan57 scan58 scan60 scan61 scan63 scan64 scan65 scan68 scan69 scan70 scan71 scan72 scan74 scan76 scan83 scan84 scan85 scan87 scan88 scan89 scan90 scan91 scan92 scan93 scan94 scan95 scan96 scan97 scan98 scan99 scan100 scan101 scan102 scan103 scan104 scan105 scan107 scan108 scan109 scan111 scan112 scan113 scan115 scan116 scan119 scan120 scan121 scan122 scan123 scan124 scan125 scan126 scan127 scan128 ================================================ FILE: datasets/lists/dtu/val.txt ================================================ scan3 scan5 scan17 scan21 scan28 scan35 scan37 scan38 scan40 scan43 scan56 scan59 scan66 scan67 scan82 scan86 scan106 scan117 ================================================ FILE: datasets/lists/tnt/advanced.txt ================================================ Auditorium Ballroom Courtroom Museum Palace Temple ================================================ FILE: datasets/lists/tnt/intermediate.txt ================================================ Family Horse Francis Lighthouse M60 Panther Playground Train ================================================ FILE: datasets/tnt.py ================================================ # -*- coding: utf-8 -*- # @Description: Data preprocessing and organization for Tanks and Temples dataset. 
# @Author: Zhe Zhang (doublez@stu.pku.edu.cn) # @Affiliation: Peking University (PKU) # @LastEditDate: 2023-09-07 import os import cv2 import numpy as np from PIL import Image from torch.utils.data import Dataset from datasets.data_io import * class TNTDataset(Dataset): def __init__(self, root_dir, list_file, split, n_views, **kwargs): super(TNTDataset, self).__init__() self.root_dir = root_dir self.list_file = list_file self.split = split self.n_views = n_views self.cam_mode = kwargs.get("cam_mode", "origin") # origin / short_range if self.cam_mode == 'short_range': assert self.split == "intermediate" self.img_mode = kwargs.get("img_mode", "resize") # resize / crop self.total_depths = 192 self.depth_interval_table = { # intermediate 'Family': 2.5e-3, 'Francis': 1e-2, 'Horse': 1.5e-3, 'Lighthouse': 1.5e-2, 'M60': 5e-3, 'Panther': 5e-3, 'Playground': 7e-3, 'Train': 5e-3, # advanced 'Auditorium': 3e-2, 'Ballroom': 2e-2, 'Courtroom': 2e-2, 'Museum': 2e-2, 'Palace': 1e-2, 'Temple': 1e-2 } self.img_wh = kwargs.get("img_wh", (-1, 1024)) self.metas = self.build_metas() def build_metas(self): metas = [] with open(os.path.join(self.list_file)) as f: scans = [line.rstrip() for line in f.readlines()] for scan in scans: with open(os.path.join(self.root_dir, self.split, scan, 'pair.txt')) as f: num_viewpoint = int(f.readline()) for view_idx in range(num_viewpoint): ref_view = int(f.readline().rstrip()) src_views = [int(x) for x in f.readline().rstrip().split()[1::2]] if len(src_views) != 0: metas += [(scan, -1, ref_view, src_views)] return metas def read_cam_file(self, filename): with open(filename) as f: lines = [line.rstrip() for line in f.readlines()] # extrinsics: line [1,5), 4x4 matrix extrinsics = np.fromstring(' '.join(lines[1:5]), dtype=np.float32, sep=' ') extrinsics = extrinsics.reshape((4, 4)) # intrinsics: line [7-10), 3x3 matrix intrinsics = np.fromstring(' '.join(lines[7:10]), dtype=np.float32, sep=' ') intrinsics = intrinsics.reshape((3, 3)) depth_min = float(lines[11].split()[0]) depth_max = float(lines[11].split()[-1]) return intrinsics, extrinsics, depth_min, depth_max def read_img(self, filename): img = Image.open(filename) np_img = np.array(img, dtype=np.float32) / 255. 
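# scale pixel values from 0~255 to 0~1 (the resize/crop to the network input size happens later in scale_tnt_input)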
return np_img def scale_tnt_input(self, intrinsics, img): if self.img_mode == "crop": intrinsics[1,2] = intrinsics[1,2] - 28 # 1080 -> 1024 img = img[28:1080-28, :, :] elif self.img_mode == "resize": height, width = img.shape[:2] max_w, max_h = self.img_wh[0], self.img_wh[1] if max_w == -1: max_w = width img = cv2.resize(img, (max_w, max_h)) scale_w = 1.0 * max_w / width intrinsics[0, :] *= scale_w scale_h = 1.0 * max_h / height intrinsics[1, :] *= scale_h return intrinsics, img def __len__(self): return len(self.metas) def __getitem__(self, idx): scan, _, ref_view, src_views = self.metas[idx] view_ids = [ref_view] + src_views[:self.n_views-1] imgs = [] depth_min = None depth_max = None proj_matrices_0 = [] proj_matrices_1 = [] proj_matrices_2 = [] proj_matrices_3 = [] for i, vid in enumerate(view_ids): img_filename = os.path.join(self.root_dir, self.split, scan, f'images/{vid:08d}.jpg') if self.cam_mode == 'short_range': # can only use for Intermediate proj_mat_filename = os.path.join(self.root_dir, self.split, scan, f'cams_{scan.lower()}/{vid:08d}_cam.txt') elif self.cam_mode == 'origin': proj_mat_filename = os.path.join(self.root_dir, self.split, scan, f'cams/{vid:08d}_cam.txt') img = self.read_img(img_filename) intrinsics, extrinsics, depth_min_, depth_max_ = self.read_cam_file(proj_mat_filename) intrinsics, img = self.scale_tnt_input(intrinsics, img) imgs.append(img.transpose(2,0,1)) proj_mat_0 = np.zeros(shape=(2, 4, 4), dtype=np.float32) proj_mat_1 = np.zeros(shape=(2, 4, 4), dtype=np.float32) proj_mat_2 = np.zeros(shape=(2, 4, 4), dtype=np.float32) proj_mat_3 = np.zeros(shape=(2, 4, 4), dtype=np.float32) intrinsics[:2,:] *= 0.125 proj_mat_0[0,:4,:4] = extrinsics.copy() proj_mat_0[1,:3,:3] = intrinsics.copy() int_mat_0 = intrinsics.copy() intrinsics[:2,:] *= 2 proj_mat_1[0,:4,:4] = extrinsics.copy() proj_mat_1[1,:3,:3] = intrinsics.copy() int_mat_1 = intrinsics.copy() intrinsics[:2,:] *= 2 proj_mat_2[0,:4,:4] = extrinsics.copy() proj_mat_2[1,:3,:3] = intrinsics.copy() int_mat_2 = intrinsics.copy() intrinsics[:2,:] *= 2 proj_mat_3[0,:4,:4] = extrinsics.copy() proj_mat_3[1,:3,:3] = intrinsics.copy() int_mat_3 = intrinsics.copy() proj_matrices_0.append(proj_mat_0) proj_matrices_1.append(proj_mat_1) proj_matrices_2.append(proj_mat_2) proj_matrices_3.append(proj_mat_3) # reference view if i == 0: depth_min = depth_min_ if self.cam_mode == 'short_range': depth_max = depth_min + self.total_depths * self.depth_interval_table[scan] elif self.cam_mode == 'origin': depth_max = depth_max_ proj={} proj['stage1'] = np.stack(proj_matrices_0) proj['stage2'] = np.stack(proj_matrices_1) proj['stage3'] = np.stack(proj_matrices_2) proj['stage4'] = np.stack(proj_matrices_3) intrinsics_matrices = { "stage1": int_mat_0, "stage2": int_mat_1, "stage3": int_mat_2, "stage4": int_mat_3 } sample = { "imgs": imgs, "proj_matrices": proj, "intrinsics_matrices": intrinsics_matrices, "depth_values": np.array([depth_min, depth_max], dtype=np.float32), "filename": scan + '/{}/' + '{:0>8}'.format(view_ids[0]) + "{}" } return sample ================================================ FILE: fusions/dtu/_open3d.py ================================================ # -*- coding: utf-8 -*- # @Description: Point cloud fusion strategy for DTU dataset based on Open3D Library. 
# @Author: Zhe Zhang (doublez@stu.pku.edu.cn) # @Affiliation: Peking University (PKU) # @LastEditDate: 2023-09-07 import torch import numpy as np import sys import argparse import errno, os import glob import os.path as osp import re import cv2 from PIL import Image import gc import open3d as o3d import torch import torch.nn.functional as F import numpy as np parser = argparse.ArgumentParser(description='Depth fusion with consistency check.') parser.add_argument('--root_path', type=str, default='[/path/to/]dtu-test-1200') parser.add_argument('--depth_path', type=str, default='') parser.add_argument('--data_list', type=str, default='') parser.add_argument('--ply_path', type=str, default='') parser.add_argument('--dist_thresh', type=float, default=0.001) parser.add_argument('--prob_thresh', type=float, default=0.6) parser.add_argument('--num_consist', type=int, default=10) parser.add_argument('--device', type=str, default='cpu') args = parser.parse_args() def homo_warping(src_fea, src_proj, ref_proj, depth_values): # src_fea: [B, C, H, W] # src_proj: [B, 4, 4] # ref_proj: [B, 4, 4] # depth_values: [B, Ndepth] o [B, Ndepth, H, W] # out: [B, C, Ndepth, H, W] batch, channels = src_fea.shape[0], src_fea.shape[1] height, width = src_fea.shape[2], src_fea.shape[3] with torch.no_grad(): proj = torch.matmul(src_proj, torch.inverse(ref_proj)) rot = proj[:, :3, :3] # [B,3,3] trans = proj[:, :3, 3:4] # [B,3,1] y, x = torch.meshgrid([torch.arange(0, height, dtype=torch.float32, device=src_fea.device), torch.arange(0, width, dtype=torch.float32, device=src_fea.device)]) y, x = y.contiguous(), x.contiguous() y, x = y.view(height * width), x.view(height * width) xyz = torch.stack((x, y, torch.ones_like(x))) # [3, H*W] xyz = torch.unsqueeze(xyz, 0).repeat(batch, 1, 1) # [B, 3, H*W] rot_xyz = torch.matmul(rot, xyz) # [B, 3, H*W] rot_depth_xyz = rot_xyz.unsqueeze(2) * depth_values.view(-1, 1, 1, height*width) # [B, 3, 1, H*W] proj_xyz = rot_depth_xyz + trans.view(batch, 3, 1, 1) # [B, 3, Ndepth, H*W] proj_xy = proj_xyz[:, :2, :, :] / proj_xyz[:, 2:3, :, :] # [B, 2, Ndepth, H*W] proj_x_normalized = proj_xy[:, 0, :, :] / ((width - 1) / 2) - 1 proj_y_normalized = proj_xy[:, 1, :, :] / ((height - 1) / 2) - 1 proj_xy = torch.stack((proj_x_normalized, proj_y_normalized), dim=3) # [B, Ndepth, H*W, 2] grid = proj_xy warped_src_fea = F.grid_sample(src_fea, grid.view(batch, height, width, 2), mode='bilinear', padding_mode='zeros') warped_src_fea = warped_src_fea.view(batch, channels, height, width) return warped_src_fea def generate_points_from_depth(depth, proj): ''' :param depth: (B, 1, H, W) :param proj: (B, 4, 4) :return: point_cloud (B, 3, H, W) ''' batch, height, width = depth.shape[0], depth.shape[2], depth.shape[3] inv_proj = torch.inverse(proj) rot = inv_proj[:, :3, :3] # [B,3,3] trans = inv_proj[:, :3, 3:4] # [B,3,1] y, x = torch.meshgrid([torch.arange(0, height, dtype=torch.float32, device=depth.device), torch.arange(0, width, dtype=torch.float32, device=depth.device)]) y, x = y.contiguous(), x.contiguous() y, x = y.view(height * width), x.view(height * width) xyz = torch.stack((x, y, torch.ones_like(x))) # [3, H*W] xyz = torch.unsqueeze(xyz, 0).repeat(batch, 1, 1) # [B, 3, H*W] rot_xyz = torch.matmul(rot, xyz) # [B, 3, H*W] rot_depth_xyz = rot_xyz * depth.view(batch, 1, -1) proj_xyz = rot_depth_xyz + trans.view(batch, 3, 1) # [B, 3, H*W] proj_xyz = proj_xyz.view(batch, 3, height, width) return proj_xyz def mkdir_p(path): try: os.makedirs(path) except OSError as exc: if exc.errno == errno.EEXIST and 
os.path.isdir(path): pass else: raise def read_pfm(filename): file = open(filename, 'rb') color = None width = None height = None scale = None endian = None header = file.readline().decode('utf-8').rstrip() if header == 'PF': color = True elif header == 'Pf': color = False else: raise Exception('Not a PFM file.') dim_match = re.match(r'^(\d+)\s(\d+)\s$', file.readline().decode('utf-8')) if dim_match: width, height = map(int, dim_match.groups()) else: raise Exception('Malformed PFM header.') scale = float(file.readline().rstrip()) if scale < 0: # little-endian endian = '<' scale = -scale else: endian = '>' # big-endian data = np.fromfile(file, endian + 'f') shape = (height, width, 3) if color else (height, width) data = np.reshape(data, shape) data = np.flipud(data) file.close() return data, scale def write_pfm(file, image, scale=1): file = open(file, 'wb') color = None if image.dtype.name != 'float32': raise Exception('Image dtype must be float32.') image = np.flipud(image) if len(image.shape) == 3 and image.shape[2] == 3: # color image color = True elif len(image.shape) == 2 or len(image.shape) == 3 and image.shape[2] == 1: # greyscale color = False else: raise Exception('Image must have H x W x 3, H x W x 1 or H x W dimensions.') file.write('PF\n'.encode() if color else 'Pf\n'.encode()) file.write('%d %d\n'.encode() % (image.shape[1], image.shape[0])) endian = image.dtype.byteorder if endian == '<' or endian == '=' and sys.byteorder == 'little': scale = -scale file.write('%f\n'.encode() % scale) image_string = image.tostring() file.write(image_string) file.close() def write_ply(file, points): pcd = o3d.geometry.PointCloud() pcd.points = o3d.utility.Vector3dVector(points[:, :3]) pcd.colors = o3d.utility.Vector3dVector(points[:, 3:] / 255.) o3d.io.write_point_cloud(file, pcd, write_ascii=False) def filter_depth(ref_depth, src_depths, ref_proj, src_projs): ''' :param ref_depth: (1, 1, H, W) :param src_depths: (B, 1, H, W) :param ref_proj: (1, 4, 4) :param src_proj: (B, 4, 4) :return: ref_pc: (1, 3, H, W), aligned_pcs: (B, 3, H, W), dist: (B, 1, H, W) ''' ref_pc = generate_points_from_depth(ref_depth, ref_proj) src_pcs = generate_points_from_depth(src_depths, src_projs) aligned_pcs = homo_warping(src_pcs, src_projs, ref_proj, ref_depth) x_2 = (ref_pc[:, 0] - aligned_pcs[:, 0])**2 y_2 = (ref_pc[:, 1] - aligned_pcs[:, 1])**2 z_2 = (ref_pc[:, 2] - aligned_pcs[:, 2])**2 dist = torch.sqrt(x_2 + y_2 + z_2).unsqueeze(1) return ref_pc, aligned_pcs, dist def parse_cameras(path): cam_txt = open(path).readlines() f = lambda xs: list(map(lambda x: list(map(float, x.strip().split())), xs)) extr_mat = f(cam_txt[1:5]) intr_mat = f(cam_txt[7:10]) extr_mat = np.array(extr_mat, np.float32) intr_mat = np.array(intr_mat, np.float32) return extr_mat, intr_mat def load_data(root_path, depth_path, scene_name, thresh): depths = [] projs = [] rgbs = [] for view in range(49): img_filename = "{}/{}/images/{:08d}.jpg".format(depth_path, scene_name, view) cam_filename = "{}/{}/cams/{:08d}_cam.txt".format(depth_path, scene_name, view) depth_filename = "{}/{}/depth_est/{:08d}.pfm".format(depth_path, scene_name, view) confidence_filename = "{}/{}/confidence/{:08d}.pfm".format(depth_path, scene_name, view) extr_mat, intr_mat = parse_cameras(cam_filename) proj_mat = np.eye(4) proj_mat[:3, :4] = np.dot(intr_mat[:3, :3], extr_mat[:3, :4]) projs.append(torch.from_numpy(proj_mat)) dep_map, _ = read_pfm(depth_filename) h, w = dep_map.shape conf_map, _ = read_pfm(confidence_filename) conf_map = cv2.resize(conf_map, (w, h), 
interpolation=cv2.INTER_LINEAR) dep_map = dep_map * (conf_map>thresh).astype(np.float32) depths.append(torch.from_numpy(dep_map).unsqueeze(0)) rgb = np.array(Image.open(img_filename)) rgbs.append(rgb) depths = torch.stack(depths).float() projs = torch.stack(projs).float() if args.device == 'cuda' and torch.cuda.is_available(): depths = depths.cuda() projs = projs.cuda() return depths, projs, rgbs def extract_points(pc, mask, rgb): pc = pc.cpu().numpy() mask = mask.cpu().numpy() mask = np.reshape(mask, (-1,)) pc = np.reshape(pc, (-1, 3)) rgb = np.reshape(rgb, (-1, 3)) points = pc[np.where(mask)] colors = rgb[np.where(mask)] points_with_color = np.concatenate([points, colors], axis=1) return points_with_color def open3d_filter(): with torch.no_grad(): mkdir_p(args.ply_path) all_scenes = open(args.data_list, 'r').readlines() all_scenes = list(map(str.strip, all_scenes)) for i, scene in enumerate(all_scenes): print("{}/{} {}:".format(i, len(all_scenes), scene), '------------------------') depths, projs, rgbs = load_data(args.root_path, args.depth_path, scene, args.prob_thresh) tot_frame = depths.shape[0] height, width = depths.shape[2], depths.shape[3] points = [] print('Scene: {} total: {} frames'.format(scene, tot_frame)) for i in range(tot_frame): pc_buff = torch.zeros((3, height, width), device=depths.device, dtype=depths.dtype) val_cnt = torch.zeros((1, height, width), device=depths.device, dtype=depths.dtype) j = 0 batch_size = 20 while True: ref_pc, pcs, dist = filter_depth(ref_depth=depths[i:i+1], src_depths=depths[j:min(j+batch_size, tot_frame)], ref_proj=projs[i:i+1], src_projs=projs[j:min(j+batch_size, tot_frame)]) masks = (dist < args.dist_thresh).float() masked_pc = pcs * masks pc_buff += masked_pc.sum(dim=0, keepdim=False) val_cnt += masks.sum(dim=0, keepdim=False) j += batch_size if j >= tot_frame: break final_mask = (val_cnt >= args.num_consist).squeeze(0) avg_points = torch.div(pc_buff, val_cnt).permute(1, 2, 0) final_pc = extract_points(avg_points, final_mask, rgbs[i]) points.append(final_pc) if i==0 or i==tot_frame-1: print('Processing {} {}/{} ...'.format(scene, i+1, tot_frame)) ply_id = int(scene[4:]) write_ply('{}/mvsnet{:03d}.ply'.format(args.ply_path, ply_id), np.concatenate(points, axis=0)) del points, depths, rgbs, projs gc.collect() print('Save {}/mvsnet{:03d}.ply successful.'.format(args.ply_path, ply_id)) if __name__ == '__main__': open3d_filter() ================================================ FILE: fusions/dtu/gipuma.py ================================================ # -*- coding: utf-8 -*- # @Description: Point cloud fusion strategy for DTU dataset: Gipuma (fusibile). 
# Refer to: https://github.com/YoYo000/MVSNet/blob/master/mvsnet/depthfusion.py # @Author: Zhe Zhang (doublez@stu.pku.edu.cn) # @Affiliation: Peking University (PKU) # @LastEditDate: 2023-09-07 from __future__ import print_function import os, re, sys, shutil from struct import * import numpy as np import argparse import cv2 from tensorflow.python.lib.io import file_io parser = argparse.ArgumentParser() parser.add_argument('--root_dir', type=str, default='[/path/to]/dtu-test-1200', help='root directory of dtu dataset') parser.add_argument('--list_file', type=str, default='datasets/lists/dtu/train.txt', help='file contains the scans list') parser.add_argument('--depth_folder', type=str, default = './outputs/') parser.add_argument('--out_folder', type=str, default = 'fusibile_fused') parser.add_argument('--plydir', type=str, default='') parser.add_argument('--quandir', type=str, default='') parser.add_argument('--fusibile_exe_path', type=str, default = 'fusion/fusibile') parser.add_argument('--prob_threshold', type=float, default = '0.8') parser.add_argument('--disp_threshold', type=float, default = '0.13') parser.add_argument('--num_consistent', type=float, default = '3') parser.add_argument('--downsample_factor', type=int, default='1') args = parser.parse_args() # preprocess ==================================== def load_cam(file, interval_scale=1): """ read camera txt file """ cam = np.zeros((2, 4, 4)) words = file.read().split() # read extrinsic for i in range(0, 4): for j in range(0, 4): extrinsic_index = 4 * i + j + 1 cam[0][i][j] = words[extrinsic_index] # read intrinsic for i in range(0, 3): for j in range(0, 3): intrinsic_index = 3 * i + j + 18 cam[1][i][j] = words[intrinsic_index] if len(words) == 29: cam[1][3][0] = words[27] cam[1][3][1] = float(words[28]) * interval_scale cam[1][3][2] = 1100 cam[1][3][3] = cam[1][3][0] + cam[1][3][1] * cam[1][3][2] elif len(words) == 30: cam[1][3][0] = words[27] cam[1][3][1] = float(words[28]) * interval_scale cam[1][3][2] = words[29] cam[1][3][3] = cam[1][3][0] + cam[1][3][1] * cam[1][3][2] elif len(words) == 31: cam[1][3][0] = words[27] cam[1][3][1] = float(words[28]) * interval_scale cam[1][3][2] = words[29] cam[1][3][3] = words[30] else: cam[1][3][0] = 0 cam[1][3][1] = 0 cam[1][3][2] = 0 cam[1][3][3] = 0 return cam def load_pfm(file): color = None width = None height = None scale = None data_type = None header = file.readline().decode('UTF-8').rstrip() if header == 'PF': color = True elif header == 'Pf': color = False else: raise Exception('Not a PFM file.') dim_match = re.match(r'^(\d+)\s(\d+)\s$', file.readline().decode('UTF-8')) if dim_match: width, height = map(int, dim_match.groups()) else: raise Exception('Malformed PFM header.') # scale = float(file.readline().rstrip()) scale = float((file.readline()).decode('UTF-8').rstrip()) if scale < 0: # little-endian data_type = ' 0, 1, 0)) mask_image = np.reshape(mask_image, (image_shape[0], image_shape[1], 1)) mask_image = np.tile(mask_image, [1, 1, 3]) mask_image = np.float32(mask_image) normal_image = np.multiply(normal_image, mask_image) normal_image = np.float32(normal_image) write_gipuma_dmb(out_normal_path, normal_image) return def mvsnet_to_gipuma(scan_folder, scan, root_dir, gipuma_point_folder): image_folder = os.path.join(root_dir, 'Rectified', scan) cam_folder = os.path.join(root_dir, 'Cameras') depth_folder = os.path.join(scan_folder, 'depth_est') gipuma_cam_folder = os.path.join(gipuma_point_folder, 'cams') gipuma_image_folder = os.path.join(gipuma_point_folder, 'images') if not 
os.path.isdir(gipuma_point_folder): os.mkdir(gipuma_point_folder) if not os.path.isdir(gipuma_cam_folder): os.mkdir(gipuma_cam_folder) if not os.path.isdir(gipuma_image_folder): os.mkdir(gipuma_image_folder) # convert cameras for view in range(0,49): in_cam_file = os.path.join(cam_folder, "{:08d}_cam.txt".format(view)) out_cam_file = os.path.join(gipuma_cam_folder, "{:08d}.png.P".format(view)) mvsnet_to_gipuma_cam(in_cam_file, out_cam_file) # copy images to gipuma image folder for view in range(0,49): in_image_file = os.path.join(image_folder, "rect_{:03d}_3_r5000.png".format(view+1))# Our image start from 1 out_image_file = os.path.join(gipuma_image_folder, "{:08d}.png".format(view)) # shutil.copy(in_image_file, out_image_file) in_image = cv2.imread(in_image_file) out_image = cv2.resize(in_image, None, fx=1.0/args.downsample_factor, fy=1.0/args.downsample_factor, interpolation=cv2.INTER_LINEAR) cv2.imwrite(out_image_file, out_image) # convert depth maps and fake normal maps gipuma_prefix = '2333__' for view in range(0,49): sub_depth_folder = os.path.join(gipuma_point_folder, gipuma_prefix+"{:08d}".format(view)) if not os.path.isdir(sub_depth_folder): os.mkdir(sub_depth_folder) in_depth_pfm = os.path.join(depth_folder, "{:08d}_prob_filtered.pfm".format(view)) out_depth_dmb = os.path.join(sub_depth_folder, 'disp.dmb') fake_normal_dmb = os.path.join(sub_depth_folder, 'normals.dmb') mvsnet_to_gipuma_dmb(in_depth_pfm, out_depth_dmb) fake_gipuma_normal(out_depth_dmb, fake_normal_dmb) def probability_filter(scan_folder, prob_threshold): depth_folder = os.path.join(scan_folder, 'depth_est') prob_folder = os.path.join(scan_folder, 'confidence') # convert cameras for view in range(0,49): init_depth_map_path = os.path.join(depth_folder, "{:08d}.pfm".format(view)) # New dataset outputs depth start from 0. 
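# note: probability_filter zeroes out every depth pixel whose confidence falls below prob_threshold,
# so the filtered maps written as *_prob_filtered.pfm keep only reliable estimates for fusibile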
prob_map_path = os.path.join(prob_folder, "{:08d}.pfm".format(view)) # Same as above out_depth_map_path = os.path.join(depth_folder, "{:08d}_prob_filtered.pfm".format(view)) # Gipuma start from 0 depth_map = load_pfm(open(init_depth_map_path)) prob_map = load_pfm(open(prob_map_path)) depth_map[prob_map < prob_threshold] = 0 write_pfm(out_depth_map_path, depth_map) def depth_map_fusion(point_folder, fusibile_exe_path, disp_thresh, num_consistent): cam_folder = os.path.join(point_folder, 'cams') image_folder = os.path.join(point_folder, 'images') depth_min = 0.001 depth_max = 100000 normal_thresh = 360 cmd = fusibile_exe_path cmd = cmd + ' -input_folder ' + point_folder + '/' cmd = cmd + ' -p_folder ' + cam_folder + '/' cmd = cmd + ' -images_folder ' + image_folder + '/' cmd = cmd + ' --depth_min=' + str(depth_min) cmd = cmd + ' --depth_max=' + str(depth_max) cmd = cmd + ' --normal_thresh=' + str(normal_thresh) cmd = cmd + ' --disp_thresh=' + str(disp_thresh) cmd = cmd + ' --num_consistent=' + str(num_consistent) print (cmd) os.system(cmd) return def collectPly(point_folder, scan_id): model_name = 'final3d_model.ply' model_dir = [item for item in os.listdir(point_folder) if item.startswith("consistencyCheck")][-1] old = os.path.join(point_folder, model_dir, model_name) fresh = os.path.join(args.plydir, "mvsnet") + scan_id.zfill(3) + ".ply" shutil.move(old, fresh) if __name__ == '__main__': root_dir = args.root_dir depth_folder = args.depth_folder out_folder = args.out_folder fusibile_exe_path = args.fusibile_exe_path prob_threshold = args.prob_threshold disp_threshold = args.disp_threshold num_consistent = args.num_consistent # Read test list testlist = args.list_file with open(testlist) as f: scans = f.readlines() scans = [line.rstrip() for line in scans] print("Start Gipuma(GPU) fusion!") if not os.path.isdir(args.plydir): os.mkdir(args.plydir) # Fusion for i, scan in enumerate(scans): print("{}/{} {}:".format(i, len(scans), scan), '------------------------') scan_folder = os.path.join(depth_folder, scan) fusibile_workspace = os.path.join(depth_folder, out_folder, scan) if not os.path.isdir(os.path.join(depth_folder, out_folder)): os.mkdir(os.path.join(depth_folder, out_folder)) if not os.path.isdir(fusibile_workspace): os.mkdir(fusibile_workspace) # probability filtering print ('filter depth map with probability map') probability_filter(scan_folder, prob_threshold) # convert to gipuma format print ('Convert mvsnet output to gipuma input') mvsnet_to_gipuma(scan_folder, scan, root_dir, fusibile_workspace) # depth map fusion with gipuma print ('Run depth map fusion & filter') depth_map_fusion(fusibile_workspace, fusibile_exe_path, disp_threshold, num_consistent) # collect .ply results to summary folder print('Collect {} ply'.format(scan)) collectPly(fusibile_workspace, scan[4:]) print("Gipuma(GPU) fusion done!") shutil.rmtree(os.path.join(depth_folder, out_folder)) print("fusibile_fused remove done!") ================================================ FILE: fusions/dtu/pcd.py ================================================ # -*- coding: utf-8 -*- # @Description: Point cloud fusion strategy for DTU dataset: Basic PCD. 
# Refer to: https://github.com/xy-guo/MVSNet_pytorch/blob/master/eval.py # @Author: Zhe Zhang (doublez@stu.pku.edu.cn) # @Affiliation: Peking University (PKU) # @LastEditDate: 2023-09-07 import argparse, os, sys, cv2, re, logging, time import numpy as np from plyfile import PlyData, PlyElement from PIL import Image from multiprocessing import Pool from functools import partial import signal parser = argparse.ArgumentParser(description='filter, and fuse') parser.add_argument('--testpath', default='[/path/to]/dtu-test-1200', help='testing data dir for some scenes') parser.add_argument('--testlist', default="datasets/lists/dtu/test.txt", help='testing scene list') parser.add_argument('--outdir', default='./outputs/[exp_name]', help='output dir') parser.add_argument('--logdir', default='./checkpoints/debug', help='the directory to save checkpoints/logs') parser.add_argument('--nolog', action='store_true', help='do not logging into .log file') parser.add_argument('--plydir', default='./outputs/[exp_name]/pcd_fusion_plys/', help='output dir') parser.add_argument('--num_worker', type=int, default=4, help='depth_filer worker') parser.add_argument('--conf', type=float, default=0.9, help='prob confidence') parser.add_argument('--thres_view', type=int, default=5, help='threshold of num view') args = parser.parse_args() def read_pfm(filename): file = open(filename, 'rb') color = None width = None height = None scale = None endian = None header = file.readline().decode('utf-8').rstrip() if header == 'PF': color = True elif header == 'Pf': color = False else: raise Exception('Not a PFM file.') dim_match = re.match(r'^(\d+)\s(\d+)\s$', file.readline().decode('utf-8')) if dim_match: width, height = map(int, dim_match.groups()) else: raise Exception('Malformed PFM header.') scale = float(file.readline().rstrip()) if scale < 0: # little-endian endian = '<' scale = -scale else: endian = '>' # big-endian data = np.fromfile(file, endian + 'f') shape = (height, width, 3) if color else (height, width) data = np.reshape(data, shape) data = np.flipud(data) file.close() return data, scale def read_camera_parameters(filename): with open(filename) as f: lines = f.readlines() lines = [line.rstrip() for line in lines] # extrinsics: line [1,5), 4x4 matrix extrinsics = np.fromstring(' '.join(lines[1:5]), dtype=np.float32, sep=' ').reshape((4, 4)) # intrinsics: line [7-10), 3x3 matrix intrinsics = np.fromstring(' '.join(lines[7:10]), dtype=np.float32, sep=' ').reshape((3, 3)) return intrinsics, extrinsics def read_img(filename): img = Image.open(filename) # scale 0~255 to 0~1 np_img = np.array(img, dtype=np.float32) / 255. return np_img def read_mask(filename): return read_img(filename) > 0.5 def save_mask(filename, mask): assert mask.dtype == np.bool mask = mask.astype(np.uint8) * 255 Image.fromarray(mask).save(filename) def read_pair_file(filename): data = [] with open(filename) as f: num_viewpoint = int(f.readline()) # 49 viewpoints for view_idx in range(num_viewpoint): ref_view = int(f.readline().rstrip()) src_views = [int(x) for x in f.readline().rstrip().split()[1::2]] if len(src_views) > 0: data.append((ref_view, src_views)) return data # project the reference point cloud into the source view, then project back def reproject_with_depth(depth_ref, intrinsics_ref, extrinsics_ref, depth_src, intrinsics_src, extrinsics_src): width, height = depth_ref.shape[1], depth_ref.shape[0] ## step1. 
project reference pixels to the source view # reference view x, y x_ref, y_ref = np.meshgrid(np.arange(0, width), np.arange(0, height)) x_ref, y_ref = x_ref.reshape([-1]), y_ref.reshape([-1]) # reference 3D space xyz_ref = np.matmul(np.linalg.inv(intrinsics_ref), np.vstack((x_ref, y_ref, np.ones_like(x_ref))) * depth_ref.reshape([-1])) # source 3D space xyz_src = np.matmul(np.matmul(extrinsics_src, np.linalg.inv(extrinsics_ref)), np.vstack((xyz_ref, np.ones_like(x_ref))))[:3] # source view x, y K_xyz_src = np.matmul(intrinsics_src, xyz_src) xy_src = K_xyz_src[:2] / K_xyz_src[2:3] ## step2. reproject the source view points with source view depth estimation # find the depth estimation of the source view x_src = xy_src[0].reshape([height, width]).astype(np.float32) y_src = xy_src[1].reshape([height, width]).astype(np.float32) sampled_depth_src = cv2.remap(depth_src, x_src, y_src, interpolation=cv2.INTER_LINEAR) # mask = sampled_depth_src > 0 # source 3D space # NOTE that we should use sampled source-view depth_here to project back xyz_src = np.matmul(np.linalg.inv(intrinsics_src), np.vstack((xy_src, np.ones_like(x_ref))) * sampled_depth_src.reshape([-1])) # reference 3D space xyz_reprojected = np.matmul(np.matmul(extrinsics_ref, np.linalg.inv(extrinsics_src)), np.vstack((xyz_src, np.ones_like(x_ref))))[:3] # source view x, y, depth depth_reprojected = xyz_reprojected[2].reshape([height, width]).astype(np.float32) K_xyz_reprojected = np.matmul(intrinsics_ref, xyz_reprojected) xy_reprojected = K_xyz_reprojected[:2] / K_xyz_reprojected[2:3] x_reprojected = xy_reprojected[0].reshape([height, width]).astype(np.float32) y_reprojected = xy_reprojected[1].reshape([height, width]).astype(np.float32) return depth_reprojected, x_reprojected, y_reprojected, x_src, y_src def check_geometric_consistency(depth_ref, intrinsics_ref, extrinsics_ref, depth_src, intrinsics_src, extrinsics_src): width, height = depth_ref.shape[1], depth_ref.shape[0] x_ref, y_ref = np.meshgrid(np.arange(0, width), np.arange(0, height)) depth_reprojected, x2d_reprojected, y2d_reprojected, x2d_src, y2d_src = reproject_with_depth(depth_ref, intrinsics_ref, extrinsics_ref, depth_src, intrinsics_src, extrinsics_src) # check |p_reproj-p_1| < 1 dist = np.sqrt((x2d_reprojected - x_ref) ** 2 + (y2d_reprojected - y_ref) ** 2) # check |d_reproj-d_1| / d_1 < 0.01 depth_diff = np.abs(depth_reprojected - depth_ref) relative_depth_diff = depth_diff / depth_ref mask = np.logical_and(dist < 1, relative_depth_diff < 0.01) depth_reprojected[~mask] = 0 return mask, depth_reprojected, x2d_src, y2d_src def filter_depth(pair_folder, scan_folder, out_folder, plyfilename): # the pair file pair_file = os.path.join(pair_folder, "pair.txt") # for the final point cloud vertexs = [] vertex_colors = [] pair_data = read_pair_file(pair_file) # for each reference view and the corresponding source views for ref_view, src_views in pair_data: # src_views = src_views[:args.num_view] # load the camera parameters ref_intrinsics, ref_extrinsics = read_camera_parameters( os.path.join(scan_folder, 'cams/{:0>8}_cam.txt'.format(ref_view))) # load the reference image ref_img = read_img(os.path.join(scan_folder, 'images/{:0>8}.jpg'.format(ref_view))) # load the estimated depth of the reference view ref_depth_est = read_pfm(os.path.join(out_folder, 'depth_est/{:0>8}.pfm'.format(ref_view)))[0] # load the photometric mask of the reference view confidence = read_pfm(os.path.join(out_folder, 'confidence/{:0>8}.pfm'.format(ref_view)))[0] photo_mask = confidence > args.conf 
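# photo_mask keeps pixels whose predicted confidence exceeds args.conf (0.9 by default);
# the loop below adds a geometric check: a pixel is kept only if its depth reprojects into a
# source view within 1 px and 1% relative depth error, for at least args.thres_view source views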
all_srcview_depth_ests = [] all_srcview_x = [] all_srcview_y = [] all_srcview_geomask = [] # compute the geometric mask geo_mask_sum = 0 for src_view in src_views: # camera parameters of the source view src_intrinsics, src_extrinsics = read_camera_parameters( os.path.join(scan_folder, 'cams/{:0>8}_cam.txt'.format(src_view))) # the estimated depth of the source view src_depth_est = read_pfm(os.path.join(out_folder, 'depth_est/{:0>8}.pfm'.format(src_view)))[0] geo_mask, depth_reprojected, x2d_src, y2d_src = check_geometric_consistency(ref_depth_est, ref_intrinsics, ref_extrinsics, src_depth_est, src_intrinsics, src_extrinsics) geo_mask_sum += geo_mask.astype(np.int32) all_srcview_depth_ests.append(depth_reprojected) all_srcview_x.append(x2d_src) all_srcview_y.append(y2d_src) all_srcview_geomask.append(geo_mask) depth_est_averaged = (sum(all_srcview_depth_ests) + ref_depth_est) / (geo_mask_sum + 1) # at least 3 source views matched geo_mask = geo_mask_sum >= args.thres_view final_mask = np.logical_and(photo_mask, geo_mask) os.makedirs(os.path.join(out_folder, "mask"), exist_ok=True) save_mask(os.path.join(out_folder, "mask/{:0>8}_photo.png".format(ref_view)), photo_mask) save_mask(os.path.join(out_folder, "mask/{:0>8}_geo.png".format(ref_view)), geo_mask) save_mask(os.path.join(out_folder, "mask/{:0>8}_final.png".format(ref_view)), final_mask) logger.info("processing {}, ref-view{:0>2}, photo/geo/final-mask:{:.3f}/{:.3f}/{:.3f}".format(scan_folder, ref_view, photo_mask.mean(), geo_mask.mean(), final_mask.mean())) height, width = depth_est_averaged.shape[:2] x, y = np.meshgrid(np.arange(0, width), np.arange(0, height)) # valid_points = np.logical_and(final_mask, ~used_mask[ref_view]) valid_points = final_mask logger.info("valid_points: {}".format(valid_points.mean())) x, y, depth = x[valid_points], y[valid_points], depth_est_averaged[valid_points] #color = ref_img[1:-16:4, 1::4, :][valid_points] # hardcoded for DTU dataset color = ref_img[valid_points] xyz_ref = np.matmul(np.linalg.inv(ref_intrinsics), np.vstack((x, y, np.ones_like(x))) * depth) xyz_world = np.matmul(np.linalg.inv(ref_extrinsics), np.vstack((xyz_ref, np.ones_like(x))))[:3] vertexs.append(xyz_world.transpose((1, 0))) vertex_colors.append((color * 255).astype(np.uint8)) vertexs = np.concatenate(vertexs, axis=0) vertex_colors = np.concatenate(vertex_colors, axis=0) vertexs = np.array([tuple(v) for v in vertexs], dtype=[('x', 'f4'), ('y', 'f4'), ('z', 'f4')]) vertex_colors = np.array([tuple(v) for v in vertex_colors], dtype=[('red', 'u1'), ('green', 'u1'), ('blue', 'u1')]) vertex_all = np.empty(len(vertexs), vertexs.dtype.descr + vertex_colors.dtype.descr) for prop in vertexs.dtype.names: vertex_all[prop] = vertexs[prop] for prop in vertex_colors.dtype.names: vertex_all[prop] = vertex_colors[prop] el = PlyElement.describe(vertex_all, 'vertex') PlyData([el]).write(plyfilename) logger.info("saving the final model to " + plyfilename) def init_worker(): ''' Catch Ctrl+C signal to termiante workers ''' signal.signal(signal.SIGINT, signal.SIG_IGN) def pcd_filter_worker(scan): scan_id = int(scan[4:]) save_name = 'mvsnet{:0>3}.ply'.format(scan_id) pair_folder = os.path.join(args.testpath, "Cameras") scan_folder = os.path.join(args.outdir, scan) out_folder = os.path.join(args.outdir, scan) filter_depth(pair_folder, scan_folder, out_folder, os.path.join(args.plydir, save_name)) def pcd_filter(testlist, number_worker): partial_func = partial(pcd_filter_worker) p = Pool(number_worker, init_worker) try: p.map(partial_func, testlist) except 
KeyboardInterrupt: logger.info("....\nCaught KeyboardInterrupt, terminating workers") p.terminate() else: p.close() p.join() def initLogger(): logger = logging.getLogger() logger.setLevel(logging.INFO) curTime = time.strftime('%Y%m%d-%H%M', time.localtime(time.time())) if not os.path.isdir(args.logdir): os.mkdir(args.logdir) logfile = os.path.join(args.logdir, 'fusion-' + curTime + '.log') formatter = logging.Formatter("%(asctime)s - %(filename)s[line:%(lineno)d] - %(levelname)s: %(message)s") if not args.nolog: fileHandler = logging.FileHandler(logfile, mode='a') fileHandler.setFormatter(formatter) logger.addHandler(fileHandler) consoleHandler = logging.StreamHandler(sys.stdout) consoleHandler.setFormatter(formatter) logger.addHandler(consoleHandler) logger.info("Logger initialized.") logger.info("Writing logs to file: {}".format(logfile)) logger.info("Current time: {}".format(curTime)) return logger if __name__ == '__main__': logger = initLogger() if not os.path.isdir(args.plydir): os.mkdir(args.plydir) with open(args.testlist) as f: content = f.readlines() testlist = [line.rstrip() for line in content] pcd_filter(testlist, args.num_worker) ================================================ FILE: fusions/tnt/dypcd.py ================================================ # -*- coding: utf-8 -*- # @Description: Point cloud fusion strategy for Tanks and Temples dataset: DYnamic PCD. # Refer to: https://github.com/yhw-yhw/D2HC-RMVSNet/blob/master/fusion.py # @Author: Zhe Zhang (doublez@stu.pku.edu.cn) # @Affiliation: Peking University (PKU) # @LastEditDate: 2023-09-07 import os import cv2 import signal import numpy as np from PIL import Image from functools import partial from multiprocessing import Pool from plyfile import PlyData, PlyElement import argparse import re, json from sklearn.preprocessing import scale parser = argparse.ArgumentParser() parser.add_argument("--root_dir", type=str, default="[/path/to/]tankandtemples/") parser.add_argument('--out_dir', type=str, default='outputs/[exp_name]') parser.add_argument('--ply_path', type=str, default='outputs/[exp_name]/dypcd_fusion_plys') parser.add_argument('--split', type=str, default='intermediate', choices=['intermediate', 'advanced']) parser.add_argument('--list_file', type=str, default='datasets/lists/tnt/intermediate.txt') parser.add_argument('--num_workers', type=int, default=1) parser.add_argument('--single_processor', action='store_true') parser.add_argument('--rescale', action='store_true') parser.add_argument('--max_w', type=int) parser.add_argument('--max_h', type=int) parser.add_argument('--cam_mode', type=str, default='origin', choices=['origin', 'short_range']) parser.add_argument('--img_mode', type=str, default='resize', choices=['resize', 'crop']) parser.add_argument('--dist_base', type=float, default=1 / 4) parser.add_argument('--rel_diff_base', type=float, default=1 / 1300) args = parser.parse_args() tnt_fusion_exps = [ { "ply_path": "dypcd_fusion_plys_mean", "param_strategy": "mean", }, { "ply_path": "dypcd_fusion_plys", "param_strategy": "hyper_param", "hyper_param_table": { # -1 -> mean() 'Family': 0.6, 'Francis': 0.6, 'Horse': 0.2, 'Lighthouse': 0.7, 'M60': 0.6, 'Panther': 0.6, 'Playground': 0.7, 'Train': 0.6, 'Auditorium': 0.1, 'Ballroom': 0.4, 'Courtroom': 0.4, 'Museum': 0.5, 'Palace': 0.5, 'Temple': 0.4 } }, ] def read_pfm(filename): file = open(filename, 'rb') color = None width = None height = None scale = None endian = None header = file.readline().decode('utf-8').rstrip() if header == 'PF': color = True elif header 
== 'Pf': color = False else: raise Exception('Not a PFM file.') dim_match = re.match(r'^(\d+)\s(\d+)\s$', file.readline().decode('utf-8')) if dim_match: width, height = map(int, dim_match.groups()) else: raise Exception('Malformed PFM header.') scale = float(file.readline().rstrip()) if scale < 0: # little-endian endian = '<' scale = -scale else: endian = '>' # big-endian data = np.fromfile(file, endian + 'f') shape = (height, width, 3) if color else (height, width) data = np.reshape(data, shape) data = np.flipud(data) file.close() return data, scale # save a binary mask def save_mask(filename, mask): assert mask.dtype == np.bool mask = mask.astype(np.uint8) * 255 Image.fromarray(mask).save(filename) # read an image def read_img(filename): img = Image.open(filename) # scale 0~255 to 0~1 np_img = np.array(img, dtype=np.float32) / 255. return np_img # read intrinsics and extrinsics def read_camera_parameters(filename): with open(filename) as f: lines = f.readlines() lines = [line.rstrip() for line in lines] # extrinsics: line [1,5), 4x4 matrix extrinsics = np.fromstring(' '.join(lines[1:5]), dtype=np.float32, sep=' ').reshape((4, 4)) # intrinsics: line [7-10), 3x3 matrix intrinsics = np.fromstring(' '.join(lines[7:10]), dtype=np.float32, sep=' ').reshape((3, 3)) # TODO: assume the feature is 1/4 of the original image size # intrinsics[:2, :] /= 4 return intrinsics, extrinsics # read a pair file, [(ref_view1, [src_view1-1, ...]), (ref_view2, [src_view2-1, ...]), ...] def read_pair_file(filename): data = [] with open(filename) as f: num_viewpoint = int(f.readline()) # 49 viewpoints for view_idx in range(num_viewpoint): ref_view = int(f.readline().rstrip()) src_views = [int(x) for x in f.readline().rstrip().split()[1::2]] if len(src_views) > 0: data.append((ref_view, src_views)) return data # project the reference point cloud into the source view, then project back def reproject_with_depth(depth_ref, intrinsics_ref, extrinsics_ref, depth_src, intrinsics_src, extrinsics_src): width, height = depth_ref.shape[1], depth_ref.shape[0] ## step1. project reference pixels to the source view # reference view x, y x_ref, y_ref = np.meshgrid(np.arange(0, width), np.arange(0, height)) x_ref, y_ref = x_ref.reshape([-1]), y_ref.reshape([-1]) # reference 3D space xyz_ref = np.matmul(np.linalg.inv(intrinsics_ref), np.vstack((x_ref, y_ref, np.ones_like(x_ref))) * depth_ref.reshape([-1])) # source 3D space xyz_src = np.matmul(np.matmul(extrinsics_src, np.linalg.inv(extrinsics_ref)), np.vstack((xyz_ref, np.ones_like(x_ref))))[:3] # source view x, y K_xyz_src = np.matmul(intrinsics_src, xyz_src) xy_src = K_xyz_src[:2] / K_xyz_src[2:3] ## step2. 
reproject the source view points with source view depth estimation # find the depth estimation of the source view x_src = xy_src[0].reshape([height, width]).astype(np.float32) y_src = xy_src[1].reshape([height, width]).astype(np.float32) sampled_depth_src = cv2.remap(depth_src, x_src, y_src, interpolation=cv2.INTER_LINEAR) # mask = sampled_depth_src > 0 # source 3D space # NOTE that we should use sampled source-view depth_here to project back xyz_src = np.matmul(np.linalg.inv(intrinsics_src), np.vstack((xy_src, np.ones_like(x_ref))) * sampled_depth_src.reshape([-1])) # reference 3D space xyz_reprojected = np.matmul(np.matmul(extrinsics_ref, np.linalg.inv(extrinsics_src)), np.vstack((xyz_src, np.ones_like(x_ref))))[:3] # source view x, y, depth depth_reprojected = xyz_reprojected[2].reshape([height, width]).astype(np.float32) K_xyz_reprojected = np.matmul(intrinsics_ref, xyz_reprojected) K_xyz_reprojected[2:3][K_xyz_reprojected[2:3]==0] += 0.00001 xy_reprojected = K_xyz_reprojected[:2] / K_xyz_reprojected[2:3] x_reprojected = xy_reprojected[0].reshape([height, width]).astype(np.float32) y_reprojected = xy_reprojected[1].reshape([height, width]).astype(np.float32) return depth_reprojected, x_reprojected, y_reprojected, x_src, y_src def check_geometric_consistency(depth_ref, intrinsics_ref, extrinsics_ref, depth_src, intrinsics_src, extrinsics_src): width, height = depth_ref.shape[1], depth_ref.shape[0] x_ref, y_ref = np.meshgrid(np.arange(0, width), np.arange(0, height)) depth_reprojected, x2d_reprojected, y2d_reprojected, x2d_src, y2d_src = reproject_with_depth(depth_ref, intrinsics_ref, extrinsics_ref, depth_src, intrinsics_src, extrinsics_src) # check |p_reproj-p_1| < 1 dist = np.sqrt((x2d_reprojected - x_ref) ** 2 + (y2d_reprojected - y_ref) ** 2) # check |d_reproj-d_1| / d_1 < 0.01 depth_diff = np.abs(depth_reprojected - depth_ref) relative_depth_diff = depth_diff / depth_ref mask = None masks = [] for i in range(2, 11): # mask = np.logical_and(dist < i / 4, relative_depth_diff < i / 1300) mask = np.logical_and(dist < i * args.dist_base, relative_depth_diff < i * args.rel_diff_base) masks.append(mask) depth_reprojected[~mask] = 0 return masks, mask, depth_reprojected, x2d_src, y2d_src def scale_input(intrinsics, img): if args.img_mode == "crop": intrinsics[1,2] = intrinsics[1,2] - 28 # 1080 -> 1024 img = img[28:1080-28, :, :] elif args.img_mode == "resize": height, width = img.shape[:2] img = cv2.resize(img, (width, 1024)) scale_h = 1.0 * 1024 / height intrinsics[1, :] *= scale_h return intrinsics, img def filter_depth(scene, root_dir, split, out_dir, plyfilename, fusion_exp): # num_stage = len(args.ndepths) # the pair file pair_file = os.path.join(root_dir, split, scene, "pair.txt") # for the final point cloud vertexs = [] vertex_colors = [] pair_data = read_pair_file(pair_file) nviews = len(pair_data) # for each reference view and the corresponding source views for ref_view, src_views in pair_data: # src_views = src_views[:args.num_view] # load the camera parameters if args.cam_mode == 'short_range': ref_intrinsics, ref_extrinsics = read_camera_parameters( os.path.join(root_dir, split, scene, 'cams_{}/{:0>8}_cam.txt'.format(scene.lower(), ref_view))) elif args.cam_mode == 'origin': ref_intrinsics, ref_extrinsics = read_camera_parameters( os.path.join(root_dir, split, scene, 'cams/{:0>8}_cam.txt'.format(ref_view))) ref_img = read_img(os.path.join(root_dir, split, scene, 'images/{:0>8}.jpg'.format(ref_view))) ref_depth_est = read_pfm(os.path.join(out_dir, scene, 
'depth_est/{:0>8}.pfm'.format(ref_view)))[0] confidence = read_pfm(os.path.join(out_dir, scene, 'confidence/{:0>8}.pfm'.format(ref_view)))[0] if fusion_exp['param_strategy'] == 'mean': if ref_view % 50 == 0: print("-- thresh: {}".format(confidence.mean())) photo_mask = confidence > confidence.mean() elif fusion_exp['param_strategy'] == 'hyper_param': conf_thresh = fusion_exp['hyper_param_table'][scene] if conf_thresh == -1: photo_mask = confidence > confidence.mean() if ref_view % 50 == 0: print("-- thresh: mean() {}".format(confidence.mean())) else: photo_mask = confidence > conf_thresh if ref_view % 50 == 0: print("-- thresh: {}".format(conf_thresh)) flag_img = ref_img ref_intrinsics, _ = scale_input(ref_intrinsics, flag_img) all_srcview_depth_ests = [] all_srcview_x = [] all_srcview_y = [] all_srcview_geomask = [] # compute the geometric mask geo_mask_sum = 0 dy_range = len(src_views) + 1 geo_mask_sums = [0] * (dy_range - 2) for src_view in src_views: # camera parameters of the source view if args.cam_mode == 'short_range': src_intrinsics, src_extrinsics = read_camera_parameters( os.path.join(root_dir, split, scene, 'cams_{}/{:0>8}_cam.txt'.format(scene.lower(), src_view))) elif args.cam_mode == 'origin': src_intrinsics, src_extrinsics = read_camera_parameters( os.path.join(root_dir, split, scene, 'cams/{:0>8}_cam.txt'.format(src_view))) # the estimated depth of the source view src_depth_est = read_pfm(os.path.join(out_dir, scene, 'depth_est/{:0>8}.pfm'.format(src_view)))[0] src_intrinsics, _ = scale_input(src_intrinsics, flag_img) masks, geo_mask, depth_reprojected, x2d_src, y2d_src = check_geometric_consistency(ref_depth_est, ref_intrinsics, ref_extrinsics, src_depth_est, src_intrinsics, src_extrinsics) geo_mask_sum += geo_mask.astype(np.int32) for i in range(2, dy_range): geo_mask_sums[i - 2] += masks[i - 2].astype(np.int32) all_srcview_depth_ests.append(depth_reprojected) all_srcview_x.append(x2d_src) all_srcview_y.append(y2d_src) all_srcview_geomask.append(geo_mask) depth_est_averaged = (sum(all_srcview_depth_ests) + ref_depth_est) / (geo_mask_sum + 1) # at least args.thres_view source views matched geo_mask = geo_mask_sum >= dy_range for i in range(2, dy_range): geo_mask = np.logical_or(geo_mask, geo_mask_sums[i - 2] >= i) final_mask = np.logical_and(photo_mask, geo_mask) if ref_view < 3: os.makedirs(os.path.join(out_dir, scene, "mask"), exist_ok=True) save_mask(os.path.join(out_dir, scene, "mask/{:0>8}_photo.png".format(ref_view)), photo_mask) save_mask(os.path.join(out_dir, scene, "mask/{:0>8}_geo.png".format(ref_view)), geo_mask) save_mask(os.path.join(out_dir, scene, "mask/{:0>8}_final.png".format(ref_view)), final_mask) print("processing {}, ref-view{:0>2}, photo/geo/final-mask:{:.3f}/{:.3f}/{:.3f}".format(os.path.join(out_dir, scene), ref_view, photo_mask.mean(), geo_mask.mean(), final_mask.mean())) height, width = depth_est_averaged.shape[:2] x, y = np.meshgrid(np.arange(0, width), np.arange(0, height)) # valid_points = np.logical_and(final_mask, ~used_mask[ref_view]) valid_points = final_mask print("valid_points {:.3f}".format(valid_points.mean())) x, y, depth = x[valid_points], y[valid_points], depth_est_averaged[valid_points] # color = ref_img[:-24, :, :][valid_points] color = ref_img[28:1080-28, :, :][valid_points] xyz_ref = np.matmul(np.linalg.inv(ref_intrinsics), np.vstack((x, y, np.ones_like(x))) * depth) xyz_world = np.matmul(np.linalg.inv(ref_extrinsics), np.vstack((xyz_ref, np.ones_like(x))))[:3] vertexs.append(xyz_world.transpose((1, 0))) 
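# the valid pixels were back-projected with inv(K) * [x, y, 1]^T * depth into the reference camera
# frame and mapped to world coordinates with inv(extrinsics); their colors are gathered next and
# all views of the scene are merged into a single PLY file below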
vertex_colors.append((color * 255).astype(np.uint8)) vertexs = np.concatenate(vertexs, axis=0) vertex_colors = np.concatenate(vertex_colors, axis=0) vertexs = np.array([tuple(v) for v in vertexs], dtype=[('x', 'f4'), ('y', 'f4'), ('z', 'f4')]) vertex_colors = np.array([tuple(v) for v in vertex_colors], dtype=[('red', 'u1'), ('green', 'u1'), ('blue', 'u1')]) vertex_all = np.empty(len(vertexs), vertexs.dtype.descr + vertex_colors.dtype.descr) for prop in vertexs.dtype.names: vertex_all[prop] = vertexs[prop] for prop in vertex_colors.dtype.names: vertex_all[prop] = vertex_colors[prop] el = PlyElement.describe(vertex_all, 'vertex') PlyData([el]).write(plyfilename) print("saving the final model to", plyfilename) def dypcd_filter_worker(scene): save_name = '{}.ply'.format(scene) filter_depth(scene, args.root_dir, args.split, args.out_dir, os.path.join(args.out_dir, fusion_exp['ply_path'], save_name), fusion_exp) def init_worker(): signal.signal(signal.SIGINT, signal.SIG_IGN) if __name__ == '__main__': with open(os.path.join(args.list_file)) as f: testlist = [line.rstrip() for line in f.readlines()] for fusion_exp in tnt_fusion_exps: if not os.path.isdir(os.path.join(args.out_dir, fusion_exp['ply_path'])): os.mkdir(os.path.join(args.out_dir, fusion_exp['ply_path'])) if args.single_processor: for scene in testlist: save_name = '{}.ply'.format(scene) filter_depth(scene, args.root_dir, args.split, args.out_dir, os.path.join(args.out_dir, fusion_exp['ply_path'], save_name), fusion_exp) else: partial_func = partial(dypcd_filter_worker) p = Pool(args.num_workers, init_worker) try: p.map(partial_func, testlist) except KeyboardInterrupt: print("....\nCaught KeyboardInterrupt, terminating workers") p.terminate() else: p.close() p.join() ================================================ FILE: models/__init__.py ================================================ from models.geomvsnet import GeoMVSNet from models.loss import geomvsnet_loss ================================================ FILE: models/filter.py ================================================ # -*- coding: utf-8 -*- # @Description: Basic implementation of Frequency Domain Filtering strategy (Sec 3.2 in the paper). # @Author: Zhe Zhang (doublez@stu.pku.edu.cn) # @Affiliation: Peking University (PKU) # @LastEditDate: 2023-09-07 import torch import numpy as np import matplotlib.pyplot as plt def frequency_domain_filter(depth, rho_ratio): """ large rho_ratio -> more information filtered """ f = torch.fft.fft2(depth) fshift = torch.fft.fftshift(f) b, h, w = depth.shape k_h, k_w = h/rho_ratio, w/rho_ratio fshift[:,:int(h/2-k_h/2),:] = 0 fshift[:,int(h/2+k_h/2):,:] = 0 fshift[:,:,:int(w/2-k_w/2)] = 0 fshift[:,:,int(w/2+k_w/2):] = 0 ishift = torch.fft.ifftshift(fshift) idepth = torch.fft.ifft2(ishift) depth_filtered = torch.abs(idepth) return depth_filtered def visual_fft_fig(fshift): fft_fig = torch.abs(20 * torch.log(fshift)) plt.figure(figsize=(10, 10)) plt.subplot(121) plt.imshow(fft_fig[0,:,:], cmap = 'gray') ================================================ FILE: models/geometry.py ================================================ # -*- coding: utf-8 -*- # @Description: Geometric Prior Guided Feature Fusion & Probability Volume Geometry Embedding (Sec 3.1 in the paper). 
# @Author: Zhe Zhang (doublez@stu.pku.edu.cn) # @Affiliation: Peking University (PKU) # @LastEditDate: 2023-09-07 import math import numpy as np import torch import torch.nn as nn import torch.nn.functional as F from models.submodules import ConvBnReLU3D class GeoFeatureFusion(nn.Module): def __init__(self, convolutional_layer_encoding="z", mask_type="basic", add_origin_feat_flag=True): super(GeoFeatureFusion, self).__init__() self.convolutional_layer_encoding = convolutional_layer_encoding # std / uv / z / xyz self.mask_type = mask_type # basic / mean self.add_origin_feat_flag = add_origin_feat_flag # True / False if self.convolutional_layer_encoding == "std": self.geoplanes = 0 elif self.convolutional_layer_encoding == "uv": self.geoplanes = 2 elif self.convolutional_layer_encoding == "z": self.geoplanes = 1 elif self.convolutional_layer_encoding == "xyz": self.geoplanes = 3 self.geofeature = GeometryFeature() # rgb encoder self.rgb_conv_init = convbnrelu(in_channels=4, out_channels=8, kernel_size=5, stride=1, padding=2) self.rgb_encoder_layer1 = BasicBlockGeo(inplanes=8, planes=16, stride=2, geoplanes=self.geoplanes) self.rgb_encoder_layer2 = BasicBlockGeo(inplanes=16, planes=32, stride=1, geoplanes=self.geoplanes) self.rgb_encoder_layer3 = BasicBlockGeo(inplanes=32, planes=64, stride=2, geoplanes=self.geoplanes) self.rgb_encoder_layer4 = BasicBlockGeo(inplanes=64, planes=128, stride=1, geoplanes=self.geoplanes) self.rgb_encoder_layer5 = BasicBlockGeo(inplanes=128, planes=256, stride=2, geoplanes=self.geoplanes) self.rgb_decoder_layer4 = deconvbnrelu(in_channels=256, out_channels=128, kernel_size=5, stride=2, padding=2, output_padding=1) self.rgb_decoder_layer2 = deconvbnrelu(in_channels=128, out_channels=32, kernel_size=5, stride=2, padding=2, output_padding=1) self.rgb_decoder_layer0 = deconvbnrelu(in_channels=32, out_channels=16, kernel_size=3, stride=1, padding=1, output_padding=0) self.rgb_decoder_layer= deconvbnrelu(in_channels=16, out_channels=8, kernel_size=5, stride=2, padding=2, output_padding=1) self.rgb_decoder_output = deconvbnrelu(in_channels=8, out_channels=2, kernel_size=3, stride=1, padding=1, output_padding=0) # depth encoder self.depth_conv_init = convbnrelu(in_channels=2, out_channels=8, kernel_size=5, stride=1, padding=2) self.depth_layer1 = BasicBlockGeo(inplanes=8, planes=16, stride=2, geoplanes=self.geoplanes) self.depth_layer2 = BasicBlockGeo(inplanes=16, planes=32, stride=1, geoplanes=self.geoplanes) self.depth_layer3 = BasicBlockGeo(inplanes=64, planes=64, stride=2, geoplanes=self.geoplanes) self.depth_layer4 = BasicBlockGeo(inplanes=64, planes=128, stride=1, geoplanes=self.geoplanes) self.depth_layer5 = BasicBlockGeo(inplanes=256, planes=256, stride=2, geoplanes=self.geoplanes) self.decoder_layer3 = deconvbnrelu(in_channels=256, out_channels=128, kernel_size=5, stride=2, padding=2, output_padding=1) self.decoder_layer4 = deconvbnrelu(in_channels=128, out_channels=64, kernel_size=3, stride=1, padding=1, output_padding=0) self.decoder_layer5 = deconvbnrelu(in_channels=64, out_channels=32, kernel_size=5, stride=2, padding=2, output_padding=1) self.decoder_layer6 = deconvbnrelu(in_channels=32, out_channels=16, kernel_size=3, stride=1, padding=1, output_padding=0) self.decoder_layer7 = deconvbnrelu(in_channels=16, out_channels=8, kernel_size=5, stride=2, padding=2, output_padding=1) # output self.rgbdepth_decoder_stage1 = deconvbnrelu(in_channels=32, out_channels=32, kernel_size=5, stride=2, padding=2, output_padding=1) self.rgbdepth_decoder_stage2 = 
deconvbnrelu(in_channels=16, out_channels=16, kernel_size=5, stride=2, padding=2, output_padding=1) self.rgbdepth_decoder_stage3 = deconvbnrelu(in_channels=8, out_channels=8, kernel_size=3, stride=1, padding=1, output_padding=0) self.final_decoder_stage1 = deconvbnrelu(in_channels=32, out_channels=32, kernel_size=3, stride=1, padding=1, output_padding=0) self.final_decoder_stage2 = deconvbnrelu(in_channels=16, out_channels=16, kernel_size=3, stride=1, padding=1, output_padding=0) self.final_decoder_stage3 = deconvbnrelu(in_channels=8, out_channels=8, kernel_size=3, stride=1, padding=1, output_padding=0) self.softmax = nn.Softmax(dim=1) self.pooling = nn.AvgPool2d(kernel_size=2) self.sparsepooling = SparseDownSampleClose(stride=2) weights_init(self) def forward(self, rgb, depth, confidence, depth_values, stage_idx, origin_feat, intrinsics_matrices_stage): rgb = rgb depth_min, depth_max = depth_values[:,0,None,None,None], depth_values[:,-1,None,None,None] d = (depth - depth_min) / (depth_max - depth_min) if self.mask_type == "basic": valid_mask = torch.where(d>0, torch.full_like(d, 1.0), torch.full_like(d, 0.0)) elif self.mask_type == "mean": valid_mask = torch.where(torch.logical_and(d>0, confidence>confidence.mean()), torch.full_like(d, 1.0), torch.full_like(d, 0.0)) # pre-data preparation if self.convolutional_layer_encoding in ["uv", "xyz"]: B, _, W, H = rgb.shape position = AddCoordsNp(H, W) position = position.call() position = torch.from_numpy(position).to(rgb.device).repeat(B, 1, 1, 1).transpose(-1, 1) unorm = position[:, 0:1, :, :] vnorm = position[:, 1:2, :, :] vnorm_s2 = self.pooling(vnorm) vnorm_s3 = self.pooling(vnorm_s2) vnorm_s4 = self.pooling(vnorm_s3) unorm_s2 = self.pooling(unorm) unorm_s3 = self.pooling(unorm_s2) unorm_s4 = self.pooling(unorm_s3) if self.convolutional_layer_encoding in ["z", "xyz"]: d_s2, vm_s2 = self.sparsepooling(d, valid_mask) d_s3, vm_s3 = self.sparsepooling(d_s2, vm_s2) d_s4, vm_s4 = self.sparsepooling(d_s3, vm_s3) if self.convolutional_layer_encoding == "xyz": K = intrinsics_matrices_stage f352 = K[:, 1, 1] f352 = f352.unsqueeze(1) f352 = f352.unsqueeze(2) f352 = f352.unsqueeze(3) c352 = K[:, 1, 2] c352 = c352.unsqueeze(1) c352 = c352.unsqueeze(2) c352 = c352.unsqueeze(3) f1216 = K[:, 0, 0] f1216 = f1216.unsqueeze(1) f1216 = f1216.unsqueeze(2) f1216 = f1216.unsqueeze(3) c1216 = K[:, 0, 2] c1216 = c1216.unsqueeze(1) c1216 = c1216.unsqueeze(2) c1216 = c1216.unsqueeze(3) # geometric info if self.convolutional_layer_encoding == "std": geo_s1 = None geo_s2 = None geo_s3 = None geo_s4 = None elif self.convolutional_layer_encoding == "uv": geo_s1 = torch.cat((vnorm, unorm), dim=1) geo_s2 = torch.cat((vnorm_s2, unorm_s2), dim=1) geo_s3 = torch.cat((vnorm_s3, unorm_s3), dim=1) geo_s4 = torch.cat((vnorm_s4, unorm_s4), dim=1) elif self.convolutional_layer_encoding == "z": geo_s1 = d geo_s2 = d_s2 geo_s3 = d_s3 geo_s4 = d_s4 elif self.convolutional_layer_encoding == "xyz": geo_s1 = self.geofeature(d, vnorm, unorm, H, W, c352, c1216, f352, f1216) geo_s2 = self.geofeature(d_s2, vnorm_s2, unorm_s2, H / 2, W / 2, c352, c1216, f352, f1216) geo_s3 = self.geofeature(d_s3, vnorm_s3, unorm_s3, H / 4, W / 4, c352, c1216, f352, f1216) geo_s4 = self.geofeature(d_s4, vnorm_s4, unorm_s4, H / 8, W / 8, c352, c1216, f352, f1216) # ----------------------------------------------------------------------------------------- # 128*160 -> 256*320 -> 512*640 rgb_feature = self.rgb_conv_init(torch.cat((rgb, d), dim=1)) # b 8 h w rgb_feature1 = self.rgb_encoder_layer1(rgb_feature, 
geo_s1, geo_s2) # b 16 h/2 w/2 rgb_feature2 = self.rgb_encoder_layer2(rgb_feature1, geo_s2, geo_s2) # b 32 h/2 w/2 rgb_feature3 = self.rgb_encoder_layer3(rgb_feature2, geo_s2, geo_s3) # b 64 h/4 w/4 rgb_feature4 = self.rgb_encoder_layer4(rgb_feature3, geo_s3, geo_s3) # b 128 h/4 w/4 rgb_feature5 = self.rgb_encoder_layer5(rgb_feature4, geo_s3, geo_s4) # b 256 h/8 w/8 rgb_feature_decoder4 = self.rgb_decoder_layer4(rgb_feature5) rgb_feature4_plus = rgb_feature_decoder4 + rgb_feature4 # b 128 h/4 w/4 rgb_feature_decoder2 = self.rgb_decoder_layer2(rgb_feature4_plus) rgb_feature2_plus = rgb_feature_decoder2 + rgb_feature2 # b 32 h/2 w/2 rgb_feature_decoder0 = self.rgb_decoder_layer0(rgb_feature2_plus) rgb_feature0_plus = rgb_feature_decoder0 + rgb_feature1 # b 16 h/2 w/2 rgb_feature_decoder = self.rgb_decoder_layer(rgb_feature0_plus) rgb_feature_plus = rgb_feature_decoder + rgb_feature # b 8 h w rgb_output = self.rgb_decoder_output(rgb_feature_plus) # b 2 h w rgb_depth = rgb_output[:, 0:1, :, :] rgb_conf = rgb_output[:, 1:2, :, :] # ----------------------------------------------------------------------------------------- sparsed_feature = self.depth_conv_init(torch.cat((d, rgb_depth), dim=1)) # b 8 h w sparsed_feature1 = self.depth_layer1(sparsed_feature, geo_s1, geo_s2) # b 16 h/2 w/2 sparsed_feature2 = self.depth_layer2(sparsed_feature1, geo_s2, geo_s2) # b 32 h/2 w/2 sparsed_feature2_plus = torch.cat([rgb_feature2_plus, sparsed_feature2], 1) sparsed_feature3 = self.depth_layer3(sparsed_feature2_plus, geo_s2, geo_s3) # b 64 h/4 w/4 sparsed_feature4 = self.depth_layer4(sparsed_feature3, geo_s3, geo_s3) # b 128 h/4 w/4 sparsed_feature4_plus = torch.cat([rgb_feature4_plus, sparsed_feature4], 1) sparsed_feature5 = self.depth_layer5(sparsed_feature4_plus, geo_s3, geo_s4) # b 256 h/8 w/8 # ----------------------------------------------------------------------------------------- fusion3 = rgb_feature5 + sparsed_feature5 decoder_feature3 = self.decoder_layer3(fusion3) # b 128 h/4 w/4 fusion4 = sparsed_feature4 + decoder_feature3 decoder_feature4 = self.decoder_layer4(fusion4) # b 64 h/4 w/4 if stage_idx >= 1: decoder_feature5 = self.decoder_layer5(decoder_feature4) fusion5 = sparsed_feature2 + decoder_feature5 # b 32 h/2 w/2 if stage_idx == 1: rgbdepth_feature = self.rgbdepth_decoder_stage1(fusion5) if self.add_origin_feat_flag: final_feature = self.final_decoder_stage1(rgbdepth_feature + origin_feat) else: final_feature = self.final_decoder_stage1(rgbdepth_feature) if stage_idx >= 2: decoder_feature6 = self.decoder_layer6(decoder_feature5) fusion6 = sparsed_feature1 + decoder_feature6 # b 16 h/2 w/2 if stage_idx == 2: rgbdepth_feature = self.rgbdepth_decoder_stage2(fusion6) if self.add_origin_feat_flag: final_feature = self.final_decoder_stage2(rgbdepth_feature + origin_feat) else: final_feature = self.final_decoder_stage2(rgbdepth_feature) if stage_idx >= 3: decoder_feature7 = self.decoder_layer7(decoder_feature6) fusion7 = sparsed_feature + decoder_feature7 # b 8 h w if stage_idx == 3: rgbdepth_feature = self.rgbdepth_decoder_stage3(fusion7) if self.add_origin_feat_flag: final_feature = self.final_decoder_stage3(rgbdepth_feature + origin_feat) else: final_feature = self.final_decoder_stage3(rgbdepth_feature) return final_feature class GeoRegNet2d(nn.Module): def __init__(self, input_channel=128, base_channel=32, convolutional_layer_encoding="std"): super(GeoRegNet2d, self).__init__() self.convolutional_layer_encoding = convolutional_layer_encoding # std / uv / z / xyz self.mask_type = "basic" # 
basic / mean if self.convolutional_layer_encoding == "std": self.geoplanes = 0 elif self.convolutional_layer_encoding == "z": self.geoplanes = 1 self.conv_init = ConvBnReLU3D(input_channel, out_channels=8, kernel_size=(1,3,3), pad=(0,1,1)) self.encoder_layer1 = Reg_BasicBlockGeo(inplanes=8, planes=16, kernel_size=(1,3,3), stride=(1,2,2), padding=(0,1,1), geoplanes=self.geoplanes) self.encoder_layer2 = Reg_BasicBlockGeo(inplanes=16, planes=32, kernel_size=(1,3,3), stride=1, padding=(0,1,1), geoplanes=self.geoplanes) self.encoder_layer3 = Reg_BasicBlockGeo(inplanes=32, planes=64, kernel_size=(1,3,3), stride=(1,2,2), padding=(0,1,1), geoplanes=self.geoplanes) self.encoder_layer4 = Reg_BasicBlockGeo(inplanes=64, planes=128, kernel_size=(1,3,3), stride=1, padding=(0,1,1), geoplanes=self.geoplanes) self.encoder_layer5 = Reg_BasicBlockGeo(inplanes=128, planes=256, kernel_size=(1,3,3), stride=(1,2,2), padding=(0,1,1), geoplanes=self.geoplanes) self.decoder_layer4 = reg_deconvbnrelu(in_channels=256, out_channels=128, kernel_size=(1,5,5), stride=(1,2,2), padding=(0,2,2), output_padding=(0,1,1)) self.decoder_layer3 = reg_deconvbnrelu(in_channels=128, out_channels=64, kernel_size=(1,3,3), stride=1, padding=(0,1,1), output_padding=0) self.decoder_layer2 = reg_deconvbnrelu(in_channels=64, out_channels=32, kernel_size=(1,5,5), stride=(1,2,2), padding=(0,2,2), output_padding=(0,1,1)) self.decoder_layer1 = reg_deconvbnrelu(in_channels=32, out_channels=16, kernel_size=(1,3,3), stride=1, padding=(0,1,1), output_padding=0) self.decoder_layer = reg_deconvbnrelu(in_channels=16, out_channels=8, kernel_size=(1,5,5), stride=(1,2,2), padding=(0,2,2), output_padding=(0,1,1)) self.prob = reg_deconvbnrelu(in_channels=8, out_channels=1, kernel_size=(1,3,3), stride=1, padding=(0,1,1), output_padding=0) self.depthpooling = nn.MaxPool3d((2,1,1),(2,1,1)) self.basicpooling = nn.MaxPool3d((1,2,2), (1,2,2)) weights_init(self) def forward(self, x, stage_idx, geo_reg_data=None): B, C, D, W, H = x.shape if stage_idx >= 1 and self.convolutional_layer_encoding == "z": prob_volume = geo_reg_data["prob_volume_last"].unsqueeze(1) # B 1 D H W else: assert self.convolutional_layer_encoding == "std" # geometric info if self.convolutional_layer_encoding == "std": geo_s1 = None geo_s2 = None geo_s3 = None geo_s4 = None elif self.convolutional_layer_encoding == "z": if stage_idx == 2: geo_s1 = self.depthpooling(prob_volume) else: geo_s1 = prob_volume # B 1 D H W geo_s2 = self.basicpooling(geo_s1) geo_s3 = self.basicpooling(geo_s2) feature = self.conv_init(x) # B 8 D H W feature1 = self.encoder_layer1(feature, geo_s1, geo_s1) # B 16 D H/2 W/2 feature2 = self.encoder_layer2(feature1, geo_s2, geo_s2) # B 32 D H/2 W/2 feature3 = self.encoder_layer3(feature2, geo_s2, geo_s2) # B 64 D H/4 W/4 feature4 = self.encoder_layer4(feature3, geo_s3, geo_s3) # B 128 D H/4 W/4 feature5 = self.encoder_layer5(feature4, geo_s3, geo_s3) # B 256 D H/8 W/8 feature_decoder4 = self.decoder_layer4(feature5) feature4_plus = feature_decoder4 + feature4 # B 128 D H/4 W/4 feature_decoder3 = self.decoder_layer3(feature4_plus) feature3_plus = feature_decoder3 + feature3 # B 64 D H/4 W/4 feature_decoder2 = self.decoder_layer2(feature3_plus) feature2_plus = feature_decoder2 + feature2 # B 32 D H/2 W/2 feature_decoder1 = self.decoder_layer1(feature2_plus) feature1_plus = feature_decoder1 + feature1 # B 16 D H/2 W/2 feature_decoder = self.decoder_layer(feature1_plus) feature_plus = feature_decoder + feature # B 8 D H W x = self.prob(feature_plus) return x.squeeze(1) # 
-------------------------------------------------------------- class BasicBlockGeo(nn.Module): expansion = 1 __constants__ = ['downsample'] def __init__(self, inplanes, planes, stride=1, downsample=None, groups=1, base_width=64, dilation=1, norm_layer=None, geoplanes=3): super(BasicBlockGeo, self).__init__() if norm_layer is None: norm_layer = nn.BatchNorm2d if groups != 1 or base_width != 64: raise ValueError('BasicBlock only supports groups=1 and base_width=64') if dilation > 1: raise NotImplementedError("Dilation > 1 not supported in BasicBlock") self.conv1 = conv3x3(inplanes + geoplanes, planes, stride) self.bn1 = norm_layer(planes) self.relu = nn.ReLU(inplace=True) self.conv2 = conv3x3(planes+geoplanes, planes) self.bn2 = norm_layer(planes) if stride != 1 or inplanes != planes: downsample = nn.Sequential( conv1x1(inplanes+geoplanes, planes, stride), norm_layer(planes), ) self.downsample = downsample self.stride = stride def forward(self, x, g1=None, g2=None): identity = x if g1 is not None: x = torch.cat((x, g1), 1) out = self.conv1(x) out = self.bn1(out) out = self.relu(out) if g2 is not None: out = torch.cat((g2,out), 1) out = self.conv2(out) out = self.bn2(out) if self.downsample is not None: identity = self.downsample(x) out += identity out = self.relu(out) return out class GeometryFeature(nn.Module): def __init__(self): super(GeometryFeature, self).__init__() def forward(self, z, vnorm, unorm, h, w, ch, cw, fh, fw): x = z*(0.5*h*(vnorm+1)-ch)/fh y = z*(0.5*w*(unorm+1)-cw)/fw return torch.cat((x, y, z),1) class SparseDownSampleClose(nn.Module): def __init__(self, stride): super(SparseDownSampleClose, self).__init__() self.pooling = nn.MaxPool2d(stride, stride) self.large_number = 600 def forward(self, d, mask): encode_d = - (1-mask)*self.large_number - d d = - self.pooling(encode_d) mask_result = self.pooling(mask) d_result = d - (1-mask_result)*self.large_number return d_result, mask_result def convbnrelu(in_channels, out_channels, kernel_size=3,stride=1, padding=1): return nn.Sequential( nn.Conv2d(in_channels, out_channels, kernel_size=kernel_size, stride=stride, padding=padding, bias=False), nn.BatchNorm2d(out_channels), nn.ReLU(inplace=True) ) def deconvbnrelu(in_channels, out_channels, kernel_size=5, stride=2, padding=2, output_padding=1): return nn.Sequential( nn.ConvTranspose2d(in_channels, out_channels, kernel_size=kernel_size, stride=stride, padding=padding, output_padding=output_padding, bias=False), nn.BatchNorm2d(out_channels), nn.ReLU(inplace=True) ) def weights_init(m): """Initialize filters with Gaussian random weights""" if isinstance(m, nn.Conv2d): n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels m.weight.data.normal_(0, math.sqrt(2. / n)) if m.bias is not None: m.bias.data.zero_() elif isinstance(m, nn.ConvTranspose2d): n = m.kernel_size[0] * m.kernel_size[1] * m.in_channels m.weight.data.normal_(0, math.sqrt(2. 
/ n)) if m.bias is not None: m.bias.data.zero_() elif isinstance(m, nn.BatchNorm2d): m.weight.data.fill_(1) m.bias.data.zero_() def conv3x3(in_planes, out_planes, stride=1, groups=1, dilation=1, bias=False, padding=1): """3x3 convolution with padding""" if padding >= 1: padding = dilation return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride, padding=padding, groups=groups, bias=bias, dilation=dilation) def conv1x1(in_planes, out_planes, stride=1, groups=1, bias=False): """1x1 convolution""" return nn.Conv2d(in_planes, out_planes, kernel_size=1, stride=stride, groups=groups, bias=bias) class AddCoordsNp(): """Add coords to a tensor""" def __init__(self, x_dim=64, y_dim=64, with_r=False): self.x_dim = x_dim self.y_dim = y_dim self.with_r = with_r def call(self): """ input_tensor: (batch, x_dim, y_dim, c) """ xx_ones = np.ones([self.x_dim], dtype=np.int32) xx_ones = np.expand_dims(xx_ones, 1) xx_range = np.expand_dims(np.arange(self.y_dim), 0) xx_channel = np.matmul(xx_ones, xx_range) xx_channel = np.expand_dims(xx_channel, -1) yy_ones = np.ones([self.y_dim], dtype=np.int32) yy_ones = np.expand_dims(yy_ones, 0) yy_range = np.expand_dims(np.arange(self.x_dim), 1) yy_channel = np.matmul(yy_range, yy_ones) yy_channel = np.expand_dims(yy_channel, -1) xx_channel = xx_channel.astype('float32') / (self.y_dim - 1) yy_channel = yy_channel.astype('float32') / (self.x_dim - 1) xx_channel = xx_channel*2 - 1 yy_channel = yy_channel*2 - 1 ret = np.concatenate([xx_channel, yy_channel], axis=-1) if self.with_r: rr = np.sqrt( np.square(xx_channel-0.5) + np.square(yy_channel-0.5)) ret = np.concatenate([ret, rr], axis=-1) return ret # -------------------------------------------------------------- class Reg_BasicBlockGeo(nn.Module): def __init__(self, inplanes, planes, kernel_size, stride, padding, downsample=None, groups=1, base_width=64, dilation=1, norm_layer=nn.BatchNorm3d, geoplanes=3): super(Reg_BasicBlockGeo, self).__init__() self.conv1 = regconv3D(inplanes + geoplanes, planes, kernel_size=(1,3,3), stride=1, padding=(0,1,1)) self.bn1 = norm_layer(planes) self.relu = nn.ReLU(inplace=True) self.conv2 = regconv3D(planes+geoplanes, planes, kernel_size, stride, padding) self.bn2 = norm_layer(planes) if stride != 1 or inplanes != planes: downsample = nn.Sequential( regconv1x1(inplanes+geoplanes, planes, kernel_size, stride, padding), norm_layer(planes), ) self.downsample = downsample self.stride = stride def forward(self, x, g1=None, g2=None): identity = x if g1 is not None: x = torch.cat((x, g1), 1) out = self.conv1(x) out = self.bn1(out) out = self.relu(out) if g2 is not None: out = torch.cat((g2,out), 1) out = self.conv2(out) out = self.bn2(out) if self.downsample is not None: identity = self.downsample(x) out += identity out = self.relu(out) return out def regconv3D(in_planes, out_planes, kernel_size, stride, padding, groups=1, dilation=1, bias=False): return nn.Conv3d(in_planes, out_planes, kernel_size=kernel_size, stride=stride, padding=padding, groups=groups, bias=bias, dilation=dilation) def regconv1x1(in_planes, out_planes, kernel_size, stride, padding, groups=1, bias=False): return nn.Conv3d(in_planes, out_planes, kernel_size=kernel_size, stride=stride, padding=padding, groups=groups, bias=bias) def reg_deconvbnrelu(in_channels, out_channels, kernel_size, stride, padding, output_padding): return nn.Sequential( nn.ConvTranspose3d(in_channels, out_channels, kernel_size=kernel_size, stride=stride, padding=padding, output_padding=output_padding, bias=False), nn.BatchNorm3d(out_channels), 
nn.ReLU(inplace=True) ) ================================================ FILE: models/geomvsnet.py ================================================ # -*- coding: utf-8 -*- # @Description: Main network architecture for GeoMVSNet. # @Author: Zhe Zhang (doublez@stu.pku.edu.cn) # @Affiliation: Peking University (PKU) # @LastEditDate: 2023-09-07 import math import torch import torch.nn as nn import torch.nn.functional as F from models.submodules import homo_warping, init_inverse_range, schedule_inverse_range, FPN, Reg2d from models.geometry import GeoFeatureFusion, GeoRegNet2d from models.filter import frequency_domain_filter class GeoMVSNet(nn.Module): def __init__(self, levels, hypo_plane_num_stages, depth_interal_ratio_stages, feat_base_channel, reg_base_channel, group_cor_dim_stages): super(GeoMVSNet, self).__init__() self.levels = levels self.hypo_plane_num_stages = hypo_plane_num_stages self.depth_interal_ratio_stages = depth_interal_ratio_stages self.StageNet = StageNet() # feature settings self.FeatureNet = FPN(base_channels=feat_base_channel) self.coarest_separate_flag = True if self.coarest_separate_flag: self.CoarestFeatureNet = FPN(base_channels=feat_base_channel) self.GeoFeatureFusionNet = GeoFeatureFusion( convolutional_layer_encoding="z", mask_type="basic", add_origin_feat_flag=True) # cost regularization settings self.RegNet_stages = nn.ModuleList() self.group_cor_dim_stages = group_cor_dim_stages self.geo_reg_flag = True self.geo_reg_encodings = ['std', 'z', 'z', 'z'] # must use std in idx-0 for stage_idx in range(self.levels): in_dim = group_cor_dim_stages[stage_idx] if self.geo_reg_flag: self.RegNet_stages.append(GeoRegNet2d(input_channel=in_dim, base_channel=reg_base_channel, convolutional_layer_encoding=self.geo_reg_encodings[stage_idx])) else: self.RegNet_stages.append(Reg2d(input_channel=in_dim, base_channel=reg_base_channel)) # frequency domain filter settings self.curriculum_learning_rho_ratios = [9, 4, 2, 1] def forward(self, imgs, proj_matrices, intrinsics_matrices, depth_values, filename=None): features = [] if self.coarest_separate_flag: coarsest_features = [] for nview_idx in range(len(imgs)): img = imgs[nview_idx] features.append(self.FeatureNet(img)) # B C H W if self.coarest_separate_flag: coarsest_features.append(self.CoarestFeatureNet(img)) # coarse-to-fine outputs = {} for stage_idx in range(self.levels): stage_name = "stage{}".format(stage_idx + 1) B, C, H, W = features[0][stage_name].shape proj_matrices_stage = proj_matrices[stage_name] intrinsics_matrices_stage = intrinsics_matrices[stage_name] # @Note features if stage_idx == 0: if self.coarest_separate_flag: features_stage = [feat[stage_name] for feat in coarsest_features] else: features_stage = [feat[stage_name] for feat in features] elif stage_idx >= 1: features_stage = [feat[stage_name] for feat in features] ref_img_stage = F.interpolate(imgs[0], size=None, scale_factor=1./2**(3-stage_idx), mode="bilinear", align_corners=False) depth_last = F.interpolate(depth_last.unsqueeze(1), size=None, scale_factor=2, mode="bilinear", align_corners=False) confidence_last = F.interpolate(confidence_last.unsqueeze(1), size=None, scale_factor=2, mode="bilinear", align_corners=False) # reference feature features_stage[0] = self.GeoFeatureFusionNet( ref_img_stage, depth_last, confidence_last, depth_values, stage_idx, features_stage[0], intrinsics_matrices_stage ) # @Note depth hypos if stage_idx == 0: depth_hypo = init_inverse_range(depth_values, self.hypo_plane_num_stages[stage_idx], img[0].device, img[0].dtype, H, W) 
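# Stage 0 places its hypothesis planes uniformly in inverse depth over the full range:
# plane k lies at 1 / (1/d_max + (1/d_min - 1/d_max) * k/(D-1)) for D = hypo_plane_num_stages[0],
# with d_min = depth_values[:, 0] and d_max = depth_values[:, -1]. Later stages (below)
# instead re-sample D hypotheses inside the narrower per-pixel inverse-depth interval
# predicted by the previous stage and upsample that volume to the current resolution.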
else: inverse_min_depth, inverse_max_depth = outputs_stage['inverse_min_depth'].detach(), outputs_stage['inverse_max_depth'].detach() depth_hypo = schedule_inverse_range(inverse_min_depth, inverse_max_depth, self.hypo_plane_num_stages[stage_idx], H, W) # B D H W # @Note cost regularization geo_reg_data = {} if self.geo_reg_flag: geo_reg_data['depth_values'] = depth_values if stage_idx >= 1 and self.geo_reg_encodings[stage_idx] == 'z': prob_volume_last = F.interpolate(prob_volume_last, size=None, scale_factor=2, mode="bilinear", align_corners=False) geo_reg_data["prob_volume_last"] = prob_volume_last outputs_stage = self.StageNet( stage_idx, features_stage, proj_matrices_stage, depth_hypo=depth_hypo, regnet=self.RegNet_stages[stage_idx], group_cor_dim=self.group_cor_dim_stages[stage_idx], depth_interal_ratio=self.depth_interal_ratio_stages[stage_idx], geo_reg_data=geo_reg_data ) # @Note frequency domain filter depth_est = outputs_stage['depth'] depth_est_filtered = frequency_domain_filter(depth_est, rho_ratio=self.curriculum_learning_rho_ratios[stage_idx]) outputs_stage['depth_filtered'] = depth_est_filtered depth_last = depth_est_filtered confidence_last = outputs_stage['photometric_confidence'] prob_volume_last = outputs_stage['prob_volume'] outputs[stage_name] = outputs_stage outputs.update(outputs_stage) return outputs class StageNet(nn.Module): def __init__(self, attn_temp=2): super(StageNet, self).__init__() self.attn_temp = attn_temp def forward(self, stage_idx, features, proj_matrices, depth_hypo, regnet, group_cor_dim, depth_interal_ratio, geo_reg_data=None): # @Note step1: feature extraction proj_matrices = torch.unbind(proj_matrices, 1) ref_feature, src_features = features[0], features[1:] ref_proj, src_projs = proj_matrices[0], proj_matrices[1:] B, D, H, W = depth_hypo.shape C = ref_feature.shape[1] # @Note step2: cost aggregation ref_volume = ref_feature.unsqueeze(2).repeat(1, 1, D, 1, 1) cor_weight_sum = 1e-8 cor_feats = 0 for src_idx, (src_fea, src_proj) in enumerate(zip(src_features, src_projs)): save_fn = None src_proj_new = src_proj[:, 0].clone() src_proj_new[:, :3, :4] = torch.matmul(src_proj[:, 1, :3, :3], src_proj[:, 0, :3, :4]) ref_proj_new = ref_proj[:, 0].clone() ref_proj_new[:, :3, :4] = torch.matmul(ref_proj[:, 1, :3, :3], ref_proj[:, 0, :3, :4]) warped_src = homo_warping(src_fea, src_proj_new, ref_proj_new, depth_hypo) # B C D H W warped_src = warped_src.reshape(B, group_cor_dim, C//group_cor_dim, D, H, W) ref_volume = ref_volume.reshape(B, group_cor_dim, C//group_cor_dim, D, H, W) cor_feat = (warped_src * ref_volume).mean(2) # B G D H W del warped_src, src_proj, src_fea cor_weight = torch.softmax(cor_feat.sum(1) / self.attn_temp, 1) / math.sqrt(C) # B D H W cor_weight_sum += cor_weight # B D H W cor_feats += cor_weight.unsqueeze(1) * cor_feat # B C D H W del cor_weight, cor_feat cost_volume = cor_feats / cor_weight_sum.unsqueeze(1) # B C D H W del cor_weight_sum, src_features # @Note step3: cost regularization if geo_reg_data == {}: # basic cost_reg = regnet(cost_volume) else: # probability volume geometry embedding cost_reg = regnet(cost_volume, stage_idx, geo_reg_data) del cost_volume prob_volume = F.softmax(cost_reg, dim=1) # B D H W # @Note step4: depth regression prob_max_indices = prob_volume.max(1, keepdim=True)[1] # B 1 H W depth = torch.gather(depth_hypo, 1, prob_max_indices).squeeze(1) # B H W with torch.no_grad(): photometric_confidence = prob_volume.max(1)[0] # B H W photometric_confidence = F.interpolate(photometric_confidence.unsqueeze(1), 
scale_factor=1, mode='bilinear', align_corners=True).squeeze(1) last_depth_itv = 1./depth_hypo[:,2,:,:] - 1./depth_hypo[:,1,:,:] inverse_min_depth = 1/depth + depth_interal_ratio * last_depth_itv # B H W inverse_max_depth = 1/depth - depth_interal_ratio * last_depth_itv # B H W output_stage = { "depth": depth, "photometric_confidence": photometric_confidence, "depth_hypo": depth_hypo, "prob_volume": prob_volume, "inverse_min_depth": inverse_min_depth, "inverse_max_depth": inverse_max_depth, } return output_stage ================================================ FILE: models/loss.py ================================================ # -*- coding: utf-8 -*- # @Description: Loss Functions (Sec 3.4 in the paper). # @Author: Zhe Zhang (doublez@stu.pku.edu.cn) # @Affiliation: Peking University (PKU) # @LastEditDate: 2023-09-07 import torch def geomvsnet_loss(inputs, depth_gt_ms, mask_ms, **kwargs): stage_lw = kwargs.get("stage_lw", [1, 1, 1, 1]) depth_values = kwargs.get("depth_values") depth_min, depth_max = depth_values[:,0], depth_values[:,-1] total_loss = torch.tensor(0.0, dtype=torch.float32, device=mask_ms["stage1"].device, requires_grad=False) pw_loss_stages = [] dds_loss_stages = [] for stage_idx, (stage_inputs, stage_key) in enumerate([(inputs[k], k) for k in inputs.keys() if "stage" in k]): depth = stage_inputs['depth_filtered'] prob_volume = stage_inputs['prob_volume'] depth_value = stage_inputs['depth_hypo'] depth_gt = depth_gt_ms[stage_key] mask = mask_ms[stage_key] > 0.5 # pw loss pw_loss = pixel_wise_loss(prob_volume, depth_gt, mask, depth_value) pw_loss_stages.append(pw_loss) # dds loss dds_loss = depth_distribution_similarity_loss(depth, depth_gt, mask, depth_min, depth_max) dds_loss_stages.append(dds_loss) # total loss lam1, lam2 = 0.8, 0.2 total_loss = total_loss + stage_lw[stage_idx] * (lam1 * pw_loss + lam2 * dds_loss) depth_pred = stage_inputs['depth'] depth_gt = depth_gt_ms[stage_key] epe = cal_metrics(depth_pred, depth_gt, mask, depth_min, depth_max) return total_loss, epe, pw_loss_stages, dds_loss_stages def pixel_wise_loss(prob_volume, depth_gt, mask, depth_value): mask_true = mask valid_pixel_num = torch.sum(mask_true, dim=[1,2])+1e-12 shape = depth_gt.shape depth_num = depth_value.shape[1] depth_value_mat = depth_value gt_index_image = torch.argmin(torch.abs(depth_value_mat-depth_gt.unsqueeze(1)), dim=1) gt_index_image = torch.mul(mask_true, gt_index_image.type(torch.float)) gt_index_image = torch.round(gt_index_image).type(torch.long).unsqueeze(1) gt_index_volume = torch.zeros(shape[0], depth_num, shape[1], shape[2]).type(mask_true.type()).scatter_(1, gt_index_image, 1) cross_entropy_image = -torch.sum(gt_index_volume * torch.log(prob_volume+1e-12), dim=1).squeeze(1) masked_cross_entropy_image = torch.mul(mask_true, cross_entropy_image) masked_cross_entropy = torch.sum(masked_cross_entropy_image, dim=[1, 2]) masked_cross_entropy = torch.mean(masked_cross_entropy / valid_pixel_num) pw_loss = masked_cross_entropy return pw_loss def depth_distribution_similarity_loss(depth, depth_gt, mask, depth_min, depth_max): depth_norm = depth * 128 / (depth_max - depth_min)[:,None,None] depth_gt_norm = depth_gt * 128 / (depth_max - depth_min)[:,None,None] M_bins = 48 kl_min = torch.min(torch.min(depth_gt), depth.mean()-3.*depth.std()) kl_max = torch.max(torch.max(depth_gt), depth.mean()+3.*depth.std()) bins = torch.linspace(kl_min, kl_max, steps=M_bins) kl_divs = [] for i in range(len(bins) - 1): bin_mask = (depth_gt >= bins[i]) & (depth_gt < bins[i+1]) merged_mask = mask & bin_mask 
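# For each non-empty ground-truth depth bin, the branch below measures the KL divergence
# between the normalized predicted depths p and normalized ground-truth depths q of the
# valid pixels falling into that bin (via kl_div(log p - log q, p)), log-scales it, and
# sums the per-bin terms into the final depth distribution similarity loss.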
if merged_mask.sum() > 0: p = depth_norm[merged_mask] q = depth_gt_norm[merged_mask] kl_div = torch.nn.functional.kl_div(torch.log(p)-torch.log(q), p, reduction='batchmean') kl_div = torch.log(kl_div) kl_divs.append(kl_div) dds_loss = sum(kl_divs) return dds_loss def cal_metrics(depth_pred, depth_gt, mask, depth_min, depth_max): depth_pred_norm = depth_pred * 128 / (depth_max - depth_min)[:,None,None] depth_gt_norm = depth_gt * 128 / (depth_max - depth_min)[:,None,None] abs_err = torch.abs(depth_pred_norm[mask] - depth_gt_norm[mask]) epe = abs_err.mean() err1= (abs_err<=1).float().mean()*100 err3 = (abs_err<=3).float().mean()*100 return epe # err1, err3 ================================================ FILE: models/submodules.py ================================================ # -*- coding: utf-8 -*- # @Description: Some sub-modules for the network. # @Author: Zhe Zhang (doublez@stu.pku.edu.cn) # @Affiliation: Peking University (PKU) # @LastEditDate: 2023-09-07 import torch import torch.nn as nn import torch.nn.functional as F class FPN(nn.Module): """FPN aligncorners downsample 4x""" def __init__(self, base_channels, gn=False): super(FPN, self).__init__() self.base_channels = base_channels self.conv0 = nn.Sequential( Conv2d(3, base_channels, 3, 1, padding=1, gn=gn), Conv2d(base_channels, base_channels, 3, 1, padding=1, gn=gn), ) self.conv1 = nn.Sequential( Conv2d(base_channels, base_channels * 2, 5, stride=2, padding=2, gn=gn), Conv2d(base_channels * 2, base_channels * 2, 3, 1, padding=1, gn=gn), Conv2d(base_channels * 2, base_channels * 2, 3, 1, padding=1, gn=gn), ) self.conv2 = nn.Sequential( Conv2d(base_channels * 2, base_channels * 4, 5, stride=2, padding=2, gn=gn), Conv2d(base_channels * 4, base_channels * 4, 3, 1, padding=1, gn=gn), Conv2d(base_channels * 4, base_channels * 4, 3, 1, padding=1, gn=gn), ) self.conv3 = nn.Sequential( Conv2d(base_channels * 4, base_channels * 8, 5, stride=2, padding=2, gn=gn), Conv2d(base_channels * 8, base_channels * 8, 3, 1, padding=1, gn=gn), Conv2d(base_channels * 8, base_channels * 8, 3, 1, padding=1, gn=gn), ) self.out_channels = [8 * base_channels] final_chs = base_channels * 8 self.inner1 = nn.Conv2d(base_channels * 4, final_chs, 1, bias=True) self.inner2 = nn.Conv2d(base_channels * 2, final_chs, 1, bias=True) self.inner3 = nn.Conv2d(base_channels * 1, final_chs, 1, bias=True) self.out1 = nn.Conv2d(final_chs, base_channels * 8, 1, bias=False) self.out2 = nn.Conv2d(final_chs, base_channels * 4, 3, padding=1, bias=False) self.out3 = nn.Conv2d(final_chs, base_channels * 2, 3, padding=1, bias=False) self.out4 = nn.Conv2d(final_chs, base_channels, 3, padding=1, bias=False) self.out_channels.append(base_channels * 4) self.out_channels.append(base_channels * 2) self.out_channels.append(base_channels) def forward(self, x): conv0 = self.conv0(x) conv1 = self.conv1(conv0) conv2 = self.conv2(conv1) conv3 = self.conv3(conv2) intra_feat = conv3 outputs = {} out1 = self.out1(intra_feat) intra_feat = F.interpolate(intra_feat, scale_factor=2, mode="bilinear", align_corners=True) + self.inner1(conv2) out2 = self.out2(intra_feat) intra_feat = F.interpolate(intra_feat, scale_factor=2, mode="bilinear", align_corners=True) + self.inner2(conv1) out3 = self.out3(intra_feat) intra_feat = F.interpolate(intra_feat, scale_factor=2, mode="bilinear", align_corners=True) + self.inner3(conv0) out4 = self.out4(intra_feat) outputs["stage1"] = out1 outputs["stage2"] = out2 outputs["stage3"] = out3 outputs["stage4"] = out4 return outputs class Reg2d(nn.Module): def __init__(self, 
input_channel=128, base_channel=32): super(Reg2d, self).__init__() self.conv0 = ConvBnReLU3D(input_channel, base_channel, kernel_size=(1,3,3), pad=(0,1,1)) self.conv1 = ConvBnReLU3D(base_channel, base_channel*2, kernel_size=(1,3,3), stride=(1,2,2), pad=(0,1,1)) self.conv2 = ConvBnReLU3D(base_channel*2, base_channel*2) self.conv3 = ConvBnReLU3D(base_channel*2, base_channel*4, kernel_size=(1,3,3), stride=(1,2,2), pad=(0,1,1)) self.conv4 = ConvBnReLU3D(base_channel*4, base_channel*4) self.conv5 = ConvBnReLU3D(base_channel*4, base_channel*8, kernel_size=(1,3,3), stride=(1,2,2), pad=(0,1,1)) self.conv6 = ConvBnReLU3D(base_channel*8, base_channel*8) self.conv7 = nn.Sequential( nn.ConvTranspose3d(base_channel*8, base_channel*4, kernel_size=(1,3,3), padding=(0,1,1), output_padding=(0,1,1), stride=(1,2,2), bias=False), nn.BatchNorm3d(base_channel*4), nn.ReLU(inplace=True)) self.conv9 = nn.Sequential( nn.ConvTranspose3d(base_channel*4, base_channel*2, kernel_size=(1,3,3), padding=(0,1,1), output_padding=(0,1,1), stride=(1,2,2), bias=False), nn.BatchNorm3d(base_channel*2), nn.ReLU(inplace=True)) self.conv11 = nn.Sequential( nn.ConvTranspose3d(base_channel*2, base_channel, kernel_size=(1,3,3), padding=(0,1,1), output_padding=(0,1,1), stride=(1,2,2), bias=False), nn.BatchNorm3d(base_channel), nn.ReLU(inplace=True)) self.prob = nn.Conv3d(8, 1, 1, stride=1, padding=0) def forward(self, x): conv0 = self.conv0(x) conv2 = self.conv2(self.conv1(conv0)) conv4 = self.conv4(self.conv3(conv2)) x = self.conv6(self.conv5(conv4)) x = conv4 + self.conv7(x) x = conv2 + self.conv9(x) x = conv0 + self.conv11(x) x = self.prob(x) return x.squeeze(1) def homo_warping(src_fea, src_proj, ref_proj, depth_values): # src_fea: [B, C, H, W] # src_proj: [B, 4, 4] # ref_proj: [B, 4, 4] # depth_values: [B, Ndepth] o [B, Ndepth, H, W] # out: [B, C, Ndepth, H, W] C = src_fea.shape[1] Hs,Ws = src_fea.shape[-2:] B,num_depth,Hr,Wr = depth_values.shape with torch.no_grad(): proj = torch.matmul(src_proj, torch.inverse(ref_proj)) rot = proj[:, :3, :3] # [B,3,3] trans = proj[:, :3, 3:4] # [B,3,1] y, x = torch.meshgrid([torch.arange(0, Hr, dtype=torch.float32, device=src_fea.device), torch.arange(0, Wr, dtype=torch.float32, device=src_fea.device)]) y = y.reshape(Hr*Wr) x = x.reshape(Hr*Wr) xyz = torch.stack((x, y, torch.ones_like(x))) # [3, H*W] xyz = torch.unsqueeze(xyz, 0).repeat(B, 1, 1) # [B, 3, H*W] rot_xyz = torch.matmul(rot, xyz) # [B, 3, H*W] rot_depth_xyz = rot_xyz.unsqueeze(2).repeat(1, 1, num_depth, 1) * depth_values.reshape(B, 1, num_depth, -1) # [B, 3, Ndepth, H*W] proj_xyz = rot_depth_xyz + trans.reshape(B, 3, 1, 1) # [B, 3, Ndepth, H*W] # FIXME divide 0 temp = proj_xyz[:, 2:3, :, :] temp[temp==0] = 1e-9 proj_xy = proj_xyz[:, :2, :, :] / temp # [B, 2, Ndepth, H*W] # proj_xy = proj_xyz[:, :2, :, :] / proj_xyz[:, 2:3, :, :] # [B, 2, Ndepth, H*W] proj_x_normalized = proj_xy[:, 0, :, :] / ((Ws - 1) / 2) - 1 proj_y_normalized = proj_xy[:, 1, :, :] / ((Hs - 1) / 2) - 1 proj_xy = torch.stack((proj_x_normalized, proj_y_normalized), dim=3) # [B, Ndepth, H*W, 2] grid = proj_xy if len(src_fea.shape)==4: warped_src_fea = F.grid_sample(src_fea, grid.reshape(B, num_depth * Hr, Wr, 2), mode='bilinear', padding_mode='zeros', align_corners=True) warped_src_fea = warped_src_fea.reshape(B, C, num_depth, Hr, Wr) elif len(src_fea.shape)==5: warped_src_fea = [] for d in range(src_fea.shape[2]): warped_src_fea.append(F.grid_sample(src_fea[:,:,d], grid.reshape(B, num_depth, Hr, Wr, 2)[:,d], mode='bilinear', padding_mode='zeros', align_corners=True)) 
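# 'grid' holds, for every depth hypothesis, the normalized source-image coordinates that
# each reference pixel projects to under that hypothesis; grid_sample bilinearly fetches
# the source features there (zero padding for out-of-view locations), so both the 4-D and
# 5-D branches return a warped feature volume of shape [B, C, Ndepth, H, W].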
warped_src_fea = torch.stack(warped_src_fea, dim=2) return warped_src_fea def init_inverse_range(cur_depth, ndepths, device, dtype, H, W): inverse_depth_min = 1. / cur_depth[:, 0] # (B,) inverse_depth_max = 1. / cur_depth[:, -1] itv = torch.arange(0, ndepths, device=device, dtype=dtype, requires_grad=False).reshape(1, -1,1,1).repeat(1, 1, H, W) / (ndepths - 1) # 1 D H W inverse_depth_hypo = inverse_depth_max[:,None, None, None] + (inverse_depth_min - inverse_depth_max)[:,None, None, None] * itv return 1./inverse_depth_hypo def schedule_inverse_range(inverse_min_depth, inverse_max_depth, ndepths, H, W): # cur_depth_min, (B, H, W) # cur_depth_max: (B, H, W) itv = torch.arange(0, ndepths, device=inverse_min_depth.device, dtype=inverse_min_depth.dtype, requires_grad=False).reshape(1, -1,1,1).repeat(1, 1, H//2, W//2) / (ndepths - 1) # 1 D H W inverse_depth_hypo = inverse_max_depth[:,None, :, :] + (inverse_min_depth - inverse_max_depth)[:,None, :, :] * itv # B D H W inverse_depth_hypo = F.interpolate(inverse_depth_hypo.unsqueeze(1), [ndepths, H, W], mode='trilinear', align_corners=True).squeeze(1) return 1./inverse_depth_hypo # -------------------------------------------------------------- def init_bn(module): if module.weight is not None: nn.init.ones_(module.weight) if module.bias is not None: nn.init.zeros_(module.bias) return def init_uniform(module, init_method): if module.weight is not None: if init_method == "kaiming": nn.init.kaiming_uniform_(module.weight) elif init_method == "xavier": nn.init.xavier_uniform_(module.weight) return class ConvBnReLU3D(nn.Module): def __init__(self, in_channels, out_channels, kernel_size=3, stride=1, pad=1): super(ConvBnReLU3D, self).__init__() self.conv = nn.Conv3d(in_channels, out_channels, kernel_size, stride=stride, padding=pad, bias=False) self.bn = nn.BatchNorm3d(out_channels) def forward(self, x): return F.relu(self.bn(self.conv(x)), inplace=True) class Conv2d(nn.Module): def __init__(self, in_channels, out_channels, kernel_size, stride=1, relu=True, bn_momentum=0.1, init_method="xavier", gn=False, group_channel=8, **kwargs): super(Conv2d, self).__init__() bn = not gn self.conv = nn.Conv2d(in_channels, out_channels, kernel_size, stride=stride, bias=(not bn), **kwargs) self.kernel_size = kernel_size self.stride = stride self.bn = nn.BatchNorm2d(out_channels, momentum=bn_momentum) if bn else None self.gn = nn.GroupNorm(int(max(1, out_channels / group_channel)), out_channels) if gn else None self.relu = relu def forward(self, x): x = self.conv(x) if self.bn is not None: x = self.bn(x) else: x = self.gn(x) if self.relu: x = F.relu(x, inplace=True) return x def init_weights(self, init_method): init_uniform(self.conv, init_method) if self.bn is not None: init_bn(self.bn) ================================================ FILE: models/utils/__init__.py ================================================ from models.utils.utils import * ================================================ FILE: models/utils/opts.py ================================================ # -*- coding: utf-8 -*- # @Description: Options settings & configurations for GeoMVSNet. 
# @Author: Zhe Zhang (doublez@stu.pku.edu.cn) # @Affiliation: Peking University (PKU) # @LastEditDate: 2023-09-07 import argparse def get_opts(): parser = argparse.ArgumentParser(description="args") # global settings parser.add_argument('--mode', default='train', help='train or test', choices=['train', 'test', 'val']) parser.add_argument('--which_dataset', default='dtu', choices=['dtu', 'tnt', 'blendedmvs'], help='which dataset for using') parser.add_argument('--n_views', type=int, default=5, help='num of view') parser.add_argument('--levels', type=int, default=4, help='num of stages') parser.add_argument('--hypo_plane_num_stages', type=str, default="8,8,4,4", help='num of hypothesis planes for each stage') parser.add_argument('--depth_interal_ratio_stages', type=str, default="0.5,0.5,0.5,1", help='depth interals for each stage') parser.add_argument("--feat_base_channel", type=int, default=8, help='channel num for base feature') parser.add_argument("--reg_base_channel", type=int, default=8, help='channel num for regularization') parser.add_argument('--group_cor_dim_stages', type=str, default="8,8,4,4", help='group correlation dim') parser.add_argument('--batch_size', type=int, default=1, help='batch size for training') parser.add_argument('--data_scale', type=str, choices=['mid', 'raw'], help='use mid or raw resolution') parser.add_argument('--trainpath', help='data path for training') parser.add_argument('--testpath', help='data path for testing') parser.add_argument('--trainlist', help='data list for training') parser.add_argument('--testlist', help='data list for testing') # training config parser.add_argument('--stage_lw', type=str, default="1,1,1,1", help='loss weight for different stages') parser.add_argument('--epochs', type=int, default=10, help='number of epochs to train') parser.add_argument('--lr_scheduler', type=str, default='MS', help='scheduler for learning rate') parser.add_argument('--lr', type=float, default=0.001, help='learning rate') parser.add_argument('--lrepochs', type=str, default="1,3,5,7,9,11,13,15:1.5", help='epoch ids to downscale lr and the downscale rate') parser.add_argument('--wd', type=float, default=0.0, help='weight decay') parser.add_argument('--summary_freq', type=int, default=100, help='print and summary frequency') parser.add_argument('--save_freq', type=int, default=1, help='save checkpoint frequency') parser.add_argument('--eval_freq', type=int, default=1, help='eval frequency') parser.add_argument('--robust_train', action='store_true',help='robust training') # testing config parser.add_argument('--split', type=str, choices=['intermediate', 'advanced'], help='intermediate|advanced for tanksandtemples') parser.add_argument('--img_mode', type=str, default='resize', choices=['resize', 'crop'], help='image resolution matching strategy for TNT dataset') parser.add_argument('--cam_mode', type=str, default='origin', choices=['origin', 'short_range'], help='camera parameter strategy for TNT dataset') parser.add_argument('--loadckpt', default=None, help='load a specific checkpoint') parser.add_argument('--logdir', default='./checkpoints/debug', help='the directory to save checkpoints/logs') parser.add_argument('--nolog', action='store_true', help='do not log into .log file') parser.add_argument('--notensorboard', action='store_true', help='do not log into tensorboard') parser.add_argument('--save_conf_all_stages', action='store_true', help='save confidence maps for all stages') parser.add_argument('--outdir', default='./outputs', help='output dir') 
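# The stage-wise options above (--hypo_plane_num_stages, --depth_interal_ratio_stages,
# --group_cor_dim_stages, --stage_lw) are plain comma-separated strings; the caller is
# expected to split them before building the model, e.g. (illustrative sketch only,
# assuming the usual parsing in the training/testing entry points):
#   hypo_planes = [int(x) for x in args.hypo_plane_num_stages.split(",")]
#   interval_ratios = [float(x) for x in args.depth_interal_ratio_stages.split(",")]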
parser.add_argument('--resume', action='store_true', help='continue to train the model') # pytorch config parser.add_argument('--device', default='cuda', help='device to use') parser.add_argument('--seed', type=int, default=1, metavar='S', help='random seed') parser.add_argument('--pin_m', action='store_true', help='data loader pin memory') parser.add_argument("--local_rank", type=int, default=0) return parser.parse_args() ================================================ FILE: models/utils/utils.py ================================================ # -*- coding: utf-8 -*- # @Description: Some useful utils. # @Author: Zhe Zhang (doublez@stu.pku.edu.cn) # @Affiliation: Peking University (PKU) # @LastEditDate: 2023-09-07 import random import numpy as np import torch import torchvision.utils as vutils # torch.no_grad warpper for functions def make_nograd_func(func): def wrapper(*f_args, **f_kwargs): with torch.no_grad(): ret = func(*f_args, **f_kwargs) return ret return wrapper # convert a function into recursive style to handle nested dict/list/tuple variables def make_recursive_func(func): def wrapper(vars): if isinstance(vars, list): return [wrapper(x) for x in vars] elif isinstance(vars, tuple): return tuple([wrapper(x) for x in vars]) elif isinstance(vars, dict): return {k: wrapper(v) for k, v in vars.items()} else: return func(vars) return wrapper @make_recursive_func def tensor2float(vars): if isinstance(vars, float): return vars elif isinstance(vars, torch.Tensor): return vars.data.item() else: raise NotImplementedError("invalid input type {} for tensor2float".format(type(vars))) @make_recursive_func def tensor2numpy(vars): if isinstance(vars, np.ndarray): return vars elif isinstance(vars, torch.Tensor): return vars.detach().cpu().numpy().copy() else: raise NotImplementedError("invalid input type {} for tensor2numpy".format(type(vars))) @make_recursive_func def tocuda(vars): if isinstance(vars, torch.Tensor): return vars.to(torch.device("cuda")) elif isinstance(vars, str): return vars else: raise NotImplementedError("invalid input type {} for tensor2numpy".format(type(vars))) def tb_save_scalars(logger, mode, scalar_dict, global_step): scalar_dict = tensor2float(scalar_dict) for key, value in scalar_dict.items(): if not isinstance(value, (list, tuple)): name = '{}/{}'.format(mode, key) logger.add_scalar(name, value, global_step) else: for idx in range(len(value)): name = '{}/{}_{}'.format(mode, key, idx) logger.add_scalar(name, value[idx], global_step) def tb_save_images(logger, mode, images_dict, global_step): images_dict = tensor2numpy(images_dict) def preprocess(name, img): if not (len(img.shape) == 3 or len(img.shape) == 4): raise NotImplementedError("invalid img shape {}:{} in save_images".format(name, img.shape)) if len(img.shape) == 3: img = img[:, np.newaxis, :, :] img = torch.from_numpy(img[:1]) return vutils.make_grid(img, padding=0, nrow=1, normalize=True, scale_each=True) for key, value in images_dict.items(): if not isinstance(value, (list, tuple)): name = '{}/{}'.format(mode, key) logger.add_image(name, preprocess(name, value), global_step) else: for idx in range(len(value)): name = '{}/{}_{}'.format(mode, key, idx) logger.add_image(name, preprocess(name, value[idx]), global_step) class DictAverageMeter(object): def __init__(self): self.data = {} self.count = 0 def update(self, new_input): self.count += 1 if len(self.data) == 0: for k, v in new_input.items(): if not isinstance(v, float): raise NotImplementedError("invalid data {}: {}".format(k, type(v))) self.data[k] = v 
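# The first update() call copies the incoming dict of floats; every later call (the
# branch below) accumulates values key by key, and mean() divides each running sum by
# the number of updates, giving per-key averages over e.g. an epoch of iterations.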
else: for k, v in new_input.items(): if not isinstance(v, float): raise NotImplementedError("invalid data {}: {}".format(k, type(v))) self.data[k] += v def mean(self): return {k: v / self.count for k, v in self.data.items()} # a wrapper to compute metrics for each image individually def compute_metrics_for_each_image(metric_func): def wrapper(depth_est, depth_gt, mask, *args): batch_size = depth_gt.shape[0] results = [] # compute result one by one for idx in range(batch_size): ret = metric_func(depth_est[idx], depth_gt[idx], mask[idx], *args) results.append(ret) return torch.stack(results).mean() return wrapper @make_nograd_func @compute_metrics_for_each_image def Thres_metrics(depth_est, depth_gt, mask, thres): assert isinstance(thres, (int, float)) depth_est, depth_gt = depth_est[mask], depth_gt[mask] errors = torch.abs(depth_est - depth_gt) err_mask = errors > thres return torch.mean(err_mask.float()) # NOTE: please do not use this to build up training loss @make_nograd_func @compute_metrics_for_each_image def AbsDepthError_metrics(depth_est, depth_gt, mask, thres=None): depth_est, depth_gt = depth_est[mask], depth_gt[mask] error = (depth_est - depth_gt).abs() if thres is not None: error = error[(error >= float(thres[0])) & (error <= float(thres[1]))] if error.shape[0] == 0: return torch.tensor(0, device=error.device, dtype=error.dtype) return torch.mean(error) import torch.distributed as dist def synchronize(): """ Helper function to synchronize (barrier) among all processes when using distributed training """ if not dist.is_available(): return if not dist.is_initialized(): return world_size = dist.get_world_size() if world_size == 1: return dist.barrier() def get_world_size(): if not dist.is_available(): return 1 if not dist.is_initialized(): return 1 return dist.get_world_size() def reduce_scalar_outputs(scalar_outputs): world_size = get_world_size() if world_size < 2: return scalar_outputs with torch.no_grad(): names = [] scalars = [] for k in sorted(scalar_outputs.keys()): names.append(k) scalars.append(scalar_outputs[k]) scalars = torch.stack(scalars, dim=0) dist.reduce(scalars, dst=0) if dist.get_rank() == 0: # only main process gets accumulated, so only divide by # world_size in this case scalars /= world_size reduced_scalars = {k: v for k, v in zip(names, scalars)} return reduced_scalars import torch from bisect import bisect_right class WarmupMultiStepLR(torch.optim.lr_scheduler._LRScheduler): def __init__( self, optimizer, milestones, gamma=0.1, warmup_factor=1.0 / 3, warmup_iters=500, warmup_method="linear", last_epoch=-1, ): if not list(milestones) == sorted(milestones): raise ValueError( "Milestones should be a list of" " increasing integers. 
Got {}", milestones, ) if warmup_method not in ("constant", "linear"): raise ValueError( "Only 'constant' or 'linear' warmup_method accepted" "got {}".format(warmup_method) ) self.milestones = milestones self.gamma = gamma self.warmup_factor = warmup_factor self.warmup_iters = warmup_iters self.warmup_method = warmup_method super(WarmupMultiStepLR, self).__init__(optimizer, last_epoch) def get_lr(self): warmup_factor = 1 if self.last_epoch < self.warmup_iters: if self.warmup_method == "constant": warmup_factor = self.warmup_factor elif self.warmup_method == "linear": alpha = float(self.last_epoch) / self.warmup_iters warmup_factor = self.warmup_factor * (1 - alpha) + alpha return [ base_lr * warmup_factor * self.gamma ** bisect_right(self.milestones, self.last_epoch) for base_lr in self.base_lrs ] def set_random_seed(seed): random.seed(seed) np.random.seed(seed) torch.manual_seed(seed) torch.cuda.manual_seed_all(seed) ================================================ FILE: outputs/visual.ipynb ================================================ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "- @Description: Juputer notebook for visualizing depth maps.\n", "- @Author: Zhe Zhang (doublez@stu.pku.edu.cn)\n", "- @Affiliation: Peking University (PKU)\n", "- @LastEditDate: 2023-09-07" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecutionIndicator": { "show": true }, "tags": [] }, "outputs": [], "source": [ "import sys, os\n", "sys.path.append('../')\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "import re\n", "\n", "\n", "def read_pfm(filename):\n", " file = open(filename, 'rb')\n", " color = None\n", " width = None\n", " height = None\n", " scale = None\n", " endian = None\n", "\n", " header = file.readline().decode('utf-8').rstrip()\n", " if header == 'PF':\n", " color = True\n", " elif header == 'Pf':\n", " color = False\n", " else:\n", " raise Exception('Not a PFM file.')\n", "\n", " dim_match = re.match(r'^(\\d+)\\s(\\d+)\\s$', file.readline().decode('utf-8'))\n", " if dim_match:\n", " width, height = map(int, dim_match.groups())\n", " else:\n", " raise Exception('Malformed PFM header.')\n", "\n", " scale = float(file.readline().rstrip())\n", " if scale < 0: # little-endian\n", " endian = '<'\n", " scale = -scale\n", " else:\n", " endian = '>' # big-endian\n", "\n", " data = np.fromfile(file, endian + 'f')\n", " shape = (height, width, 3) if color else (height, width)\n", "\n", " data = np.reshape(data, shape)\n", " data = np.flipud(data)\n", " file.close()\n", " return data, scale\n", "\n", "\n", "def read_depth(filename):\n", " depth = read_pfm(filename)[0]\n", " return np.array(depth, dtype=np.float32)\n", "\n", "\n", "assert False" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## DTU" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecutionIndicator": { "show": true }, "tags": [] }, "outputs": [], "source": [ "exp_name = 'dtu/geomvsnet'\n", "depth_name = \"00000009.pfm\"\n", "\n", "scans = os.listdir(os.path.join(exp_name))\n", "scans = list(filter(lambda x: x.startswith(\"scan\"), scans))\n", "scans.sort(key=lambda x: int(x[4:]))\n", "for scan in scans:\n", " depth_filename = os.path.join(exp_name, scan, \"depth_est\", depth_name)\n", " if not os.path.exists(depth_filename): continue\n", " depth = read_depth(depth_filename)\n", "\n", " confidence_filename = os.path.join(exp_name, scan, \"confidence\", depth_name)\n", " confidence = read_depth(confidence_filename)\n", "\n", " print(scan, 
depth_name)\n", "\n", " plt.figure(figsize=(12, 12))\n", " plt.subplot(1, 2, 1)\n", " plt.xticks([]), plt.yticks([]), plt.axis('off')\n", " plt.imshow(depth, 'viridis', vmin=500, vmax=830)\n", "\n", " plt.subplot(1, 2, 2)\n", " plt.xticks([]), plt.yticks([]), plt.axis('off')\n", " plt.imshow(confidence, 'viridis')\n", " plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## TNT" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "exp_name = './tnt/blend/geomvsnet/'\n", "depth_name = \"00000009.pfm\"\n", "\n", "with open(\"../datasets/lists/tnt/intermediate.txt\") as f:\n", " scans_i = [line.rstrip() for line in f.readlines()]\n", "\n", "with open(\"../datasets/lists/tnt/advanced.txt\") as f:\n", " scans_a = [line.rstrip() for line in f.readlines()]\n", "\n", "scans = scans_i + scans_a\n", "\n", "for scan in scans:\n", "\n", " depth_filename = os.path.join(exp_name, scan, \"depth_est\", depth_name)\n", " if not os.path.exists(depth_filename): continue\n", " depth = read_depth(depth_filename)\n", "\n", " print(scan, depth_name, depth.shape)\n", "\n", " plt.figure(figsize=(12, 12))\n", " plt.xticks([]), plt.yticks([]), plt.axis('off')\n", " plt.imshow(depth, 'viridis', vmin=0, vmax=10)\n", "\n", " plt.show()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.12" }, "vscode": { "interpreter": { "hash": "d253918f84404206ad3cf9c22ee3709ef6e34cbea610b0ac9787033d60da5e03" } } }, "nbformat": 4, "nbformat_minor": 4 } ================================================ FILE: requirements.txt ================================================ torch==1.10.0 torchvision opencv-python numpy==1.18.1 pillow scipy tensorboardX plyfile open3d jupyter notebook ================================================ FILE: scripts/blend/train_blend.sh ================================================ #!/usr/bin/env bash source scripts/data_path.sh THISNAME="geomvsnet" LOG_DIR="./checkpoints/blend/"$THISNAME if [ ! 
-d $LOG_DIR ]; then mkdir -p $LOG_DIR fi CUDA_VISIBLE_DEVICES=0,1,2,3 python3 -m torch.distributed.launch --nproc_per_node=4 train.py ${@} \ --which_dataset="blendedmvs" --epochs=16 --logdir=$LOG_DIR \ --trainpath=$BLENDEDMVS_ROOT --testpath=$BLENDEDMVS_ROOT \ --trainlist="datasets/lists/blendedmvs/low_res_all.txt" --testlist="datasets/lists/blendedmvs/val.txt" \ \ --n_views="7" --batch_size=2 --lr=0.001 --robust_train \ --lr_scheduler="onecycle" ================================================ FILE: scripts/data_path.sh ================================================ #!/usr/bin/env bash # DTU DTU_TRAIN_ROOT="[/path/to/]dtu" DTU_TEST_ROOT="[/path/to/]dtu-test" DTU_QUANTITATIVE_ROOT="[/path/to/]dtu-evaluation" # Tanks and Temples TNT_ROOT="[/path/to/]tnt" # BlendedMVS BLENDEDMVS_ROOT="[/path/to/]blendmvs" ================================================ FILE: scripts/dtu/fusion_dtu.sh ================================================ #!/usr/bin/env bash source scripts/data_path.sh THISNAME="geomvsnet" FUSION_METHOD="open3d" LOG_DIR="./checkpoints/dtu/"$THISNAME DTU_OUT_DIR="./outputs/dtu/"$THISNAME if [ $FUSION_METHOD = "pcd" ] ; then python3 fusions/dtu/pcd.py ${@} \ --testpath=$DTU_TEST_ROOT --testlist="datasets/lists/dtu/test.txt" \ --outdir=$DTU_OUT_DIR --logdir=$LOG_DIR --nolog \ --num_worker=1 \ \ --thres_view=4 --conf=0.5 \ \ --plydir=$DTU_OUT_DIR"/pcd_fusion_plys/" elif [ $FUSION_METHOD = "gipuma" ] ; then # source [/path/to/]anaconda3/etc/profile.d/conda.sh # conda activate fusibile CUDA_VISIBLE_DEVICES=0 python2 fusions/dtu/gipuma.py \ --root_dir=$DTU_TEST_ROOT --list_file="datasets/lists/dtu/test.txt" \ --fusibile_exe_path="fusions/fusibile" --out_folder="fusibile_fused" \ --depth_folder=$DTU_OUT_DIR \ --downsample_factor=1 \ \ --prob_threshold=0.5 --disp_threshold=0.25 --num_consistent=3 \ \ --plydir=$DTU_OUT_DIR"/gipuma_fusion_plys/" elif [ $FUSION_METHOD = "open3d" ] ; then CUDA_VISIBLE_DEVICES=0 python fusions/dtu/_open3d.py --device="cuda" \ --root_path=$DTU_TEST_ROOT \ --depth_path=$DTU_OUT_DIR \ --data_list="datasets/lists/dtu/test.txt" \ \ --prob_thresh=0.3 --dist_thresh=0.2 --num_consist=4 \ \ --ply_path=$DTU_OUT_DIR"/open3d_fusion_plys/" fi ================================================ FILE: scripts/dtu/matlab_quan_dtu.sh ================================================ #!/usr/bin/env bash source scripts/data_path.sh OUTNAME="geomvsnet" FUSIONMETHOD="open3d" # Evaluation echo "<<<<<<<<<< start parallel evaluation" METHOD='mvsnet' PLYPATH='../../../outputs/dtu/'$OUTNAME'/'$FUSIONMETHOD'_fusion_plys/' RESULTPATH='../../../outputs/dtu/'$OUTNAME'/'$FUSIONMETHOD'_quantitative/' LOGPATH='outputs/dtu/'$OUTNAME'/'$FUSIONMETHOD'_quantitative/'$OUTNAME'.log' mkdir -p 'outputs/dtu/'$OUTNAME'/'$FUSIONMETHOD'_quantitative/' set_array=(1 4 9 10 11 12 13 15 23 24 29 32 33 34 48 49 62 75 77 110 114 118) num_at_once=2 # 1 2 4 5 7 11 22 times=`expr $((${#set_array[*]} / $num_at_once))` remain=`expr $((${#set_array[*]} - $num_at_once * $times))` this_group_num=0 pos=0 for ((t=0; t<$times; t++)) do if [ "$t" -ge `expr $(($times-$remain))` ] ; then this_group_num=`expr $(($num_at_once + 1))` else this_group_num=$num_at_once fi for set in "${set_array[@]:pos:this_group_num}" do matlab -nodesktop -nosplash -r "cd datasets/evaluations/dtu_parallel; dataPath='$DTU_QUANTITATIVE_ROOT'; plyPath='$PLYPATH'; resultsPath='$RESULTPATH'; method_string='$METHOD'; thisset='$set'; BaseEvalMain_web" & done wait pos=`expr $(($pos + $this_group_num))` done wait 
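# The loop above batches the 22 DTU evaluation scans into groups of $num_at_once and
# launches one MATLAB BaseEvalMain_web job per scan in each group, waiting for the whole
# group to finish before starting the next. With the defaults here, times = 22 / 2 = 11
# and remain = 0, so evaluation runs as 11 rounds of 2 parallel MATLAB jobs.
# The ComputeStat_web call below then aggregates the per-scan results into the final log.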
SET=[1,4,9,10,11,12,13,15,23,24,29,32,33,34,48,49,62,75,77,110,114,118] matlab -nodesktop -nosplash -r "cd datasets/evaluations/dtu_parallel; resultsPath='$RESULTPATH'; method_string='$METHOD'; set='$SET'; ComputeStat_web" > $LOGPATH ================================================ FILE: scripts/dtu/test_dtu.sh ================================================ #!/usr/bin/env bash source scripts/data_path.sh THISNAME="geomvsnet" BESTEPOCH="geomvsnet_release" LOG_DIR="./checkpoints/dtu/"$THISNAME DTU_CKPT_FILE=$LOG_DIR"/model_"$BESTEPOCH".ckpt" DTU_OUT_DIR="./outputs/dtu/"$THISNAME CUDA_VISIBLE_DEVICES=0 python3 test.py ${@} \ --which_dataset="dtu" --loadckpt=$DTU_CKPT_FILE --batch_size=1 \ --outdir=$DTU_OUT_DIR --logdir=$LOG_DIR --nolog \ --testpath=$DTU_TEST_ROOT --testlist="datasets/lists/dtu/test.txt" \ \ --data_scale="raw" --n_views="5" ================================================ FILE: scripts/dtu/train_dtu.sh ================================================ #!/usr/bin/env bash source scripts/data_path.sh THISNAME="geomvsnet" LOG_DIR="./checkpoints/dtu/"$THISNAME if [ ! -d $LOG_DIR ]; then mkdir -p $LOG_DIR fi CUDA_VISIBLE_DEVICES=0,1,2,3 python3 -m torch.distributed.launch --nproc_per_node=4 train.py ${@} \ --which_dataset="dtu" --epochs=16 --logdir=$LOG_DIR \ --trainpath=$DTU_TRAIN_ROOT --testpath=$DTU_TRAIN_ROOT \ --trainlist="datasets/lists/dtu/train.txt" --testlist="datasets/lists/dtu/test.txt" \ \ --data_scale="mid" --n_views="5" --batch_size=4 --lr=0.002 --robust_train \ --lrepochs="1,3,5,7,9,11,13,15:1.5" ================================================ FILE: scripts/dtu/train_dtu_raw.sh ================================================ #!/usr/bin/env bash source scripts/data_path.sh THISNAME="geomvsnet_raw" LOG_DIR="./checkpoints/dtu/"$THISNAME if [ ! 
-d $LOG_DIR ]; then mkdir -p $LOG_DIR fi CUDA_VISIBLE_DEVICES=0,1,2,3 python3 -m torch.distributed.launch --nproc_per_node=4 train.py ${@} \ --which_dataset="dtu" --epochs=16 --logdir=$LOG_DIR \ --trainpath=$DTU_TRAIN_ROOT --testpath=$DTU_TRAIN_ROOT \ --trainlist="datasets/lists/dtu/train.txt" --testlist="datasets/lists/dtu/test.txt" \ \ --data_scale="raw" --n_views="5" --batch_size=1 --lr=0.0005 --robust_train \ --lrepochs="1,3,5,7,9,11,13,15:1.5" ================================================ FILE: scripts/tnt/fusion_tnt.sh ================================================ #!/usr/bin/env bash source scripts/data_path.sh THISNAME="blend/geomvsnet" LOG_DIR="./checkpoints/tnt/"$THISNAME TNT_OUT_DIR="./outputs/tnt/"$THISNAME # Intermediate python3 fusions/tnt/dypcd.py ${@} \ --root_dir=$TNT_ROOT --list_file="datasets/lists/tnt/intermediate.txt" --split="intermediate" \ --out_dir=$TNT_OUT_DIR --ply_path=$TNT_OUT_DIR"/dypcd_fusion_plys" \ --img_mode="resize" --cam_mode="origin" --single_processor # Advanced python3 fusions/tnt/dypcd.py ${@} \ --root_dir=$TNT_ROOT --list_file="datasets/lists/tnt/advanced.txt" --split="advanced" \ --out_dir=$TNT_OUT_DIR --ply_path=$TNT_OUT_DIR"/dypcd_fusion_plys" \ --img_mode="resize" --cam_mode="origin" --single_processor ================================================ FILE: scripts/tnt/test_tnt.sh ================================================ #!/usr/bin/env bash source scripts/data_path.sh THISNAME="blend/geomvsnet" BESTEPOCH="15" LOG_DIR="./checkpoints/"$THISNAME CKPT_FILE=$LOG_DIR"/model_"$BESTEPOCH".ckpt" TNT_OUT_DIR="./outputs/tnt/"$THISNAME # Intermediate CUDA_VISIBLE_DEVICES=0 python3 test.py ${@} \ --which_dataset="tnt" --loadckpt=$CKPT_FILE --batch_size=1 \ --outdir=$TNT_OUT_DIR --logdir=$LOG_DIR --nolog \ --testpath=$TNT_ROOT --testlist="datasets/lists/tnt/intermediate.txt" --split="intermediate" \ \ --n_views="11" --img_mode="resize" --cam_mode="origin" # Advanced CUDA_VISIBLE_DEVICES=0 python3 test.py ${@} \ --which_dataset="tnt" --loadckpt=$CKPT_FILE --batch_size=1 \ --outdir=$TNT_OUT_DIR --logdir=$LOG_DIR --nolog \ --testpath=$TNT_ROOT --testlist="datasets/lists/tnt/advanced.txt" --split="advanced" \ \ --n_views="11" --img_mode="resize" --cam_mode="origin" ================================================ FILE: test.py ================================================ # -*- coding: utf-8 -*- # @Description: Main process of network testing. 
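# @Note: outputs are written per scan, matching what outputs/visual.ipynb and the fusion scripts expect:
#   <outdir>/<scan>/depth_est/<view>.pfm    estimated depth map
#   <outdir>/<scan>/confidence/<view>.pfm   photometric confidence (per-stage copies with --save_conf_all_stages)
#   <outdir>/<scan>/cams/<view>_cam.txt     camera parameters (DTU only)
#   <outdir>/<scan>/images/<view>.jpg       reference image (DTU only)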
# @Author: Zhe Zhang (doublez@stu.pku.edu.cn) # @Affiliation: Peking University (PKU) # @LastEditDate: 2023-09-07 import os, time, sys, gc, cv2, logging import numpy as np import torch import torch.nn as nn import torch.nn.parallel import torch.backends.cudnn as cudnn from torch.utils.data import DataLoader from datasets.data_io import * from datasets.dtu import DTUDataset from datasets.tnt import TNTDataset from models.geomvsnet import GeoMVSNet from models.utils import * from models.utils.opts import get_opts cudnn.benchmark = True args = get_opts() def test(): total_time = 0 with torch.no_grad(): for batch_idx, sample in enumerate(TestImgLoader): sample_cuda = tocuda(sample) start_time = time.time() # @Note GeoMVSNet main outputs = model( sample_cuda["imgs"], sample_cuda["proj_matrices"], sample_cuda["intrinsics_matrices"], sample_cuda["depth_values"], sample["filename"] ) end_time = time.time() total_time += end_time - start_time outputs = tensor2numpy(outputs) del sample_cuda filenames = sample["filename"] cams = sample["proj_matrices"]["stage{}".format(args.levels)].numpy() imgs = sample["imgs"] logger.info('Iter {}/{}, Time:{:.3f} Res:{}'.format(batch_idx, len(TestImgLoader), end_time - start_time, imgs[0].shape)) for filename, cam, img, depth_est, photometric_confidence in zip(filenames, cams, imgs, outputs["depth"], outputs["photometric_confidence"]): img = img[0].numpy() # ref view cam = cam[0] # ref cam depth_filename = os.path.join(args.outdir, filename.format('depth_est', '.pfm')) confidence_filename = os.path.join(args.outdir, filename.format('confidence', '.pfm')) cam_filename = os.path.join(args.outdir, filename.format('cams', '_cam.txt')) img_filename = os.path.join(args.outdir, filename.format('images', '.jpg')) os.makedirs(depth_filename.rsplit('/', 1)[0], exist_ok=True) os.makedirs(confidence_filename.rsplit('/', 1)[0], exist_ok=True) if args.which_dataset == 'dtu': os.makedirs(cam_filename.rsplit('/', 1)[0], exist_ok=True) os.makedirs(img_filename.rsplit('/', 1)[0], exist_ok=True) # save depth maps save_pfm(depth_filename, depth_est) # save confidence maps confidence_list = [outputs['stage{}'.format(i)]['photometric_confidence'].squeeze(0) for i in range(1,5)] photometric_confidence = confidence_list[-1] if not args.save_conf_all_stages: save_pfm(confidence_filename, photometric_confidence) else: for stage_idx, photometric_confidence in enumerate(confidence_list): if stage_idx != args.levels - 1: confidence_filename = os.path.join(args.outdir, filename.format('confidence', "_stage"+str(stage_idx)+'.pfm')) else: confidence_filename = os.path.join(args.outdir, filename.format('confidence', '.pfm')) save_pfm(confidence_filename, photometric_confidence) # save cams, img if args.which_dataset == 'dtu': write_cam(cam_filename, cam) img = np.clip(np.transpose(img, (1, 2, 0)) * 255, 0, 255).astype(np.uint8) img_bgr = cv2.cvtColor(img, cv2.COLOR_RGB2BGR) cv2.imwrite(img_filename, img_bgr) torch.cuda.empty_cache() gc.collect() return total_time, len(TestImgLoader) def initLogger(): logger = logging.getLogger() logger.setLevel(logging.INFO) curTime = time.strftime('%Y%m%d-%H%M', time.localtime(time.time())) if args.which_dataset == 'tnt': logfile = os.path.join(args.logdir, 'TNT-test-' + curTime + '.log') else: logfile = os.path.join(args.logdir, 'test-' + curTime + '.log') formatter = logging.Formatter("%(asctime)s - %(filename)s[line:%(lineno)d] - %(levelname)s: %(message)s") if not args.nolog: fileHandler = logging.FileHandler(logfile, mode='a') 
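        # the file handler is only created when --nolog is not passed; it appends to
        # <logdir>/test-<timestamp>.log (or TNT-test-<timestamp>.log for Tanks and Temples)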
fileHandler.setFormatter(formatter) logger.addHandler(fileHandler) consoleHandler = logging.StreamHandler(sys.stdout) consoleHandler.setFormatter(formatter) logger.addHandler(consoleHandler) logger.info("Logger initialized.") logger.info("Writing logs to file: {}".format(logfile)) logger.info("Current time: {}".format(curTime)) settings_str = "All settings:\n" for k,v in vars(args).items(): settings_str += '{0}: {1}\n'.format(k,v) logger.info(settings_str) return logger if __name__ == '__main__': logger = initLogger() # dataset, dataloader if args.which_dataset == 'dtu': test_dataset = DTUDataset(args.testpath, args.testlist, "test", args.n_views, max_wh=(1600, 1200)) elif args.which_dataset == 'tnt': test_dataset = TNTDataset(args.testpath, args.testlist, split=args.split, n_views=args.n_views, img_wh=(-1, 1024), cam_mode=args.cam_mode, img_mode=args.img_mode) TestImgLoader = DataLoader(test_dataset, args.batch_size, shuffle=False, num_workers=4, drop_last=False) # @Note GeoMVSNet model model = GeoMVSNet( levels=args.levels, hypo_plane_num_stages=[int(n) for n in args.hypo_plane_num_stages.split(",")], depth_interal_ratio_stages=[float(ir) for ir in args.depth_interal_ratio_stages.split(",")], feat_base_channel=args.feat_base_channel, reg_base_channel=args.reg_base_channel, group_cor_dim_stages=[int(n) for n in args.group_cor_dim_stages.split(",")], ) logger.info("loading model {}".format(args.loadckpt)) state_dict = torch.load(args.loadckpt, map_location=torch.device("cpu")) model.load_state_dict(state_dict['model'], strict=False) model.cuda() model.eval() test() ================================================ FILE: train.py ================================================ # -*- coding: utf-8 -*- # @Description: Main process of network training & evaluation. 
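# @Note: meant to be launched through scripts/*/train_*.sh via torch.distributed.launch;
# WORLD_SIZE > 1 enables DistributedDataParallel with one DistributedSampler per rank.
# --lr_scheduler selects MS (WarmupMultiStepLR), cos, onecycle, or lambda; checkpoints are
# saved every --save_freq epochs and validation runs every --eval_freq epochs.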
# @Author: Zhe Zhang (doublez@stu.pku.edu.cn) # @Affiliation: Peking University (PKU) # @LastEditDate: 2023-09-07 import os, sys, time, gc, datetime, logging, json import torch import torch.nn as nn import torch.nn.parallel import torch.backends.cudnn as cudnn import torch.optim as optim import torch.distributed as dist from torch.utils.data import DataLoader from tensorboardX import SummaryWriter from datasets.dtu import DTUDataset from datasets.blendedmvs import BlendedMVSDataset from models.geomvsnet import GeoMVSNet from models.loss import geomvsnet_loss from models.utils import * from models.utils.opts import get_opts cudnn.benchmark = True num_gpus = int(os.environ["WORLD_SIZE"]) if "WORLD_SIZE" in os.environ else 1 is_distributed = num_gpus > 1 args = get_opts() def train(model, model_loss, optimizer, TrainImgLoader, TestImgLoader, start_epoch, args): if args.lr_scheduler == 'MS': milestones = [len(TrainImgLoader) * int(epoch_idx) for epoch_idx in args.lrepochs.split(':')[0].split(',')] lr_gamma = 1 / float(args.lrepochs.split(':')[1]) lr_scheduler = WarmupMultiStepLR(optimizer, milestones, gamma=lr_gamma, warmup_factor=1.0/3, warmup_iters=500, last_epoch=len(TrainImgLoader) * start_epoch - 1) elif args.lr_scheduler == 'cos': lr_scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=int(args.epochs*len(TrainImgLoader)), eta_min=0) elif args.lr_scheduler == 'onecycle': lr_scheduler = torch.optim.lr_scheduler.OneCycleLR(optimizer, max_lr=args.lr, total_steps=int(args.epochs*len(TrainImgLoader))) elif args.lr_scheduler == 'lambda': lr_scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lambda epoch: 0.9 ** ((epoch-1) / len(TrainImgLoader)), last_epoch=len(TrainImgLoader)*start_epoch-1) for epoch_idx in range(start_epoch, args.epochs): logger.info('Epoch {}:'.format(epoch_idx)) global_step = len(TrainImgLoader) * epoch_idx # training for batch_idx, sample in enumerate(TrainImgLoader): start_time = time.time() global_step = len(TrainImgLoader) * epoch_idx + batch_idx do_summary = global_step % args.summary_freq == 0 loss, scalar_outputs, image_outputs = train_sample(model, model_loss, optimizer, sample, args) lr_scheduler.step() if (not is_distributed) or (dist.get_rank() == 0): if do_summary: if not args.notensorboard: tb_save_scalars(tb_writer, 'train', scalar_outputs, global_step) tb_save_images(tb_writer, 'train', image_outputs, global_step) logger.info("Epoch {}/{}, Iter {}/{}, 2mm_err={:.3f} | lr={:.6f}, train_loss={:.3f}, abs_err={:.3f}, pw_loss={:.3f}, dds_loss={:.3f}, time={:.3f}".format( epoch_idx, args.epochs, batch_idx, len(TrainImgLoader), scalar_outputs["thres2mm_error"], optimizer.param_groups[0]["lr"], loss, scalar_outputs["abs_depth_error"], scalar_outputs["s3_pw_loss"], scalar_outputs["s3_dds_loss"], time.time() - start_time)) del scalar_outputs, image_outputs # save checkpoint if (not is_distributed) or (dist.get_rank() == 0): if ((epoch_idx + 1) % args.save_freq == 0) or (epoch_idx == args.epochs-1): torch.save({ 'epoch': epoch_idx, 'model': model.module.state_dict(), 'optimizer': optimizer.state_dict()}, "{}/model_{:0>2}.ckpt".format(args.logdir, epoch_idx)) gc.collect() # testing if (epoch_idx % args.eval_freq == 0) or (epoch_idx == args.epochs - 1): avg_test_scalars = DictAverageMeter() for batch_idx, sample in enumerate(TestImgLoader): start_time = time.time() global_step = len(TrainImgLoader) * epoch_idx + batch_idx do_summary = global_step % args.summary_freq == 0 loss, scalar_outputs, image_outputs = test_sample_depth(model, 
model_loss, sample, args) if (not is_distributed) or (dist.get_rank() == 0): if do_summary: if not args.notensorboard: tb_save_scalars(tb_writer, 'test', scalar_outputs, global_step) tb_save_images(tb_writer, 'test', image_outputs, global_step) logger.info( "Epoch {}/{}, Iter {}/{}, 2mm_err={:.3f} | lr={:.6f}, test_loss={:.3f}, abs_err={:.3f}, pw_loss={:.3f}, dds_loss={:.3f}, time={:.3f}".format( epoch_idx, args.epochs, batch_idx, len(TestImgLoader), scalar_outputs["thres2mm_error"], optimizer.param_groups[0]["lr"], loss, scalar_outputs["abs_depth_error"], scalar_outputs["s3_pw_loss"], scalar_outputs["s3_dds_loss"], time.time() - start_time)) avg_test_scalars.update(scalar_outputs) del scalar_outputs, image_outputs if (not is_distributed) or (dist.get_rank() == 0): if not args.notensorboard: tb_save_scalars(tb_writer, 'fulltest', avg_test_scalars.mean(), global_step) logger.info("avg_test_scalars: " + json.dumps(avg_test_scalars.mean())) gc.collect() def train_sample(model, model_loss, optimizer, sample, args): model.train() optimizer.zero_grad() sample_cuda = tocuda(sample) depth_gt_ms, mask_ms = sample_cuda["depth"], sample_cuda["mask"] depth_gt, mask = depth_gt_ms["stage{}".format(args.levels)], mask_ms["stage{}".format(args.levels)] # @Note GeoMVSNet main outputs = model( sample_cuda["imgs"], sample_cuda["proj_matrices"], sample_cuda["intrinsics_matrices"], sample_cuda["depth_values"] ) depth_est = outputs["depth"] loss, epe, pw_loss_stages, dds_loss_stages = model_loss( outputs, depth_gt_ms, mask_ms, stage_lw=[float(e) for e in args.stage_lw.split(",") if e], depth_values=sample_cuda["depth_values"] ) loss.backward() optimizer.step() scalar_outputs = { "loss": loss, "epe": epe, "s0_pw_loss": pw_loss_stages[0], "s1_pw_loss": pw_loss_stages[1], "s2_pw_loss": pw_loss_stages[2], "s3_pw_loss": pw_loss_stages[3], "s0_dds_loss": dds_loss_stages[0], "s1_dds_loss": dds_loss_stages[1], "s2_dds_loss": dds_loss_stages[2], "s3_dds_loss": dds_loss_stages[3], "abs_depth_error": AbsDepthError_metrics(depth_est, depth_gt, mask > 0.5), "thres2mm_error": Thres_metrics(depth_est, depth_gt, mask > 0.5, 2), "thres4mm_error": Thres_metrics(depth_est, depth_gt, mask > 0.5, 4), "thres8mm_error": Thres_metrics(depth_est, depth_gt, mask > 0.5, 8), } image_outputs = { "depth_est": depth_est * mask, "depth_est_nomask": depth_est, "depth_gt": sample["depth"]["stage1"], "ref_img": sample["imgs"][0], "mask": sample["mask"]["stage1"], "errormap": (depth_est - depth_gt).abs() * mask, } if is_distributed: scalar_outputs = reduce_scalar_outputs(scalar_outputs) return tensor2float(scalar_outputs["loss"]), tensor2float(scalar_outputs), tensor2numpy(image_outputs) @make_nograd_func def test_sample_depth(model, model_loss, sample, args): if is_distributed: model_eval = model.module else: model_eval = model model_eval.eval() sample_cuda = tocuda(sample) depth_gt_ms, mask_ms = sample_cuda["depth"], sample_cuda["mask"] depth_gt, mask = depth_gt_ms["stage{}".format(args.levels)], mask_ms["stage{}".format(args.levels)] outputs = model_eval( sample_cuda["imgs"], sample_cuda["proj_matrices"], sample_cuda["intrinsics_matrices"], sample_cuda["depth_values"] ) depth_est = outputs["depth"] loss, epe, pw_loss_stages, dds_loss_stages = model_loss( outputs, depth_gt_ms, mask_ms, stage_lw=[float(e) for e in args.stage_lw.split(",") if e], depth_values=sample_cuda["depth_values"] ) scalar_outputs = { "loss": loss, "epe": epe, "s0_pw_loss": pw_loss_stages[0], "s1_pw_loss": pw_loss_stages[1], "s2_pw_loss": pw_loss_stages[2], "s3_pw_loss": 
pw_loss_stages[3], "s0_dds_loss": dds_loss_stages[0], "s1_dds_loss": dds_loss_stages[1], "s2_dds_loss": dds_loss_stages[2], "s3_dds_loss": dds_loss_stages[3], "abs_depth_error": AbsDepthError_metrics(depth_est, depth_gt, mask > 0.5), "thres2mm_error": Thres_metrics(depth_est, depth_gt, mask > 0.5, 2), "thres4mm_error": Thres_metrics(depth_est, depth_gt, mask > 0.5, 4), "thres8mm_error": Thres_metrics(depth_est, depth_gt, mask > 0.5, 8), } image_outputs = { "depth_est": depth_est * mask, "depth_est_nomask": depth_est, "depth_gt": sample["depth"]["stage1"], "ref_img": sample["imgs"][0], "mask": sample["mask"]["stage1"], "errormap": (depth_est - depth_gt).abs() * mask } if is_distributed: scalar_outputs = reduce_scalar_outputs(scalar_outputs) return tensor2float(scalar_outputs["loss"]), tensor2float(scalar_outputs), tensor2numpy(image_outputs) def initLogger(): logger = logging.getLogger() logger.setLevel(logging.INFO) curTime = time.strftime('%Y%m%d-%H%M', time.localtime(time.time())) logfile = os.path.join(args.logdir, 'train-' + curTime + '.log') formatter = logging.Formatter("%(asctime)s - %(filename)s[line:%(lineno)d] - %(levelname)s: %(message)s") fileHandler = logging.FileHandler(logfile, mode='a') fileHandler.setFormatter(formatter) logger.addHandler(fileHandler) consoleHandler = logging.StreamHandler(sys.stdout) consoleHandler.setFormatter(formatter) logger.addHandler(consoleHandler) logger.info("Logger initialized.") logger.info("Writing logs to file: {}".format(logfile)) logger.info("Current time: {}".format(curTime)) settings_str = "All settings:\n" for k,v in vars(args).items(): settings_str += '{0}: {1}\n'.format(k,v) logger.info(settings_str) return logger if __name__ == '__main__': logger = initLogger() if args.resume: assert args.mode == "train" assert args.loadckpt is None if is_distributed: torch.cuda.set_device(args.local_rank) torch.distributed.init_process_group(backend="nccl", init_method="env://") synchronize() set_random_seed(args.seed) device = torch.device(args.device) # tensorboard if (not is_distributed) or (dist.get_rank() == 0): if not os.path.isdir(args.logdir): os.makedirs(args.logdir) current_time_str = str(datetime.datetime.now().strftime('%Y%m%d_%H%M%S')) logger.info("current time " + current_time_str) logger.info("creating new summary file") if not args.notensorboard: tb_writer = SummaryWriter(args.logdir) # @Note GeoMVSNet model model = GeoMVSNet( levels=args.levels, hypo_plane_num_stages=[int(n) for n in args.hypo_plane_num_stages.split(",")], depth_interal_ratio_stages=[float(ir) for ir in args.depth_interal_ratio_stages.split(",")], feat_base_channel=args.feat_base_channel, reg_base_channel=args.reg_base_channel, group_cor_dim_stages=[int(n) for n in args.group_cor_dim_stages.split(",")], ) model.to(device) model_loss = geomvsnet_loss # optimizer optimizer = optim.Adam(filter(lambda p: p.requires_grad, model.parameters()), lr=args.lr, betas=(0.9, 0.999), weight_decay=args.wd) # load parameters start_epoch = 0 if args.resume: saved_models = [fn for fn in os.listdir(args.logdir) if fn.endswith(".ckpt")] saved_models = sorted(saved_models, key=lambda x: int(x.split('_')[-1].split('.')[0])) loadckpt = os.path.join(args.logdir, saved_models[-1]) logger.info("resuming: " + loadckpt) state_dict = torch.load(loadckpt, map_location=torch.device("cpu")) model.load_state_dict(state_dict['model']) optimizer.load_state_dict(state_dict['optimizer']) start_epoch = state_dict['epoch'] + 1 # distributed if (not is_distributed) or (dist.get_rank() == 0): 
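        # only the rank-0 process (or a non-distributed run) logs the resume epoch and parameter count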
logger.info("start at epoch {}".format(start_epoch)) logger.info('Number of model parameters: {}'.format(sum([p.data.nelement() for p in model.parameters()]))) if is_distributed: if dist.get_rank() == 0: logger.info("Let's use {} GPUs in distributed mode!".format(torch.cuda.device_count())) model = torch.nn.parallel.DistributedDataParallel( model, device_ids=[args.local_rank], output_device=args.local_rank, find_unused_parameters=True, ) else: if torch.cuda.is_available(): logger.info("Let's use {} GPUs in parallel mode.".format(torch.cuda.device_count())) model = nn.DataParallel(model) # dataset, dataloader if args.which_dataset == "dtu": train_dataset = DTUDataset(args.trainpath, args.trainlist, "train", args.n_views, data_scale=args.data_scale, robust_train=args.robust_train) test_dataset = DTUDataset(args.testpath, args.testlist, "val", args.n_views, data_scale=args.data_scale) elif args.which_dataset == "blendedmvs": train_dataset = BlendedMVSDataset(args.trainpath, args.trainlist, "train", args.n_views, img_wh=(768, 576), robust_train=args.robust_train, augment=False) test_dataset = BlendedMVSDataset(args.testpath, args.testlist, "val", args.n_views, img_wh=(768, 576)) if is_distributed: train_sampler = torch.utils.data.DistributedSampler(train_dataset, num_replicas=dist.get_world_size(), rank=dist.get_rank()) test_sampler = torch.utils.data.DistributedSampler(test_dataset, num_replicas=dist.get_world_size(), rank=dist.get_rank()) TrainImgLoader = DataLoader(train_dataset, args.batch_size, sampler=train_sampler, num_workers=8, drop_last=True, pin_memory=args.pin_m) TestImgLoader = DataLoader(test_dataset, args.batch_size, sampler=test_sampler, num_workers=8, drop_last=False, pin_memory=args.pin_m) else: TrainImgLoader = DataLoader(train_dataset, args.batch_size, shuffle=True, num_workers=8, drop_last=True, pin_memory=args.pin_m) TestImgLoader = DataLoader(test_dataset, args.batch_size, shuffle=False, num_workers=8, drop_last=False, pin_memory=args.pin_m) train(model, model_loss, optimizer, TrainImgLoader, TestImgLoader, start_epoch, args)