Repository: doubleZ0108/GeoMVSNet
Branch: master
Commit: 09167fc95f04
Files: 48
Total size: 230.0 KB
Directory structure:
gitextract_c1590sn0/
├── .gitignore
├── LICENSE
├── README.md
├── datasets/
│ ├── __init__.py
│ ├── blendedmvs.py
│ ├── data_io.py
│ ├── dtu.py
│ ├── evaluations/
│ │ └── dtu_parallel/
│ │ ├── BaseEval2Obj_web.m
│ │ ├── BaseEvalMain_web.m
│ │ ├── ComputeStat_web.m
│ │ ├── MaxDistCP.m
│ │ ├── PointCompareMain.m
│ │ ├── plyread.m
│ │ └── reducePts_haa.m
│ ├── lists/
│ │ ├── blendedmvs/
│ │ │ ├── low_res_all.txt
│ │ │ └── val.txt
│ │ ├── dtu/
│ │ │ ├── test.txt
│ │ │ ├── train.txt
│ │ │ └── val.txt
│ │ └── tnt/
│ │ ├── advanced.txt
│ │ └── intermediate.txt
│ └── tnt.py
├── fusions/
│ ├── dtu/
│ │ ├── _open3d.py
│ │ ├── gipuma.py
│ │ └── pcd.py
│ └── tnt/
│ └── dypcd.py
├── models/
│ ├── __init__.py
│ ├── filter.py
│ ├── geometry.py
│ ├── geomvsnet.py
│ ├── loss.py
│ ├── submodules.py
│ └── utils/
│ ├── __init__.py
│ ├── opts.py
│ └── utils.py
├── outputs/
│ └── visual.ipynb
├── requirements.txt
├── scripts/
│ ├── blend/
│ │ └── train_blend.sh
│ ├── data_path.sh
│ ├── dtu/
│ │ ├── fusion_dtu.sh
│ │ ├── matlab_quan_dtu.sh
│ │ ├── test_dtu.sh
│ │ ├── train_dtu.sh
│ │ └── train_dtu_raw.sh
│ └── tnt/
│ ├── fusion_tnt.sh
│ └── test_tnt.sh
├── test.py
└── train.py
================================================
FILE CONTENTS
================================================
================================================
FILE: .gitignore
================================================
.DS_Store
__pycache__
================================================
FILE: LICENSE
================================================
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
================================================
FILE: README.md
================================================
# GeoMVSNet: Learning Multi-View Stereo With Geometry Perception (CVPR 2023)
## 🔨 Setup
### 1.1 Requirements
Use the following commands to build the `conda` environment.
```bash
conda create -n geomvsnet python=3.8
conda activate geomvsnet
pip install -r requirements.txt
```
### 1.2 Datasets
Download the following datasets and modify the corresponding local path in `scripts/data_path.sh`.
#### DTU Dataset
**Training data**. We use the same DTU training data as MVSNet and CasMVSNet; please refer to [DTU training data](https://drive.google.com/file/d/1eDjh-_bxKKnEuz5h-HXS7EDJn59clx6V/view) and [Depth raw](https://virutalbuy-public.oss-cn-hangzhou.aliyuncs.com/share/cascade-stereo/CasMVSNet/dtu_data/dtu_train_hr/Depths_raw.zip) to download the data. Optionally, download the [Rectified raw](http://roboimagedata2.compute.dtu.dk/data/MVS/Rectified.zip) if you want to train the model at raw image resolution. Unzip and organize them as:
```
dtu/
├── Cameras
├── Depths
├── Depths_raw
├── Rectified
└── Rectified_raw (optional)
```
**Testing data**. For convenience, we use the [DTU testing data](https://drive.google.com/file/d/1rX0EXlUL4prRxrRu2DgLJv2j7-tpUD4D/view?usp=sharing) processed by CVP-MVSNet. Also unzip and organize it as:
```
dtu-test/
├── Cameras
├── Depths
└── Rectified
```
> Please note that the images and lighting here are consistent with the original dataset.
#### BlendedMVS Dataset
Download the low image resolution version of [BlendedMVS dataset](https://drive.google.com/file/d/1ilxls-VJNvJnB7IaFj7P0ehMPr7ikRCb/view) and unzip it as:
```
blendedmvs/
└── dataset_low_res
├── ...
└── 5c34529873a8df509ae57b58
```
#### Tanks and Temples Dataset
Download the intermediate and advanced subsets of the [Tanks and Temples dataset](https://drive.google.com/file/d/1YArOJaX9WVLJh4757uE8AEREYkgszrCo/view) and unzip them. If you want to use the short-range version of camera parameters for the `Intermediate` subset, unzip `short_range_caemeras_for_mvsnet.zip` and move `cam_[]` to the corresponding scenes.
```
tnt/
├── advanced
│ ├── ...
│ └── Temple
│ ├── cams
│ ├── images
│ ├── pair.txt
│ └── Temple.log
└── intermediate
├── ...
└── Train
├── cams
├── cams_train
├── images
├── pair.txt
└── Train.log
```
## 🚂 Training
You can train GeoMVSNet from scratch on the DTU and BlendedMVS datasets. After suitable configuration and training, the checkpoints are saved in `checkpoints/[Dataset]/[THISNAME]`, and the following outputs lie in that folder:
- `events.out.tfevents*`: you can use `tensorboard` to monitor the training process.
- `model_[epoch].ckpt`: we save a checkpoint every `--save_freq`.
- `train-[TIME].log`: the detailed training log; refer to the appropriate indicators to judge the quality of training.
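For a quick look at a saved checkpoint (e.g. to confirm which epoch it came from before testing), a minimal sketch is shown below; the key names `epoch` and `model` are assumptions based on common MVSNet-style training loops, so check `train.py` for the actual layout.
```python
import torch

# Hypothetical checkpoint name following the model_[epoch].ckpt pattern above.
ckpt_path = "checkpoints/dtu/geomvsnet/model_000015.ckpt"
state = torch.load(ckpt_path, map_location="cpu")

print(state.keys())                      # assumed: dict_keys(['epoch', 'model', 'optimizer'])
# model.load_state_dict(state["model"])  # restore weights into a GeoMVSNet instance
```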
### 2.1 DTU
To train GeoMVSNet on the DTU dataset, refer to `scripts/dtu/train_dtu.sh` and specify `THISNAME`, `CUDA_VISIBLE_DEVICES`, `batch_size`, etc. to suit your setup. Then run:
```bash
bash scripts/dtu/train_dtu.sh
```
The default training strategy we provide is the *distributed* training mode. If you want to use the *general* training mode, you can refer to the following script.
```bash
CUDA_VISIBLE_DEVICES=0,1,2,3 python3 train.py ${@} \
--which_dataset="dtu" --epochs=16 --logdir=$LOG_DIR \
--trainpath=$DTU_TRAIN_ROOT --testpath=$DTU_TRAIN_ROOT \
--trainlist="datasets/lists/dtu/train.txt" --testlist="datasets/lists/dtu/test.txt" \
\
--data_scale="mid" --n_views="5" --batch_size=16 --lr=0.025 --robust_train \
--lrepochs="1,3,5,7,9,11,13,15:1.5"
```
> Note that the two training strategies require different `batch_size` and `lr` settings to achieve the best results.
### 2.2 BlendedMVS
To train GeoMVSNet on the BlendedMVS dataset, refer to `scripts/blend/train_blend.sh` and likewise specify `THISNAME`, `CUDA_VISIBLE_DEVICES`, `batch_size`, etc. to suit your setup. Then run:
```bash
bash scripts/blend/train_blend.sh
```
By default, we use `7` viewpoints as input for the BlendedMVS training. Similarly, you can choose to use the *distributed* training mode or the *general* one as mentioned in 2.1.
## ⚗️ Testing
### 3.1 DTU
For DTU testing, we use the model trained on the DTU training set. You can simply download our [DTU pretrained model](https://drive.google.com/file/d/147_UbjE87E-HB9sZ5yLDbckynH825nJd/view?usp=sharing), put it into `checkpoints/dtu/geomvsnet/`, and perform *depth map estimation, point cloud fusion, and result evaluation* according to the following steps.
1. Run `bash scripts/dtu/test_dtu.sh` for depth map estimation. The results will be stored in `outputs/dtu/[THISNAME]/`, with each scan folder holding `depth_est`, `confidence`, etc.
- Use `outputs/visual.ipynb` for depth map visualization (a minimal loading sketch is also shown after this list).
2. Run `bash scripts/dtu/fusion_dtu.sh` for point cloud fusion. We provide 3 different fusion methods, and we recommend the `open3d` option by default. After fusion, you can get `[FUSION_METHOD]_fusion_plys` under the experiment output folder, point clouds of each testing scan are there.
(Optional) If you want to use the `gipuma` fusion method, follow these steps:
1. Clone the [edited fusibile repo](https://github.com/YoYo000/fusibile).
2. Refer to [fusibile configuration blog (Chinese)](https://zhuanlan.zhihu.com/p/460212787) for building details.
3. Create a new python2.7 conda env.
```bash
conda create -n fusibile python=2.7
conda install scipy matplotlib
conda install tensorflow==1.14.0
conda install -c https://conda.anaconda.org/menpo opencv
```
4. Use the `fusibile` conda environment for `gipuma` fusion method.
3. Download the [ObsMask](http://roboimagedata2.compute.dtu.dk/data/MVS/SampleSet.zip) and [Points](http://roboimagedata2.compute.dtu.dk/data/MVS/Points.zip) of DTU GT point clouds from the official website and organize them as:
```
dtu-evaluation/
├── ObsMask
└── Points
```
4. Set up `Matlab` in command-line mode and run `bash scripts/dtu/matlab_quan_dtu.sh`. You can adjust the `num_at_once` config according to your machine's CPU and memory limits. After quantitative evaluation, you will get `[FUSION_METHOD]_quantitative/` and `[THISNAME].log`, which store the quantitative results.
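To inspect the depth maps produced in step 1 outside of `outputs/visual.ipynb`, a minimal sketch using `read_pfm` from `datasets/data_io.py` is shown below; the exact file layout (`depth_est/00000000.pfm`) is an assumption, so adjust the path to your own run.
```python
import numpy as np
import matplotlib.pyplot as plt
from datasets.data_io import read_pfm

# Hypothetical path under outputs/dtu/[THISNAME]/; verify against your output folder.
depth, _ = read_pfm("outputs/dtu/geomvsnet/scan1/depth_est/00000000.pfm")
depth = np.asarray(depth, dtype=np.float32)

plt.imshow(depth, cmap="viridis")
plt.colorbar(label="depth")
plt.savefig("depth_vis.png", dpi=150)
```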
### 3.2 Tanks and Temples
For testing on [Tanks and Temples benchmark](https://www.tanksandtemples.org/leaderboard/), you can use any of the following configurations:
- Only train on DTU training dataset.
- Only train on BlendedMVS dataset.
- Pretrain on the DTU training set and finetune on BlendedMVS. (Recommended)
After training, follow these steps:
1. Run `bash scripts/tnt/test_tnt.sh` for depth map estimation. The results will be stored in `outputs/[TRAINING_DATASET]/[THISNAME]/`.
- Use `outputs/visual.ipynb` for depth map visualization.
2. Run `bash scripts/tnt/fusion_tnt.sh` for point cloud fusion. We provide the popular dynamic fusion strategy, and you can tune the fusion threshold in `fusions/tnt/dypcd.py`.
3. Follow the *Upload Instructions* on the [T&T official website](https://www.tanksandtemples.org/submit/) to make online submissions.
### 3.3 Custom Data (TODO)
GeoMVSNet can also reconstruct from custom data. At present, you can refer to [MVSNet](https://github.com/YoYo000/MVSNet#file-formats) for how to organize your data, then follow the same steps as above for *depth estimation* and *point cloud fusion*.
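For reference, `datasets/data_io.py` already provides a `write_cam` helper that emits the MVSNet-style camera file (a 4x4 extrinsic block, a 3x3 intrinsic block, and a final line with the depth range parameters). A minimal sketch of writing one camera file for custom data is shown below; the intrinsics and depth range values are placeholders that you must replace with your own calibration.
```python
import numpy as np
from datasets.data_io import write_cam

extrinsic = np.eye(4, dtype=np.float32)            # world-to-camera matrix (placeholder)
intrinsic4 = np.zeros((4, 4), dtype=np.float32)
intrinsic4[:3, :3] = np.array([[1446.2, 0.0, 800.0],
                               [0.0, 1446.2, 600.0],
                               [0.0, 0.0, 1.0]], dtype=np.float32)   # placeholder K
intrinsic4[3, :4] = [425.0, 2.5, 192, 905.0]       # depth_min, interval, num_depth, depth_max (placeholders)

write_cam("cams/00000000_cam.txt", (extrinsic, intrinsic4))
```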
## 💡 Results
Our results on the DTU and Tanks and Temples datasets are listed in the tables below.
| DTU Dataset | Acc. ↓ | Comp. ↓ | Overall ↓ |
| ----------- | ------ | ------- | --------- |
| GeoMVSNet | 0.3309 | 0.2593 | 0.2951 |
| T&T (Intermediate) | Mean ↑ | Family | Francis | Horse | Lighthouse | M60 | Panther | Playground | Train |
| ------------------ | ------ | ------ | ------- | ----- | ---------- | ----- | ------- | ---------- | ----- |
| GeoMVSNet | 65.89 | 81.64 | 67.53 | 55.78 | 68.02 | 65.49 | 67.19 | 63.27 | 58.22 |
| T&T (Advanced) | Mean ↑ | Auditorium | Ballroom | Courtroom | Museum | Palace | Temple |
| -------------- | ------ | ---------- | -------- | --------- | ------ | ------ | ------ |
| GeoMVSNet | 41.52 | 30.23 | 46.53 | 39.98 | 53.05 | 35.98 | 43.34 |
And you can download our [Point Cloud](https://disk.pku.edu.cn:443/link/69D473126C509C8DCBCC7E233FAAEEAA) and [Estimated Depth](https://disk.pku.edu.cn:443/link/4217EB2F063D2B10EDC711F54A12B5F7) for academic usage.
### 🌟 About Reproducing Paper Results
In our experiments, we found that reproducing MVS networks is relatively difficult. Therefore, we summarize some of the problems we encountered below, hoping they are helpful to you.
**Q1. GPU Architecture Matters.**
There are two commonly used NVIDIA GPU series: GeForce RTX (e.g. RTX 4090, 3090 Ti, 2080 Ti) and Tesla (e.g. V100, T4). We find that there is generally no performance degradation when training and testing on the same series of GPUs. But if, for example, you train on a V100 and test on a 3090 Ti, the depth maps look visually identical while individual pixel values are not exactly the same. We conjecture that the two series or architectures differ in numerical computation and processing precision.
> Our pretrained model is trained on NVIDIA V100 GPUs.
**Q2. PyTorch Version Matters.**
Different CUDA versions lead to different available PyTorch versions, and different torch versions can affect the accuracy of network training and testing. One reason we found is that the implementation and parameter defaults of `F.grid_sample()` vary across PyTorch versions.
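For instance, the `align_corners` default of `F.grid_sample()` changed around PyTorch 1.3, so passing it explicitly (and recording your torch/CUDA versions) makes results easier to reproduce. A minimal illustration:
```python
import torch
import torch.nn.functional as F

feat = torch.randn(1, 8, 32, 40)           # B x C x H x W feature map
grid = torch.rand(1, 24, 30, 2) * 2 - 1    # sampling grid with coordinates in [-1, 1]

# Passing align_corners explicitly avoids silently different interpolation
# behaviour across PyTorch versions.
warped = F.grid_sample(feat, grid, mode="bilinear",
                       padding_mode="zeros", align_corners=True)
print(torch.__version__, warped.shape)     # torch.Size([1, 8, 24, 30])
```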
**Q3. Training Hyperparameters Matter.**
In the era of neural networks, hyperparameters really matter. We did some hyperparameter tuning, but our settings may not match your configuration. Most fundamentally, because GPU memory differs across machines, you need to adjust `batch_size` and `lr` together, as illustrated below. The learning-rate schedule also matters.
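A common heuristic (our suggestion, not a rule from the paper) is the linear scaling rule: scale `lr` by the same factor as the total batch size, taking the general DTU script above (`batch_size=16`, `lr=0.025`) as the reference point.
```python
# Hypothetical helper illustrating the linear scaling rule.
def scaled_lr(batch_size, n_gpus=1, base_lr=0.025, base_batch=16):
    total_batch = batch_size * n_gpus
    return base_lr * total_batch / base_batch

print(scaled_lr(batch_size=4, n_gpus=4))   # same total batch of 16 -> 0.025
print(scaled_lr(batch_size=2, n_gpus=1))   # smaller total batch -> 0.003125
```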
**Q4. Testing Epoch Matters.**
By default, our model trains for 16 epochs. How do you select the best checkpoint for testing? One solution is to use [PyTorch Lightning](https://lightning.ai/docs/pytorch/latest/starter/introduction.html). For simplicity, you can decide which checkpoint to use based on the `.log` file we provide.
**Q5. Fusion Hyperparameters Matter.**
For both the DTU and T&T datasets, the hyperparameters of point cloud fusion greatly affect the final performance. We have provided different fusion strategies and easy access to adjust their parameters. You may need to get a feel for the behavior of your own model.
**Qx. Others.** You can [raise an issue](https://github.com/doubleZ0108/GeoMVSNet/issues/new/choose) if you encounter other problems.
## ⚖️ Citation
```
@InProceedings{zhe2023geomvsnet,
title={GeoMVSNet: Learning Multi-View Stereo With Geometry Perception},
author={Zhang, Zhe and Peng, Rui and Hu, Yuxi and Wang, Ronggang},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={21508--21518},
year={2023}
}
```
## 💌 Acknowledgements
This repository is partly based on [MVSNet](https://github.com/YoYo000/MVSNet), [MVSNet-pytorch](https://github.com/xy-guo/MVSNet_pytorch), [CVP-MVSNet](https://github.com/JiayuYANG/CVP-MVSNet), [cascade-stereo](https://github.com/alibaba/cascade-stereo), [MVSTER](https://github.com/JeffWang987/MVSTER).
We appreciate their contributions to the MVS community.
================================================
FILE: datasets/__init__.py
================================================
================================================
FILE: datasets/blendedmvs.py
================================================
# -*- coding: utf-8 -*-
# @Description: Data preprocessing and organization for BlendedMVS dataset.
# @Author: Zhe Zhang (doublez@stu.pku.edu.cn)
# @Affiliation: Peking University (PKU)
# @LastEditDate: 2023-09-07
import os
import cv2
import random
import numpy as np
from PIL import Image
from torch.utils.data import Dataset
from torchvision import transforms as T
from datasets.data_io import *
def motion_blur(img: np.ndarray, max_kernel_size=3):
# Either vertical, horizontal or diagonal blur
mode = np.random.choice(['h', 'v', 'diag_down', 'diag_up'])
ksize = np.random.randint(0, (max_kernel_size + 1) / 2) * 2 + 1 # make sure is odd
center = int((ksize - 1) / 2)
kernel = np.zeros((ksize, ksize))
if mode == 'h':
kernel[center, :] = 1.
elif mode == 'v':
kernel[:, center] = 1.
elif mode == 'diag_down':
kernel = np.eye(ksize)
elif mode == 'diag_up':
kernel = np.flip(np.eye(ksize), 0)
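# soften the directional kernel with a Gaussian falloff around the centre, then normalize it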
var = ksize * ksize / 16.
grid = np.repeat(np.arange(ksize)[:, np.newaxis], ksize, axis=-1)
gaussian = np.exp(-(np.square(grid - center) + np.square(grid.T - center)) / (2. * var))
kernel *= gaussian
kernel /= np.sum(kernel)
img = cv2.filter2D(img, -1, kernel)
return img
class BlendedMVSDataset(Dataset):
def __init__(self, root_dir, list_file, split, n_views, **kwargs):
super(BlendedMVSDataset, self).__init__()
self.levels = 4
self.root_dir = root_dir
self.list_file = list_file
self.split = split
self.n_views = n_views
assert self.split in ['train', 'val', 'all']
self.scale_factors = {}
self.scale_factor = 0
self.img_wh = kwargs.get("img_wh", (768, 576))
assert self.img_wh[0]%32==0 and self.img_wh[1]%32==0, \
'img_wh must both be multiples of 2^5!'
self.robust_train = kwargs.get("robust_train", True)
self.augment = kwargs.get("augment", True)
if self.augment:
self.color_augment = T.ColorJitter(brightness=0.25, contrast=(0.3, 1.5))
self.metas = self.build_metas()
def build_metas(self):
metas = []
with open(self.list_file) as f:
self.scans = [line.rstrip() for line in f.readlines()]
for scan in self.scans:
with open(os.path.join(self.root_dir, scan, "cams/pair.txt")) as f:
num_viewpoint = int(f.readline())
for _ in range(num_viewpoint):
ref_view = int(f.readline().rstrip())
src_views = [int(x) for x in f.readline().rstrip().split()[1::2]]
if len(src_views) >= self.n_views-1:
metas += [(scan, ref_view, src_views)]
return metas
def read_cam_file(self, scan, filename):
with open(filename) as f:
lines = f.readlines()
lines = [line.rstrip() for line in lines]
# extrinsics: line [1,5), 4x4 matrix
extrinsics = np.fromstring(' '.join(lines[1:5]), dtype=np.float32, sep=' ').reshape((4, 4))
# intrinsics: line [7-10), 3x3 matrix
intrinsics = np.fromstring(' '.join(lines[7:10]), dtype=np.float32, sep=' ').reshape((3, 3))
depth_min = float(lines[11].split()[0])
depth_max = float(lines[11].split()[-1])
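# rescale each scan so that its depth values sit near a canonical range (depth_min -> 100);
# the same per-scan factor is applied to the camera translation here and to the depth maps in read_depth_mask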
if scan not in self.scale_factors:
self.scale_factors[scan] = 100.0 / depth_min
depth_min *= self.scale_factors[scan]
depth_max *= self.scale_factors[scan]
extrinsics[:3, 3] *= self.scale_factors[scan]
return intrinsics, extrinsics, depth_min, depth_max
def read_depth_mask(self, scan, filename, depth_min, depth_max, scale):
depth = np.array(read_pfm(filename)[0], dtype=np.float32)
depth = (depth * self.scale_factors[scan]) * scale
mask = (depth>=depth_min) & (depth<=depth_max)
assert mask.sum() > 0
mask = mask.astype(np.float32)
if self.img_wh is not None:
depth = cv2.resize(depth, self.img_wh, interpolation=cv2.INTER_NEAREST)
h, w = depth.shape
depth_ms = {}
mask_ms = {}
for i in range(self.levels):
depth_cur = cv2.resize(depth, (w//(2**i), h//(2**i)), interpolation=cv2.INTER_NEAREST)
mask_cur = cv2.resize(mask, (w//(2**i), h//(2**i)), interpolation=cv2.INTER_NEAREST)
depth_ms[f"stage{self.levels-i}"] = depth_cur
mask_ms[f"stage{self.levels-i}"] = mask_cur
return depth_ms, mask_ms
def read_img(self, filename):
img = Image.open(filename)
if self.augment:
img = self.color_augment(img)
img = motion_blur(np.array(img, dtype=np.float32))
np_img = np.array(img, dtype=np.float32) / 255.
return np_img
def __len__(self):
return len(self.metas)
def __getitem__(self, idx):
meta = self.metas[idx]
scan, ref_view, src_views = meta
if self.robust_train:
num_src_views = len(src_views)
index = random.sample(range(num_src_views), self.n_views - 1)
view_ids = [ref_view] + [src_views[i] for i in index]
scale_ratio = random.uniform(0.8, 1.25)
else:
view_ids = [ref_view] + src_views[:self.n_views - 1]
scale_ratio = 1
imgs = []
mask = None
depth = None
depth_min = None
depth_max = None
proj={}
proj_matrices_0 = []
proj_matrices_1 = []
proj_matrices_2 = []
proj_matrices_3 = []
for i, vid in enumerate(view_ids):
img_filename = os.path.join(self.root_dir, '{}/blended_images/{:0>8}.jpg'.format(scan, vid))
depth_filename = os.path.join(self.root_dir, '{}/rendered_depth_maps/{:0>8}.pfm'.format(scan, vid))
proj_mat_filename = os.path.join(self.root_dir, '{}/cams/{:0>8}_cam.txt'.format(scan, vid))
img = self.read_img(img_filename)
imgs.append(img.transpose(2,0,1))
intrinsics, extrinsics, depth_min_, depth_max_ = self.read_cam_file(scan, proj_mat_filename)
proj_mat_0 = np.zeros(shape=(2, 4, 4), dtype=np.float32)
proj_mat_1 = np.zeros(shape=(2, 4, 4), dtype=np.float32)
proj_mat_2 = np.zeros(shape=(2, 4, 4), dtype=np.float32)
proj_mat_3 = np.zeros(shape=(2, 4, 4), dtype=np.float32)
extrinsics[:3, 3] *= scale_ratio
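# stage1 works at 1/8 image resolution: shrink the intrinsics by 8x, then double them per stage so stage4 matches the full-resolution image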
intrinsics[:2,:] *= 0.125
proj_mat_0[0,:4,:4] = extrinsics.copy()
proj_mat_0[1,:3,:3] = intrinsics.copy()
int_mat_0 = intrinsics.copy()
intrinsics[:2,:] *= 2
proj_mat_1[0,:4,:4] = extrinsics.copy()
proj_mat_1[1,:3,:3] = intrinsics.copy()
int_mat_1 = intrinsics.copy()
intrinsics[:2,:] *= 2
proj_mat_2[0,:4,:4] = extrinsics.copy()
proj_mat_2[1,:3,:3] = intrinsics.copy()
int_mat_2 = intrinsics.copy()
intrinsics[:2,:] *= 2
proj_mat_3[0,:4,:4] = extrinsics.copy()
proj_mat_3[1,:3,:3] = intrinsics.copy()
int_mat_3 = intrinsics.copy()
proj_matrices_0.append(proj_mat_0)
proj_matrices_1.append(proj_mat_1)
proj_matrices_2.append(proj_mat_2)
proj_matrices_3.append(proj_mat_3)
# reference view
if i == 0:
depth_min = depth_min_ * scale_ratio
depth_max = depth_max_ * scale_ratio
depth, mask = self.read_depth_mask(scan, depth_filename, depth_min, depth_max, scale_ratio)
for l in range(self.levels):
mask[f'stage{l+1}'] = mask[f'stage{l+1}']
depth[f'stage{l+1}'] = depth[f'stage{l+1}']
proj['stage1'] = np.stack(proj_matrices_0)
proj['stage2'] = np.stack(proj_matrices_1)
proj['stage3'] = np.stack(proj_matrices_2)
proj['stage4'] = np.stack(proj_matrices_3)
intrinsics_matrices = {
"stage1": int_mat_0,
"stage2": int_mat_1,
"stage3": int_mat_2,
"stage4": int_mat_3
}
sample = {
"imgs": imgs,
"proj_matrices": proj,
"intrinsics_matrices": intrinsics_matrices,
"depth": depth,
"depth_values": np.array([depth_min, depth_max], dtype=np.float32),
"mask": mask
}
return sample
================================================
FILE: datasets/data_io.py
================================================
# -*- coding: utf-8 -*-
# @Description: I/O functions for depth maps and camera files.
# @Author: Zhe Zhang (doublez@stu.pku.edu.cn)
# @Affiliation: Peking University (PKU)
# @LastEditDate: 2023-09-07
import sys, re
import numpy as np
def read_pfm(filename):
file = open(filename, 'rb')
color = None
width = None
height = None
scale = None
endian = None
header = file.readline().decode('utf-8').rstrip()
if header == 'PF':
color = True
elif header == 'Pf':
color = False
else:
raise Exception('Not a PFM file.')
dim_match = re.match(r'^(\d+)\s(\d+)\s$', file.readline().decode('utf-8'))
if dim_match:
width, height = map(int, dim_match.groups())
else:
raise Exception('Malformed PFM header.')
scale = float(file.readline().rstrip())
if scale < 0: # little-endian
endian = '<'
scale = -scale
else:
endian = '>' # big-endian
data = np.fromfile(file, endian + 'f')
shape = (height, width, 3) if color else (height, width)
data = np.reshape(data, shape)
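# PFM stores scanlines bottom-to-top; flip vertically so row 0 is the top of the image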
data = np.flipud(data)
file.close()
return data, scale
def save_pfm(filename, image, scale=1):
file = open(filename, "wb")
color = None
image = np.flipud(image)
if image.dtype.name != 'float32':
raise Exception('Image dtype must be float32.')
if len(image.shape) == 3 and image.shape[2] == 3: # color image
color = True
elif len(image.shape) == 2 or len(image.shape) == 3 and image.shape[2] == 1: # greyscale
color = False
else:
raise Exception('Image must have H x W x 3, H x W x 1 or H x W dimensions.')
file.write('PF\n'.encode('utf-8') if color else 'Pf\n'.encode('utf-8'))
file.write('{} {}\n'.format(image.shape[1], image.shape[0]).encode('utf-8'))
endian = image.dtype.byteorder
if endian == '<' or endian == '=' and sys.byteorder == 'little':
scale = -scale
file.write(('%f\n' % scale).encode('utf-8'))
image.tofile(file)
file.close()
def write_cam(file, cam):
f = open(file, "w")
f.write('extrinsic\n')
for i in range(0, 4):
for j in range(0, 4):
f.write(str(cam[0][i][j]) + ' ')
f.write('\n')
f.write('\n')
f.write('intrinsic\n')
for i in range(0, 3):
for j in range(0, 3):
f.write(str(cam[1][i][j]) + ' ')
f.write('\n')
f.write('\n' + str(cam[1][3][0]) + ' ' + str(cam[1][3][1]) + ' ' + str(cam[1][3][2]) + ' ' + str(cam[1][3][3]) + '\n')
f.close()
================================================
FILE: datasets/dtu.py
================================================
# -*- coding: utf-8 -*-
# @Description: Data preprocessing and organization for DTU dataset.
# @Author: Zhe Zhang (doublez@stu.pku.edu.cn)
# @Affiliation: Peking University (PKU)
# @LastEditDate: 2023-09-07
import os
import cv2
import random
import numpy as np
from PIL import Image
from torchvision import transforms
from torch.utils.data import Dataset
from datasets.data_io import *
class DTUDataset(Dataset):
def __init__(self, root_dir, list_file, mode, n_views, **kwargs):
super(DTUDataset, self).__init__()
self.root_dir = root_dir
self.list_file = list_file
self.mode = mode
self.n_views = n_views
assert self.mode in ["train", "val", "test"]
self.total_depths = 192
self.interval_scale = 1.06
self.data_scale = kwargs.get("data_scale", "mid") # mid / raw
self.robust_train = kwargs.get("robust_train", False) # True / False
self.color_augment = transforms.ColorJitter(brightness=0.5, contrast=0.5)
if self.mode == "test":
self.max_wh = kwargs.get("max_wh", (1600, 1200))
self.metas = self.build_metas()
def build_metas(self):
metas = []
with open(os.path.join(self.list_file)) as f:
scans = [line.rstrip() for line in f.readlines()]
pair_file = "Cameras/pair.txt"
for scan in scans:
with open(os.path.join(self.root_dir, pair_file)) as f:
num_viewpoint = int(f.readline())
# viewpoints (49)
for _ in range(num_viewpoint):
ref_view = int(f.readline().rstrip())
src_views = [int(x) for x in f.readline().rstrip().split()[1::2]]
if self.mode == "train":
# light conditions 0-6
for light_idx in range(7):
metas.append((scan, light_idx, ref_view, src_views))
elif self.mode in ["test", "val"]:
if len(src_views) < self.n_views:
print("{} < num_views:{}".format(len(src_views), self.n_views))
src_views += [src_views[0]] * (self.n_views - len(src_views))
metas.append((scan, 3, ref_view, src_views))
print("DTU Dataset in", self.mode, "mode metas:", len(metas))
return metas
def __len__(self):
return len(self.metas)
def read_cam_file(self, filename):
with open(filename) as f:
lines = f.readlines()
lines = [line.rstrip() for line in lines]
# extrinsics: line [1,5), 4x4 matrix
extrinsics = np.fromstring(' '.join(lines[1:5]), dtype=np.float32, sep=' ').reshape((4, 4))
# intrinsics: line [7-10), 3x3 matrix
intrinsics = np.fromstring(' '.join(lines[7:10]), dtype=np.float32, sep=' ').reshape((3, 3))
if self.mode == "test":
intrinsics[:2, :] /= 4.0
# depth_min & depth_interval: line 11
depth_min = float(lines[11].split()[0])
depth_interval = float(lines[11].split()[1])
if len(lines[11].split()) >= 3:
num_depth = lines[11].split()[2]
depth_max = depth_min + int(float(num_depth)) * depth_interval
depth_interval = (depth_max - depth_min) / self.total_depths
depth_interval *= self.interval_scale
return intrinsics, extrinsics, depth_min, depth_interval
def read_img(self, filename):
img = Image.open(filename)
if self.mode == "train" and self.robust_train:
img = self.color_augment(img)
# scale 0~255 to 0~1
np_img = np.array(img, dtype=np.float32) / 255.
return np_img
def crop_img(self, img):
raw_h, raw_w = img.shape[:2]
start_h = (raw_h-1024)//2
start_w = (raw_w-1280)//2
return img[start_h:start_h+1024, start_w:start_w+1280, :] # (1024, 1280)
def prepare_img(self, hr_img):
h, w = hr_img.shape
if self.data_scale == "mid":
hr_img_ds = cv2.resize(hr_img, (w//2, h//2), interpolation=cv2.INTER_NEAREST)
h, w = hr_img_ds.shape
target_h, target_w = 512, 640
start_h, start_w = (h - target_h)//2, (w - target_w)//2
hr_img_crop = hr_img_ds[start_h: start_h + target_h, start_w: start_w + target_w]
elif self.data_scale == "raw":
hr_img_crop = hr_img[h//2-1024//2:h//2+1024//2, w//2-1280//2:w//2+1280//2] # (1024, 1280)
return hr_img_crop
def scale_mvs_input(self, img, intrinsics, max_w, max_h, base=64):
h, w = img.shape[:2]
if h > max_h or w > max_w:
scale = 1.0 * max_h / h
if scale * w > max_w:
scale = 1.0 * max_w / w
new_w, new_h = scale * w // base * base, scale * h // base * base
else:
new_w, new_h = 1.0 * w // base * base, 1.0 * h // base * base
scale_w = 1.0 * new_w / w
scale_h = 1.0 * new_h / h
intrinsics[0, :] *= scale_w
intrinsics[1, :] *= scale_h
img = cv2.resize(img, (int(new_w), int(new_h)))
return img, intrinsics
def read_mask_hr(self, filename):
img = Image.open(filename)
np_img = np.array(img, dtype=np.float32)
np_img = (np_img > 10).astype(np.float32)
np_img = self.prepare_img(np_img)
h, w = np_img.shape
np_img_ms = {
"stage1": cv2.resize(np_img, (w//8, h//8), interpolation=cv2.INTER_NEAREST),
"stage2": cv2.resize(np_img, (w//4, h//4), interpolation=cv2.INTER_NEAREST),
"stage3": cv2.resize(np_img, (w//2, h//2), interpolation=cv2.INTER_NEAREST),
"stage4": np_img,
}
return np_img_ms
def read_depth_hr(self, filename, scale):
depth_hr = np.array(read_pfm(filename)[0], dtype=np.float32) * scale
depth_lr = self.prepare_img(depth_hr)
h, w = depth_lr.shape
depth_lr_ms = {
"stage1": cv2.resize(depth_lr, (w//8, h//8), interpolation=cv2.INTER_NEAREST),
"stage2": cv2.resize(depth_lr, (w//4, h//4), interpolation=cv2.INTER_NEAREST),
"stage3": cv2.resize(depth_lr, (w//2, h//2), interpolation=cv2.INTER_NEAREST),
"stage4": depth_lr,
}
return depth_lr_ms
def __getitem__(self, idx):
scan, light_idx, ref_view, src_views = self.metas[idx]
if self.mode == "train" and self.robust_train:
num_src_views = len(src_views)
index = random.sample(range(num_src_views), self.n_views-1)
view_ids = [ref_view] + [src_views[i] for i in index]
scale_ratio = random.uniform(0.8, 1.25)
else:
view_ids = [ref_view] + src_views[:self.n_views-1]
scale_ratio = 1
imgs = []
mask = None
depth_values = None
proj_matrices = []
for i, vid in enumerate(view_ids):
# @Note image & cam
if self.mode in ["train", "val"]:
if self.data_scale == "mid":
img_filename = os.path.join(self.root_dir, 'Rectified/{}_train/rect_{:0>3}_{}_r5000.png'.format(scan, vid+1, light_idx))
elif self.data_scale == "raw":
img_filename = os.path.join(self.root_dir, 'Rectified_raw/{}/rect_{:0>3}_{}_r5000.png'.format(scan, vid + 1, light_idx))
proj_mat_filename = os.path.join(self.root_dir, 'Cameras/train/{:0>8}_cam.txt').format(vid)
elif self.mode == "test":
img_filename = os.path.join(self.root_dir, 'Rectified/{}/rect_{:0>3}_3_r5000.png'.format(scan, vid+1))
proj_mat_filename = os.path.join(self.root_dir, 'Cameras/{:0>8}_cam.txt'.format(vid))
img = self.read_img(img_filename)
intrinsics, extrinsics, depth_min, depth_interval = self.read_cam_file(proj_mat_filename)
if self.mode in ["train", "val"]:
if self.data_scale == "raw":
img = self.crop_img(img)
intrinsics[:2, :] *= 2.0
if self.mode == "train" and self.robust_train:
extrinsics[:3,3] *= scale_ratio
elif self.mode == "test":
img, intrinsics = self.scale_mvs_input(img, intrinsics, self.max_wh[0], self.max_wh[1])
imgs.append(img.transpose(2,0,1))
# reference view
if i == 0:
# @Note depth values
diff = 0.5 if self.mode in ["test", "val"] else 0
depth_max = depth_interval * (self.total_depths - diff) + depth_min
depth_values = np.array([depth_min * scale_ratio, depth_max * scale_ratio], dtype=np.float32)
# @Note depth & mask
if self.mode in ["train", "val"]:
depth_filename_hr = os.path.join(self.root_dir, 'Depths_raw/{}/depth_map_{:0>4}.pfm'.format(scan, vid))
depth = self.read_depth_hr(depth_filename_hr, scale_ratio)
mask_filename_hr = os.path.join(self.root_dir, 'Depths_raw/{}/depth_visual_{:0>4}.png'.format(scan, vid))
mask = self.read_mask_hr(mask_filename_hr)
proj_mat = np.zeros(shape=(2, 4, 4), dtype=np.float32)
proj_mat[0, :4, :4] = extrinsics
proj_mat[1, :3, :3] = intrinsics
proj_matrices.append(proj_mat)
proj_matrices = np.stack(proj_matrices)
intrinsics = np.stack(intrinsics)
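# build the 4-stage pyramid: the matrices read from disk correspond to stage2;
# stage1 halves the intrinsics (coarsest), while stage3 and stage4 scale them by 2x and 4x (finer)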
stage1_pjmats = proj_matrices.copy()
stage1_pjmats[:, 1, :2, :] = proj_matrices[:, 1, :2, :] / 2.0
stage1_ins = intrinsics.copy()
stage1_ins[:2, :] = intrinsics[:2, :] / 2.0
stage3_pjmats = proj_matrices.copy()
stage3_pjmats[:, 1, :2, :] = proj_matrices[:, 1, :2, :] * 2
stage3_ins = intrinsics.copy()
stage3_ins[:2, :] = intrinsics[:2, :] * 2.0
stage4_pjmats = proj_matrices.copy()
stage4_pjmats[:, 1, :2, :] = proj_matrices[:, 1, :2, :] * 4
stage4_ins = intrinsics.copy()
stage4_ins[:2, :] = intrinsics[:2, :] * 4.0
proj_matrices = {
"stage1": stage1_pjmats,
"stage2": proj_matrices,
"stage3": stage3_pjmats,
"stage4": stage4_pjmats
}
intrinsics_matrices = {
"stage1": stage1_ins,
"stage2": intrinsics,
"stage3": stage3_ins,
"stage4": stage4_ins
}
sample = {
"imgs": imgs,
"proj_matrices": proj_matrices,
"intrinsics_matrices": intrinsics_matrices,
"depth_values": depth_values
}
if self.mode in ["train", "val"]:
sample["depth"] = depth
sample["mask"] = mask
elif self.mode == "test":
sample["filename"] = scan + '/{}/' + '{:0>8}'.format(view_ids[0]) + "{}"
return sample
================================================
FILE: datasets/evaluations/dtu_parallel/BaseEval2Obj_web.m
================================================
function BaseEval2Obj_web(BaseEval,method_string,outputPath)
if(nargin<3)
outputPath='./';
end
% threshold for coloring alpha channel in the range of 0-10 mm
dist_tresshold=10;
cSet=BaseEval.cSet;
Qdata=BaseEval.Qdata;
alpha=min(BaseEval.Ddata,dist_tresshold)/dist_tresshold;
fid=fopen([outputPath method_string '2Stl_' num2str(cSet) ' .obj'],'w+');
for cP=1:size(Qdata,2)
if(BaseEval.DataInMask(cP))
C=[1 0 0]*alpha(cP)+[1 1 1]*(1-alpha(cP)); %coloring from red to white in the range of 0-10 mm (0 to dist_tresshold)
else
C=[0 1 0]*alpha(cP)+[0 0 1]*(1-alpha(cP)); %green to blue for points outside the mask (which are not included in the analysis)
end
fprintf(fid,'v %f %f %f %f %f %f\n',[Qdata(1,cP) Qdata(2,cP) Qdata(3,cP) C(1) C(2) C(3)]);
end
fclose(fid);
disp('Data2Stl saved as obj')
Qstl=BaseEval.Qstl;
fid=fopen([outputPath 'Stl2' method_string '_' num2str(cSet) '.obj'],'w+');
alpha=min(BaseEval.Dstl,dist_tresshold)/dist_tresshold;
for cP=1:size(Qstl,2)
if(BaseEval.StlAbovePlane(cP))
C=[1 0 0]*alpha(cP)+[1 1 1]*(1-alpha(cP)); %coloring from red to white in the range of 0-10 mm (0 to dist_tresshold)
else
C=[0 1 0]*alpha(cP)+[0 0 1]*(1-alpha(cP)); %green to blue for points below plane (which are not included in the analysis)
end
fprintf(fid,'v %f %f %f %f %f %f\n',[Qstl(1,cP) Qstl(2,cP) Qstl(3,cP) C(1) C(2) C(3)]);
end
fclose(fid);
disp('Stl2Data saved as obj')
================================================
FILE: datasets/evaluations/dtu_parallel/BaseEvalMain_web.m
================================================
format compact
representation_string='Points'; %mvs representation 'Points' or 'Surfaces'
switch representation_string
case 'Points'
eval_string='_Eval_'; %results naming
settings_string='';
end
dst=0.2; %Min dist between points when reducing
% start this evaluation
cSet = str2num(thisset)
%input data name
DataInName = [plyPath sprintf('%s%03d.ply', lower(method_string), cSet)]
%results name
EvalName=[resultsPath method_string eval_string num2str(cSet) '.mat']
%check if file is already computed
if(~exist(EvalName,'file'))
disp(DataInName);
time=clock;time(4:5), drawnow
tic
Mesh = plyread(DataInName);
Qdata=[Mesh.vertex.x Mesh.vertex.y Mesh.vertex.z]';
toc
BaseEval=PointCompareMain(cSet,Qdata,dst,dataPath);
disp('Saving results'), drawnow
toc
save(EvalName,'BaseEval');
toc
% write obj-file of evaluation
% BaseEval2Obj_web(BaseEval,method_string, resultsPath)
% toc
time=clock;time(4:5), drawnow
BaseEval.MaxDist=20; %outlier threshold of 20 mm
BaseEval.FilteredDstl=BaseEval.Dstl(BaseEval.StlAbovePlane); %use only points that are above the plane
BaseEval.FilteredDstl=BaseEval.FilteredDstl(BaseEval.FilteredDstl<BaseEval.MaxDist); %discard outliers
end
================================================
FILE: datasets/evaluations/dtu_parallel/MaxDistCP.m
================================================
function Dist = MaxDistCP(Qto,Qfrom,BB,MaxDist)
Dist=ones(1,size(Qfrom,2))*MaxDist;
Range=floor((BB(2,:)-BB(1,:))/MaxDist);
for x=0:Range(1)
for y=0:Range(2)
for z=0:Range(3)
Low=BB(1,:)+[x y z]*MaxDist;
High=Low+MaxDist;
SQfrom=find(Qfrom(1,:)>=Low(1) & Qfrom(2,:)>=Low(2) & Qfrom(3,:)>=Low(3) &...
Qfrom(1,:)<High(1) & Qfrom(2,:)<High(2) & Qfrom(3,:)<High(3));
Low=Low-MaxDist;
High=High+MaxDist;
SQto=find(Qto(1,:)>=Low(1) & Qto(2,:)>=Low(2) & Qto(3,:)>=Low(3) &...
Qto(1,:)<High(1) & Qto(2,:)<High(2) & Qto(3,:)<High(3));
if isempty(SQto)
Dist(SQfrom)=MaxDist;
else
KDstl=KDTreeSearcher(Qto(:,SQto)');
[~,Dist(SQfrom)]=knnsearch(KDstl,Qfrom(:,SQfrom)');
end
end
end
end
================================================
FILE: datasets/evaluations/dtu_parallel/PointCompareMain.m
================================================
function BaseEval=PointCompareMain(cSet,Qdata,dst,dataPath)
% evaluation function that calculates the distances from the reference data (stl) to the evaluation points (Qdata) and the
% distances from the evaluation points to the reference
tic
% reduce points 0.2 mm neighbourhood density
Qdata=reducePts_haa(Qdata,dst);
toc
StlInName=[dataPath '/Points/stl/stl' sprintf('%03d',cSet) '_total.ply'];
StlMesh = plyread(StlInName); %STL points already reduced 0.2 mm neighbourhood density
Qstl=[StlMesh.vertex.x StlMesh.vertex.y StlMesh.vertex.z]';
%Load Mask (ObsMask) and Bounding box (BB) and Resolution (Res)
Margin=10;
MaskName=[dataPath '/ObsMask/ObsMask' num2str(cSet) '_' num2str(Margin) '.mat'];
load(MaskName)
MaxDist=60;
disp('Computing Data 2 Stl distances')
Ddata = MaxDistCP(Qstl,Qdata,BB,MaxDist);
toc
disp('Computing Stl 2 Data distances')
Dstl=MaxDistCP(Qdata,Qstl,BB,MaxDist);
disp('Distances computed')
toc
%use mask
%From Get mask - inverted & modified.
One=ones(1,size(Qdata,2));
Qv=(Qdata-BB(1,:)'*One)/Res+1;
Qv=round(Qv);
Midx1=find(Qv(1,:)>0 & Qv(1,:)<=size(ObsMask,1) & Qv(2,:)>0 & Qv(2,:)<=size(ObsMask,2) & Qv(3,:)>0 & Qv(3,:)<=size(ObsMask,3));
MidxA=sub2ind(size(ObsMask),Qv(1,Midx1),Qv(2,Midx1),Qv(3,Midx1));
Midx2=find(ObsMask(MidxA));
BaseEval.DataInMask(1:size(Qv,2))=false;
BaseEval.DataInMask(Midx1(Midx2))=true; %If Data is within the mask
BaseEval.cSet=cSet;
BaseEval.Margin=Margin; %Margin of masks
BaseEval.dst=dst; %Min dist between points when reducing
BaseEval.Qdata=Qdata; %Input data points
BaseEval.Ddata=Ddata; %distance from data to stl
BaseEval.Qstl=Qstl; %Input stl points
BaseEval.Dstl=Dstl; %Distance from the stl to data
load([dataPath '/ObsMask/Plane' num2str(cSet)],'P')
BaseEval.GroundPlane=P; % Plane used to distinguish which Stl points are 'used'
BaseEval.StlAbovePlane=(P'*[Qstl;ones(1,size(Qstl,2))])>0; %Is stl above 'ground plane'
BaseEval.Time=clock; %Time when computation is finished
================================================
FILE: datasets/evaluations/dtu_parallel/plyread.m
================================================
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [Elements,varargout] = plyread(Path,Str)
%PLYREAD Read a PLY 3D data file.
% [DATA,COMMENTS] = PLYREAD(FILENAME) reads a version 1.0 PLY file
% FILENAME and returns a structure DATA. The fields in this structure
% are defined by the PLY header; each element type is a field and each
% element property is a subfield. If the file contains any comments,
% they are returned in a cell string array COMMENTS.
%
% [TRI,PTS] = PLYREAD(FILENAME,'tri') or
% [TRI,PTS,DATA,COMMENTS] = PLYREAD(FILENAME,'tri') converts vertex
% and face data into triangular connectivity and vertex arrays. The
% mesh can then be displayed using the TRISURF command.
%
% Note: This function is slow for large mesh files (+50K faces),
% especially when reading data with list type properties.
%
% Example:
% [Tri,Pts] = PLYREAD('cow.ply','tri');
% trisurf(Tri,Pts(:,1),Pts(:,2),Pts(:,3));
% colormap(gray); axis equal;
%
% See also: PLYWRITE
% Pascal Getreuer 2004
[fid,Msg] = fopen(Path,'rt'); % open file in read text mode
if fid == -1, error(Msg); end
Buf = fscanf(fid,'%s',1);
if ~strcmp(Buf,'ply')
fclose(fid);
error('Not a PLY file.');
end
%%% read header %%%
Position = ftell(fid);
Format = '';
NumComments = 0;
Comments = {}; % for storing any file comments
NumElements = 0;
NumProperties = 0;
Elements = []; % structure for holding the element data
ElementCount = []; % number of each type of element in file
PropertyTypes = []; % corresponding structure recording property types
ElementNames = {}; % list of element names in the order they are stored in the file
PropertyNames = []; % structure of lists of property names
while 1
Buf = fgetl(fid); % read one line from file
BufRem = Buf;
Token = {};
Count = 0;
while ~isempty(BufRem) % split line into tokens
[tmp,BufRem] = strtok(BufRem);
if ~isempty(tmp)
Count = Count + 1; % count tokens
Token{Count} = tmp;
end
end
if Count % parse line
switch lower(Token{1})
case 'format' % read data format
if Count >= 2
Format = lower(Token{2});
if Count == 3 & ~strcmp(Token{3},'1.0')
fclose(fid);
error('Only PLY format version 1.0 supported.');
end
end
case 'comment' % read file comment
NumComments = NumComments + 1;
Comments{NumComments} = '';
for i = 2:Count
Comments{NumComments} = [Comments{NumComments},Token{i},' '];
end
case 'element' % element name
if Count >= 3
if isfield(Elements,Token{2})
fclose(fid);
error(['Duplicate element name, ''',Token{2},'''.']);
end
NumElements = NumElements + 1;
NumProperties = 0;
Elements = setfield(Elements,Token{2},[]);
PropertyTypes = setfield(PropertyTypes,Token{2},[]);
ElementNames{NumElements} = Token{2};
PropertyNames = setfield(PropertyNames,Token{2},{});
CurElement = Token{2};
ElementCount(NumElements) = str2double(Token{3});
if isnan(ElementCount(NumElements))
fclose(fid);
error(['Bad element definition: ',Buf]);
end
else
error(['Bad element definition: ',Buf]);
end
case 'property' % element property
if ~isempty(CurElement) & Count >= 3
NumProperties = NumProperties + 1;
eval(['tmp=isfield(Elements.',CurElement,',Token{Count});'],...
'fclose(fid);error([''Error reading property: '',Buf])');
if tmp
error(['Duplicate property name, ''',CurElement,'.',Token{2},'''.']);
end
% add property subfield to Elements
eval(['Elements.',CurElement,'.',Token{Count},'=[];'], ...
'fclose(fid);error([''Error reading property: '',Buf])');
% add property subfield to PropertyTypes and save type
eval(['PropertyTypes.',CurElement,'.',Token{Count},'={Token{2:Count-1}};'], ...
'fclose(fid);error([''Error reading property: '',Buf])');
% record property name order
eval(['PropertyNames.',CurElement,'{NumProperties}=Token{Count};'], ...
'fclose(fid);error([''Error reading property: '',Buf])');
else
fclose(fid);
if isempty(CurElement)
error(['Property definition without element definition: ',Buf]);
else
error(['Bad property definition: ',Buf]);
end
end
case 'end_header' % end of header, break from while loop
break;
end
end
end
%%% set reading for specified data format %%%
if isempty(Format)
warning('Data format unspecified, assuming ASCII.');
Format = 'ascii';
end
switch Format
case 'ascii'
Format = 0;
case 'binary_little_endian'
Format = 1;
case 'binary_big_endian'
Format = 2;
otherwise
fclose(fid);
error(['Data format ''',Format,''' not supported.']);
end
if ~Format
Buf = fscanf(fid,'%f'); % read the rest of the file as ASCII data
BufOff = 1;
else
% reopen the file in read binary mode
fclose(fid);
if Format == 1
fid = fopen(Path,'r','ieee-le.l64'); % little endian
else
fid = fopen(Path,'r','ieee-be.l64'); % big endian
end
% find the end of the header again (using ftell on the old handle doesn't give the correct position)
BufSize = 8192;
Buf = [blanks(10),char(fread(fid,BufSize,'uchar')')];
i = [];
tmp = -11;
while isempty(i)
i = findstr(Buf,['end_header',13,10]); % look for end_header + CR/LF
i = [i,findstr(Buf,['end_header',10])]; % look for end_header + LF
if isempty(i)
tmp = tmp + BufSize;
Buf = [Buf(BufSize+1:BufSize+10),char(fread(fid,BufSize,'uchar')')];
end
end
% seek to just after the line feed
fseek(fid,i + tmp + 11 + (Buf(i + 10) == 13),-1);
end
%%% read element data %%%
% PLY and MATLAB data types (for fread)
PlyTypeNames = {'char','uchar','short','ushort','int','uint','float','double', ...
'char8','uchar8','short16','ushort16','int32','uint32','float32','double64'};
MatlabTypeNames = {'schar','uchar','int16','uint16','int32','uint32','single','double'};
SizeOf = [1,1,2,2,4,4,4,8]; % size in bytes of each type
for i = 1:NumElements
% get current element property information
eval(['CurPropertyNames=PropertyNames.',ElementNames{i},';']);
eval(['CurPropertyTypes=PropertyTypes.',ElementNames{i},';']);
NumProperties = size(CurPropertyNames,2);
% fprintf('Reading %s...\n',ElementNames{i});
if ~Format %%% read ASCII data %%%
for j = 1:NumProperties
Token = getfield(CurPropertyTypes,CurPropertyNames{j});
if strcmpi(Token{1},'list')
Type(j) = 1;
else
Type(j) = 0;
end
end
% parse buffer
if ~any(Type)
% no list types
Data = reshape(Buf(BufOff:BufOff+ElementCount(i)*NumProperties-1),NumProperties,ElementCount(i))';
BufOff = BufOff + ElementCount(i)*NumProperties;
else
ListData = cell(NumProperties,1);
for k = 1:NumProperties
ListData{k} = cell(ElementCount(i),1);
end
% list type
for j = 1:ElementCount(i)
for k = 1:NumProperties
if ~Type(k)
Data(j,k) = Buf(BufOff);
BufOff = BufOff + 1;
else
tmp = Buf(BufOff);
ListData{k}{j} = Buf(BufOff+(1:tmp))';
BufOff = BufOff + tmp + 1;
end
end
end
end
else %%% read binary data %%%
% translate PLY data type names to MATLAB data type names
ListFlag = 0; % = 1 if there is a list type
SameFlag = 1; % = 1 if all types are the same
for j = 1:NumProperties
Token = getfield(CurPropertyTypes,CurPropertyNames{j});
if ~strcmp(Token{1},'list') % non-list type
tmp = rem(strmatch(Token{1},PlyTypeNames,'exact')-1,8)+1;
if ~isempty(tmp)
TypeSize(j) = SizeOf(tmp);
Type{j} = MatlabTypeNames{tmp};
TypeSize2(j) = 0;
Type2{j} = '';
SameFlag = SameFlag & strcmp(Type{1},Type{j});
else
fclose(fid);
error(['Unknown property data type, ''',Token{1},''', in ', ...
ElementNames{i},'.',CurPropertyNames{j},'.']);
end
else % list type
if length(Token) == 3
ListFlag = 1;
SameFlag = 0;
tmp = rem(strmatch(Token{2},PlyTypeNames,'exact')-1,8)+1;
tmp2 = rem(strmatch(Token{3},PlyTypeNames,'exact')-1,8)+1;
if ~isempty(tmp) & ~isempty(tmp2)
TypeSize(j) = SizeOf(tmp);
Type{j} = MatlabTypeNames{tmp};
TypeSize2(j) = SizeOf(tmp2);
Type2{j} = MatlabTypeNames{tmp2};
else
fclose(fid);
error(['Unknown property data type, ''list ',Token{2},' ',Token{3},''', in ', ...
ElementNames{i},'.',CurPropertyNames{j},'.']);
end
else
fclose(fid);
error(['Invalid list syntax in ',ElementNames{i},'.',CurPropertyNames{j},'.']);
end
end
end
% read file
if ~ListFlag
if SameFlag
% no list types, all the same type (fast)
Data = fread(fid,[NumProperties,ElementCount(i)],Type{1})';
else
% no list types, mixed type
Data = zeros(ElementCount(i),NumProperties);
for j = 1:ElementCount(i)
for k = 1:NumProperties
Data(j,k) = fread(fid,1,Type{k});
end
end
end
else
ListData = cell(NumProperties,1);
for k = 1:NumProperties
ListData{k} = cell(ElementCount(i),1);
end
if NumProperties == 1
BufSize = 512;
SkipNum = 4;
j = 0;
% list type, one property (fast if lists are usually the same length)
while j < ElementCount(i)
Position = ftell(fid);
% read in BufSize count values, assuming all counts = SkipNum
[Buf,BufSize] = fread(fid,BufSize,Type{1},SkipNum*TypeSize2(1));
Miss = find(Buf ~= SkipNum); % find first count that is not SkipNum
fseek(fid,Position + TypeSize(1),-1); % seek back to after first count
if isempty(Miss) % all counts are SkipNum
Buf = fread(fid,[SkipNum,BufSize],[int2str(SkipNum),'*',Type2{1}],TypeSize(1))';
fseek(fid,-TypeSize(1),0); % undo last skip
for k = 1:BufSize
ListData{1}{j+k} = Buf(k,:);
end
j = j + BufSize;
BufSize = floor(1.5*BufSize);
else
if Miss(1) > 1 % some counts are SkipNum
Buf2 = fread(fid,[SkipNum,Miss(1)-1],[int2str(SkipNum),'*',Type2{1}],TypeSize(1))';
for k = 1:Miss(1)-1
ListData{1}{j+k} = Buf2(k,:);
end
j = j + k;
end
% read in the list with the missed count
SkipNum = Buf(Miss(1));
j = j + 1;
ListData{1}{j} = fread(fid,[1,SkipNum],Type2{1});
BufSize = ceil(0.6*BufSize);
end
end
else
% list type(s), multiple properties (slow)
Data = zeros(ElementCount(i),NumProperties);
for j = 1:ElementCount(i)
for k = 1:NumProperties
if isempty(Type2{k})
Data(j,k) = fread(fid,1,Type{k});
else
tmp = fread(fid,1,Type{k});
ListData{k}{j} = fread(fid,[1,tmp],Type2{k});
end
end
end
end
end
end
% put data into Elements structure
for k = 1:NumProperties
if (~Format & ~Type(k)) | (Format & isempty(Type2{k}))
eval(['Elements.',ElementNames{i},'.',CurPropertyNames{k},'=Data(:,k);']);
else
eval(['Elements.',ElementNames{i},'.',CurPropertyNames{k},'=ListData{k};']);
end
end
end
clear Data ListData;
fclose(fid);
if (nargin > 1 & strcmpi(Str,'Tri')) | nargout > 2
% find vertex element field
Name = {'vertex','Vertex','point','Point','pts','Pts'};
Names = [];
for i = 1:length(Name)
if any(strcmp(ElementNames,Name{i}))
Names = getfield(PropertyNames,Name{i});
Name = Name{i};
break;
end
end
if any(strcmp(Names,'x')) & any(strcmp(Names,'y')) & any(strcmp(Names,'z'))
eval(['varargout{1}=[Elements.',Name,'.x,Elements.',Name,'.y,Elements.',Name,'.z];']);
else
varargout{1} = zeros(1,3);
end
varargout{2} = Elements;
varargout{3} = Comments;
Elements = [];
% find face element field
Name = {'face','Face','poly','Poly','tri','Tri'};
Names = [];
for i = 1:length(Name)
if any(strcmp(ElementNames,Name{i}))
Names = getfield(PropertyNames,Name{i});
Name = Name{i};
break;
end
end
if ~isempty(Names)
% find vertex indices property subfield
PropertyName = {'vertex_indices','vertex_indexes','vertex_index','indices','indexes'};
for i = 1:length(PropertyName)
if any(strcmp(Names,PropertyName{i}))
PropertyName = PropertyName{i};
break;
end
end
if ~iscell(PropertyName)
% convert face index lists to triangular connectivity
eval(['FaceIndices=varargout{2}.',Name,'.',PropertyName,';']);
N = length(FaceIndices);
Elements = zeros(N*2,3);
Extra = 0;
for k = 1:N
Elements(k,:) = FaceIndices{k}(1:3);
for j = 4:length(FaceIndices{k})
Extra = Extra + 1;
Elements(N + Extra,:) = [Elements(k,[1,j-1]),FaceIndices{k}(j)];
end
end
Elements = Elements(1:N+Extra,:) + 1;
end
end
else
varargout{1} = Comments;
end
================================================
FILE: datasets/evaluations/dtu_parallel/reducePts_haa.m
================================================
function [ptsOut,indexSet] = reducePts_haa(pts, dst)
%Reduces a point set, pts, in a stochastic manner, such that the minimum distance
% between points is 'dst'. Written by abd, edited by haa, then by raje
nPoints=size(pts,2);
indexSet=true(nPoints,1);
RandOrd=randperm(nPoints);
%tic
NS = KDTreeSearcher(pts');
%toc
% search the KD-tree for close neighbours in a chunk-wise fashion to save memory if the point cloud is really big
Chunks=1:min(4e6,nPoints-1):nPoints;
Chunks(end)=nPoints;
for cChunk=1:(length(Chunks)-1)
Range=Chunks(cChunk):Chunks(cChunk+1);
idx = rangesearch(NS,pts(:,RandOrd(Range))',dst);
for i = 1:size(idx,1)
id =RandOrd(i-1+Chunks(cChunk));
if (indexSet(id))
indexSet(idx{i}) = 0;
indexSet(id) = 1;
end
end
end
ptsOut = pts(:,indexSet);
disp(['downsample factor: ' num2str(nPoints/sum(indexSet))]);
================================================
FILE: datasets/lists/blendedmvs/low_res_all.txt
================================================
5c1f33f1d33e1f2e4aa6dda4
5bfe5ae0fe0ea555e6a969ca
5bff3c5cfe0ea555e6bcbf3a
58eaf1513353456af3a1682a
5bfc9d5aec61ca1dd69132a2
5bf18642c50e6f7f8bdbd492
5bf26cbbd43923194854b270
5bf17c0fd439231948355385
5be3ae47f44e235bdbbc9771
5be3a5fb8cfdd56947f6b67c
5bbb6eb2ea1cfa39f1af7e0c
5ba75d79d76ffa2c86cf2f05
5bb7a08aea1cfa39f1a947ab
5b864d850d072a699b32f4ae
5b6eff8b67b396324c5b2672
5b6e716d67b396324c2d77cb
5b69cc0cb44b61786eb959bf
5b62647143840965efc0dbde
5b60fa0c764f146feef84df0
5b558a928bbfb62204e77ba2
5b271079e0878c3816dacca4
5b08286b2775267d5b0634ba
5afacb69ab00705d0cefdd5b
5af28cea59bc705737003253
5af02e904c8216544b4ab5a2
5aa515e613d42d091d29d300
5c34529873a8df509ae57b58
5c34300a73a8df509add216d
5c1af2e2bee9a723c963d019
5c1892f726173c3a09ea9aeb
5c0d13b795da9479e12e2ee9
5c062d84a96e33018ff6f0a6
5bfd0f32ec61ca1dd69dc77b
5bf21799d43923194842c001
5bf3a82cd439231948877aed
5bf03590d4392319481971dc
5beb6e66abd34c35e18e66b9
5be883a4f98cee15019d5b83
5be47bf9b18881428d8fbc1d
5bcf979a6d5f586b95c258cd
5bce7ac9ca24970bce4934b6
5bb8a49aea1cfa39f1aa7f75
5b78e57afc8fcf6781d0c3ba
5b21e18c58e2823a67a10dd8
5b22269758e2823a67a3bd03
5b192eb2170cf166458ff886
5ae2e9c5fe405c5076abc6b2
5adc6bd52430a05ecb2ffb85
5ab8b8e029f5351f7f2ccf59
5abc2506b53b042ead637d86
5ab85f1dac4291329b17cb50
5a969eea91dfc339a9a3ad2c
5a8aa0fab18050187cbe060e
5a7d3db14989e929563eb153
5a69c47d0d5d0a7f3b2e9752
5a618c72784780334bc1972d
5a6464143d809f1d8208c43c
5a588a8193ac3d233f77fbca
5a57542f333d180827dfc132
5a572fd9fc597b0478a81d14
5a563183425d0f5186314855
5a4a38dad38c8a075495b5d2
5a48d4b2c7dab83a7d7b9851
5a489fb1c7dab83a7d7b1070
5a48ba95c7dab83a7d7b44ed
5a3ca9cb270f0e3f14d0eddb
5a3cb4e4270f0e3f14d12f43
5a3f4aba5889373fbbc5d3b5
5a0271884e62597cdee0d0eb
59e864b2a9e91f2c5529325f
599aa591d5b41f366fed0d58
59350ca084b7f26bf5ce6eb8
59338e76772c3e6384afbb15
5c20ca3a0843bc542d94e3e2
5c1dbf200843bc542d8ef8c4
5c1b1500bee9a723c96c3e78
5bea87f4abd34c35e1860ab5
5c2b3ed5e611832e8aed46bf
57f8d9bbe73f6760f10e916a
5bf7d63575c26f32dbf7413b
5be4ab93870d330ff2dce134
5bd43b4ba6b28b1ee86b92dd
5bccd6beca24970bce448134
5bc5f0e896b66a2cd8f9bd36
5b908d3dc6ab78485f3d24a9
5b2c67b5e0878c381608b8d8
5b4933abf2b5f44e95de482a
5b3b353d8d46a939f93524b9
5acf8ca0f3d8a750097e4b15
5ab8713ba3799a1d138bd69a
5aa235f64a17b335eeaf9609
5aa0f9d7a9efce63548c69a1
5a8315f624b8e938486e0bd8
5a48c4e9c7dab83a7d7b5cc7
59ecfd02e225f6492d20fcc9
59f87d0bfa6280566fb38c9a
59f363a8b45be22330016cad
59f70ab1e5c5d366af29bf3e
59e75a2ca9e91f2c5526005d
5947719bf1b45630bd096665
5947b62af1b45630bd0c2a02
59056e6760bb961de55f3501
58f7f7299f5b5647873cb110
58cf4771d0f5fb221defe6da
58d36897f387231e6c929903
58c4bb4f4a69c55606122be4
5b7a3890fc8fcf6781e2593a
5c189f2326173c3a09ed7ef3
5b950c71608de421b1e7318f
5a6400933d809f1d8200af15
59d2657f82ca7774b1ec081d
5ba19a8a360c7c30c1c169df
59817e4a1bd4b175e7038d19
================================================
FILE: datasets/lists/blendedmvs/val.txt
================================================
5b7a3890fc8fcf6781e2593a
5c189f2326173c3a09ed7ef3
5b950c71608de421b1e7318f
5a6400933d809f1d8200af15
59d2657f82ca7774b1ec081d
5ba19a8a360c7c30c1c169df
59817e4a1bd4b175e7038d19
================================================
FILE: datasets/lists/dtu/test.txt
================================================
scan1
scan4
scan9
scan10
scan11
scan12
scan13
scan15
scan23
scan24
scan29
scan32
scan33
scan34
scan48
scan49
scan62
scan75
scan77
scan110
scan114
scan118
================================================
FILE: datasets/lists/dtu/train.txt
================================================
scan2
scan6
scan7
scan8
scan14
scan16
scan18
scan19
scan20
scan22
scan30
scan31
scan36
scan39
scan41
scan42
scan44
scan45
scan46
scan47
scan50
scan51
scan52
scan53
scan55
scan57
scan58
scan60
scan61
scan63
scan64
scan65
scan68
scan69
scan70
scan71
scan72
scan74
scan76
scan83
scan84
scan85
scan87
scan88
scan89
scan90
scan91
scan92
scan93
scan94
scan95
scan96
scan97
scan98
scan99
scan100
scan101
scan102
scan103
scan104
scan105
scan107
scan108
scan109
scan111
scan112
scan113
scan115
scan116
scan119
scan120
scan121
scan122
scan123
scan124
scan125
scan126
scan127
scan128
================================================
FILE: datasets/lists/dtu/val.txt
================================================
scan3
scan5
scan17
scan21
scan28
scan35
scan37
scan38
scan40
scan43
scan56
scan59
scan66
scan67
scan82
scan86
scan106
scan117
================================================
FILE: datasets/lists/tnt/advanced.txt
================================================
Auditorium
Ballroom
Courtroom
Museum
Palace
Temple
================================================
FILE: datasets/lists/tnt/intermediate.txt
================================================
Family
Horse
Francis
Lighthouse
M60
Panther
Playground
Train
================================================
FILE: datasets/tnt.py
================================================
# -*- coding: utf-8 -*-
# @Description: Data preprocessing and organization for Tanks and Temples dataset.
# @Author: Zhe Zhang (doublez@stu.pku.edu.cn)
# @Affiliation: Peking University (PKU)
# @LastEditDate: 2023-09-07
import os
import cv2
import numpy as np
from PIL import Image
from torch.utils.data import Dataset
from datasets.data_io import *
class TNTDataset(Dataset):
def __init__(self, root_dir, list_file, split, n_views, **kwargs):
super(TNTDataset, self).__init__()
self.root_dir = root_dir
self.list_file = list_file
self.split = split
self.n_views = n_views
self.cam_mode = kwargs.get("cam_mode", "origin") # origin / short_range
if self.cam_mode == 'short_range': assert self.split == "intermediate"
self.img_mode = kwargs.get("img_mode", "resize") # resize / crop
self.total_depths = 192
self.depth_interval_table = {
# intermediate
'Family': 2.5e-3, 'Francis': 1e-2, 'Horse': 1.5e-3, 'Lighthouse': 1.5e-2, 'M60': 5e-3, 'Panther': 5e-3, 'Playground': 7e-3, 'Train': 5e-3,
# advanced
'Auditorium': 3e-2, 'Ballroom': 2e-2, 'Courtroom': 2e-2, 'Museum': 2e-2, 'Palace': 1e-2, 'Temple': 1e-2
}
self.img_wh = kwargs.get("img_wh", (-1, 1024))
self.metas = self.build_metas()
def build_metas(self):
metas = []
with open(os.path.join(self.list_file)) as f:
scans = [line.rstrip() for line in f.readlines()]
for scan in scans:
with open(os.path.join(self.root_dir, self.split, scan, 'pair.txt')) as f:
num_viewpoint = int(f.readline())
for view_idx in range(num_viewpoint):
ref_view = int(f.readline().rstrip())
src_views = [int(x) for x in f.readline().rstrip().split()[1::2]]
if len(src_views) != 0:
metas += [(scan, -1, ref_view, src_views)]
return metas
def read_cam_file(self, filename):
with open(filename) as f:
lines = [line.rstrip() for line in f.readlines()]
# extrinsics: line [1,5), 4x4 matrix
extrinsics = np.fromstring(' '.join(lines[1:5]), dtype=np.float32, sep=' ')
extrinsics = extrinsics.reshape((4, 4))
# intrinsics: line [7,10), 3x3 matrix
intrinsics = np.fromstring(' '.join(lines[7:10]), dtype=np.float32, sep=' ')
intrinsics = intrinsics.reshape((3, 3))
depth_min = float(lines[11].split()[0])
depth_max = float(lines[11].split()[-1])
return intrinsics, extrinsics, depth_min, depth_max
def read_img(self, filename):
img = Image.open(filename)
np_img = np.array(img, dtype=np.float32) / 255.
return np_img
def scale_tnt_input(self, intrinsics, img):
if self.img_mode == "crop":
intrinsics[1,2] = intrinsics[1,2] - 28 # 1080 -> 1024
img = img[28:1080-28, :, :]
elif self.img_mode == "resize":
height, width = img.shape[:2]
max_w, max_h = self.img_wh[0], self.img_wh[1]
if max_w == -1:
max_w = width
img = cv2.resize(img, (max_w, max_h))
scale_w = 1.0 * max_w / width
intrinsics[0, :] *= scale_w
scale_h = 1.0 * max_h / height
intrinsics[1, :] *= scale_h
return intrinsics, img
def __len__(self):
return len(self.metas)
def __getitem__(self, idx):
scan, _, ref_view, src_views = self.metas[idx]
view_ids = [ref_view] + src_views[:self.n_views-1]
imgs = []
depth_min = None
depth_max = None
proj_matrices_0 = []
proj_matrices_1 = []
proj_matrices_2 = []
proj_matrices_3 = []
for i, vid in enumerate(view_ids):
img_filename = os.path.join(self.root_dir, self.split, scan, f'images/{vid:08d}.jpg')
if self.cam_mode == 'short_range':
# can only be used for the intermediate split
proj_mat_filename = os.path.join(self.root_dir, self.split, scan, f'cams_{scan.lower()}/{vid:08d}_cam.txt')
elif self.cam_mode == 'origin':
proj_mat_filename = os.path.join(self.root_dir, self.split, scan, f'cams/{vid:08d}_cam.txt')
img = self.read_img(img_filename)
intrinsics, extrinsics, depth_min_, depth_max_ = self.read_cam_file(proj_mat_filename)
intrinsics, img = self.scale_tnt_input(intrinsics, img)
imgs.append(img.transpose(2,0,1))
proj_mat_0 = np.zeros(shape=(2, 4, 4), dtype=np.float32)
proj_mat_1 = np.zeros(shape=(2, 4, 4), dtype=np.float32)
proj_mat_2 = np.zeros(shape=(2, 4, 4), dtype=np.float32)
proj_mat_3 = np.zeros(shape=(2, 4, 4), dtype=np.float32)
intrinsics[:2,:] *= 0.125
proj_mat_0[0,:4,:4] = extrinsics.copy()
proj_mat_0[1,:3,:3] = intrinsics.copy()
int_mat_0 = intrinsics.copy()
intrinsics[:2,:] *= 2
proj_mat_1[0,:4,:4] = extrinsics.copy()
proj_mat_1[1,:3,:3] = intrinsics.copy()
int_mat_1 = intrinsics.copy()
intrinsics[:2,:] *= 2
proj_mat_2[0,:4,:4] = extrinsics.copy()
proj_mat_2[1,:3,:3] = intrinsics.copy()
int_mat_2 = intrinsics.copy()
intrinsics[:2,:] *= 2
proj_mat_3[0,:4,:4] = extrinsics.copy()
proj_mat_3[1,:3,:3] = intrinsics.copy()
int_mat_3 = intrinsics.copy()
proj_matrices_0.append(proj_mat_0)
proj_matrices_1.append(proj_mat_1)
proj_matrices_2.append(proj_mat_2)
proj_matrices_3.append(proj_mat_3)
# reference view
if i == 0:
depth_min = depth_min_
if self.cam_mode == 'short_range':
depth_max = depth_min + self.total_depths * self.depth_interval_table[scan]
elif self.cam_mode == 'origin':
depth_max = depth_max_
proj={}
proj['stage1'] = np.stack(proj_matrices_0)
proj['stage2'] = np.stack(proj_matrices_1)
proj['stage3'] = np.stack(proj_matrices_2)
proj['stage4'] = np.stack(proj_matrices_3)
intrinsics_matrices = {
"stage1": int_mat_0,
"stage2": int_mat_1,
"stage3": int_mat_2,
"stage4": int_mat_3
}
sample = {
"imgs": imgs,
"proj_matrices": proj,
"intrinsics_matrices": intrinsics_matrices,
"depth_values": np.array([depth_min, depth_max], dtype=np.float32),
"filename": scan + '/{}/' + '{:0>8}'.format(view_ids[0]) + "{}"
}
return sample
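# Note on the four projection stages built in __getitem__ above: the intrinsics follow a
# 1/8 -> 1/4 -> 1/2 -> 1/1 resolution pyramid (stage1 is the coarsest). A minimal sketch,
# assuming K is the full-resolution 3x3 intrinsics matrix as a numpy array; the helper
# name below is hypothetical and only illustrates the scaling.
def _stage_intrinsics(K):
    stages = {}
    for name, scale in zip(("stage1", "stage2", "stage3", "stage4"), (0.125, 0.25, 0.5, 1.0)):
        K_s = K.copy()
        K_s[:2, :] *= scale   # only the fx/cx and fy/cy rows are scaled
        stages[name] = K_s
    return stages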
================================================
FILE: fusions/dtu/_open3d.py
================================================
# -*- coding: utf-8 -*-
# @Description: Point cloud fusion strategy for DTU dataset based on Open3D Library.
# @Author: Zhe Zhang (doublez@stu.pku.edu.cn)
# @Affiliation: Peking University (PKU)
# @LastEditDate: 2023-09-07
import torch
import numpy as np
import sys
import argparse
import errno, os
import glob
import os.path as osp
import re
import cv2
from PIL import Image
import gc
import open3d as o3d
import torch
import torch.nn.functional as F
import numpy as np
parser = argparse.ArgumentParser(description='Depth fusion with consistency check.')
parser.add_argument('--root_path', type=str, default='[/path/to/]dtu-test-1200')
parser.add_argument('--depth_path', type=str, default='')
parser.add_argument('--data_list', type=str, default='')
parser.add_argument('--ply_path', type=str, default='')
parser.add_argument('--dist_thresh', type=float, default=0.001)
parser.add_argument('--prob_thresh', type=float, default=0.6)
parser.add_argument('--num_consist', type=int, default=10)
parser.add_argument('--device', type=str, default='cpu')
args = parser.parse_args()
def homo_warping(src_fea, src_proj, ref_proj, depth_values):
# src_fea: [B, C, H, W]
# src_proj: [B, 4, 4]
# ref_proj: [B, 4, 4]
# depth_values: [B, Ndepth] or [B, Ndepth, H, W]
# out: [B, C, Ndepth, H, W]
batch, channels = src_fea.shape[0], src_fea.shape[1]
height, width = src_fea.shape[2], src_fea.shape[3]
with torch.no_grad():
proj = torch.matmul(src_proj, torch.inverse(ref_proj))
rot = proj[:, :3, :3] # [B,3,3]
trans = proj[:, :3, 3:4] # [B,3,1]
y, x = torch.meshgrid([torch.arange(0, height, dtype=torch.float32, device=src_fea.device),
torch.arange(0, width, dtype=torch.float32, device=src_fea.device)])
y, x = y.contiguous(), x.contiguous()
y, x = y.view(height * width), x.view(height * width)
xyz = torch.stack((x, y, torch.ones_like(x))) # [3, H*W]
xyz = torch.unsqueeze(xyz, 0).repeat(batch, 1, 1) # [B, 3, H*W]
rot_xyz = torch.matmul(rot, xyz) # [B, 3, H*W]
rot_depth_xyz = rot_xyz.unsqueeze(2) * depth_values.view(-1, 1, 1, height*width) # [B, 3, 1, H*W]
proj_xyz = rot_depth_xyz + trans.view(batch, 3, 1, 1) # [B, 3, Ndepth, H*W]
proj_xy = proj_xyz[:, :2, :, :] / proj_xyz[:, 2:3, :, :] # [B, 2, Ndepth, H*W]
proj_x_normalized = proj_xy[:, 0, :, :] / ((width - 1) / 2) - 1
proj_y_normalized = proj_xy[:, 1, :, :] / ((height - 1) / 2) - 1
proj_xy = torch.stack((proj_x_normalized, proj_y_normalized), dim=3) # [B, Ndepth, H*W, 2]
grid = proj_xy
warped_src_fea = F.grid_sample(src_fea, grid.view(batch, height, width, 2), mode='bilinear',
padding_mode='zeros')
warped_src_fea = warped_src_fea.view(batch, channels, height, width)
return warped_src_fea
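# Usage sketch for homo_warping: a minimal shape check with identical reference and source
# cameras and a single constant depth hypothesis per pixel. The sizes and the depth value
# below are illustrative assumptions only.
def _demo_homo_warping():
    B, C, H, W = 1, 8, 16, 32
    src_fea = torch.rand(B, C, H, W)
    proj = torch.eye(4).unsqueeze(0)         # ref and src share the same projection
    depth = torch.full((B, 1, H, W), 5.0)    # one hypothesised depth per pixel
    warped = homo_warping(src_fea, proj, proj, depth)
    assert warped.shape == (B, C, H, W)      # output keeps the [B, C, H, W] layout
    return warped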
def generate_points_from_depth(depth, proj):
'''
:param depth: (B, 1, H, W)
:param proj: (B, 4, 4)
:return: point_cloud (B, 3, H, W)
'''
batch, height, width = depth.shape[0], depth.shape[2], depth.shape[3]
inv_proj = torch.inverse(proj)
rot = inv_proj[:, :3, :3] # [B,3,3]
trans = inv_proj[:, :3, 3:4] # [B,3,1]
y, x = torch.meshgrid([torch.arange(0, height, dtype=torch.float32, device=depth.device),
torch.arange(0, width, dtype=torch.float32, device=depth.device)])
y, x = y.contiguous(), x.contiguous()
y, x = y.view(height * width), x.view(height * width)
xyz = torch.stack((x, y, torch.ones_like(x))) # [3, H*W]
xyz = torch.unsqueeze(xyz, 0).repeat(batch, 1, 1) # [B, 3, H*W]
rot_xyz = torch.matmul(rot, xyz) # [B, 3, H*W]
rot_depth_xyz = rot_xyz * depth.view(batch, 1, -1)
proj_xyz = rot_depth_xyz + trans.view(batch, 3, 1) # [B, 3, H*W]
proj_xyz = proj_xyz.view(batch, 3, height, width)
return proj_xyz
def mkdir_p(path):
try:
os.makedirs(path)
except OSError as exc:
if exc.errno == errno.EEXIST and os.path.isdir(path):
pass
else:
raise
def read_pfm(filename):
file = open(filename, 'rb')
color = None
width = None
height = None
scale = None
endian = None
header = file.readline().decode('utf-8').rstrip()
if header == 'PF':
color = True
elif header == 'Pf':
color = False
else:
raise Exception('Not a PFM file.')
dim_match = re.match(r'^(\d+)\s(\d+)\s$', file.readline().decode('utf-8'))
if dim_match:
width, height = map(int, dim_match.groups())
else:
raise Exception('Malformed PFM header.')
scale = float(file.readline().rstrip())
if scale < 0: # little-endian
endian = '<'
scale = -scale
else:
endian = '>' # big-endian
data = np.fromfile(file, endian + 'f')
shape = (height, width, 3) if color else (height, width)
data = np.reshape(data, shape)
data = np.flipud(data)
file.close()
return data, scale
def write_pfm(file, image, scale=1):
file = open(file, 'wb')
color = None
if image.dtype.name != 'float32':
raise Exception('Image dtype must be float32.')
image = np.flipud(image)
if len(image.shape) == 3 and image.shape[2] == 3: # color image
color = True
elif len(image.shape) == 2 or len(image.shape) == 3 and image.shape[2] == 1: # greyscale
color = False
else:
raise Exception('Image must have H x W x 3, H x W x 1 or H x W dimensions.')
file.write('PF\n'.encode() if color else 'Pf\n'.encode())
file.write('%d %d\n'.encode() % (image.shape[1], image.shape[0]))
endian = image.dtype.byteorder
if endian == '<' or endian == '=' and sys.byteorder == 'little':
scale = -scale
file.write('%f\n'.encode() % scale)
image_string = image.tostring()
file.write(image_string)
file.close()
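# Sketch of a PFM round trip with the read_pfm / write_pfm pair above, useful as a quick
# sanity check. The temporary path is an illustrative assumption, not used by the pipeline.
def _demo_pfm_roundtrip(path='/tmp/_demo_depth.pfm'):
    depth = np.random.rand(8, 12).astype(np.float32)   # small fake depth map
    write_pfm(path, depth)
    loaded, scale = read_pfm(path)
    assert loaded.shape == depth.shape and np.allclose(loaded, depth)
    return scale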
def write_ply(file, points):
pcd = o3d.geometry.PointCloud()
pcd.points = o3d.utility.Vector3dVector(points[:, :3])
pcd.colors = o3d.utility.Vector3dVector(points[:, 3:] / 255.)
o3d.io.write_point_cloud(file, pcd, write_ascii=False)
def filter_depth(ref_depth, src_depths, ref_proj, src_projs):
'''
:param ref_depth: (1, 1, H, W)
:param src_depths: (B, 1, H, W)
:param ref_proj: (1, 4, 4)
:param src_projs: (B, 4, 4)
:return: ref_pc: (1, 3, H, W), aligned_pcs: (B, 3, H, W), dist: (B, 1, H, W)
'''
ref_pc = generate_points_from_depth(ref_depth, ref_proj)
src_pcs = generate_points_from_depth(src_depths, src_projs)
aligned_pcs = homo_warping(src_pcs, src_projs, ref_proj, ref_depth)
x_2 = (ref_pc[:, 0] - aligned_pcs[:, 0])**2
y_2 = (ref_pc[:, 1] - aligned_pcs[:, 1])**2
z_2 = (ref_pc[:, 2] - aligned_pcs[:, 2])**2
dist = torch.sqrt(x_2 + y_2 + z_2).unsqueeze(1)
return ref_pc, aligned_pcs, dist
def parse_cameras(path):
cam_txt = open(path).readlines()
f = lambda xs: list(map(lambda x: list(map(float, x.strip().split())), xs))
extr_mat = f(cam_txt[1:5])
intr_mat = f(cam_txt[7:10])
extr_mat = np.array(extr_mat, np.float32)
intr_mat = np.array(intr_mat, np.float32)
return extr_mat, intr_mat
def load_data(root_path, depth_path, scene_name, thresh):
depths = []
projs = []
rgbs = []
for view in range(49):
img_filename = "{}/{}/images/{:08d}.jpg".format(depth_path, scene_name, view)
cam_filename = "{}/{}/cams/{:08d}_cam.txt".format(depth_path, scene_name, view)
depth_filename = "{}/{}/depth_est/{:08d}.pfm".format(depth_path, scene_name, view)
confidence_filename = "{}/{}/confidence/{:08d}.pfm".format(depth_path, scene_name, view)
extr_mat, intr_mat = parse_cameras(cam_filename)
proj_mat = np.eye(4)
proj_mat[:3, :4] = np.dot(intr_mat[:3, :3], extr_mat[:3, :4])
projs.append(torch.from_numpy(proj_mat))
dep_map, _ = read_pfm(depth_filename)
h, w = dep_map.shape
conf_map, _ = read_pfm(confidence_filename)
conf_map = cv2.resize(conf_map, (w, h), interpolation=cv2.INTER_LINEAR)
dep_map = dep_map * (conf_map>thresh).astype(np.float32)
depths.append(torch.from_numpy(dep_map).unsqueeze(0))
rgb = np.array(Image.open(img_filename))
rgbs.append(rgb)
depths = torch.stack(depths).float()
projs = torch.stack(projs).float()
if args.device == 'cuda' and torch.cuda.is_available():
depths = depths.cuda()
projs = projs.cuda()
return depths, projs, rgbs
def extract_points(pc, mask, rgb):
pc = pc.cpu().numpy()
mask = mask.cpu().numpy()
mask = np.reshape(mask, (-1,))
pc = np.reshape(pc, (-1, 3))
rgb = np.reshape(rgb, (-1, 3))
points = pc[np.where(mask)]
colors = rgb[np.where(mask)]
points_with_color = np.concatenate([points, colors], axis=1)
return points_with_color
def open3d_filter():
with torch.no_grad():
mkdir_p(args.ply_path)
all_scenes = open(args.data_list, 'r').readlines()
all_scenes = list(map(str.strip, all_scenes))
for i, scene in enumerate(all_scenes):
print("{}/{} {}:".format(i, len(all_scenes), scene), '------------------------')
depths, projs, rgbs = load_data(args.root_path, args.depth_path, scene, args.prob_thresh)
tot_frame = depths.shape[0]
height, width = depths.shape[2], depths.shape[3]
points = []
print('Scene: {} total: {} frames'.format(scene, tot_frame))
for i in range(tot_frame):
pc_buff = torch.zeros((3, height, width), device=depths.device, dtype=depths.dtype)
val_cnt = torch.zeros((1, height, width), device=depths.device, dtype=depths.dtype)
j = 0
batch_size = 20
while True:
ref_pc, pcs, dist = filter_depth(ref_depth=depths[i:i+1], src_depths=depths[j:min(j+batch_size, tot_frame)],
ref_proj=projs[i:i+1], src_projs=projs[j:min(j+batch_size, tot_frame)])
masks = (dist < args.dist_thresh).float()
masked_pc = pcs * masks
pc_buff += masked_pc.sum(dim=0, keepdim=False)
val_cnt += masks.sum(dim=0, keepdim=False)
j += batch_size
if j >= tot_frame:
break
final_mask = (val_cnt >= args.num_consist).squeeze(0)
avg_points = torch.div(pc_buff, val_cnt).permute(1, 2, 0)
final_pc = extract_points(avg_points, final_mask, rgbs[i])
points.append(final_pc)
if i==0 or i==tot_frame-1:
print('Processing {} {}/{} ...'.format(scene, i+1, tot_frame))
ply_id = int(scene[4:])
write_ply('{}/mvsnet{:03d}.ply'.format(args.ply_path, ply_id), np.concatenate(points, axis=0))
del points, depths, rgbs, projs
gc.collect()
print('Saved {}/mvsnet{:03d}.ply successfully.'.format(args.ply_path, ply_id))
if __name__ == '__main__':
open3d_filter()
================================================
FILE: fusions/dtu/gipuma.py
================================================
# -*- coding: utf-8 -*-
# @Description: Point cloud fusion strategy for DTU dataset: Gipuma (fusibile).
# Refer to: https://github.com/YoYo000/MVSNet/blob/master/mvsnet/depthfusion.py
# @Author: Zhe Zhang (doublez@stu.pku.edu.cn)
# @Affiliation: Peking University (PKU)
# @LastEditDate: 2023-09-07
from __future__ import print_function
import os, re, sys, shutil
from struct import *
import numpy as np
import argparse
import cv2
from tensorflow.python.lib.io import file_io
parser = argparse.ArgumentParser()
parser.add_argument('--root_dir', type=str, default='[/path/to]/dtu-test-1200', help='root directory of dtu dataset')
parser.add_argument('--list_file', type=str, default='datasets/lists/dtu/train.txt', help='file contains the scans list')
parser.add_argument('--depth_folder', type=str, default = './outputs/')
parser.add_argument('--out_folder', type=str, default = 'fusibile_fused')
parser.add_argument('--plydir', type=str, default='')
parser.add_argument('--quandir', type=str, default='')
parser.add_argument('--fusibile_exe_path', type=str, default = 'fusion/fusibile')
parser.add_argument('--prob_threshold', type=float, default = '0.8')
parser.add_argument('--disp_threshold', type=float, default = '0.13')
parser.add_argument('--num_consistent', type=float, default = '3')
parser.add_argument('--downsample_factor', type=int, default='1')
args = parser.parse_args()
# preprocess ====================================
def load_cam(file, interval_scale=1):
""" read camera txt file """
cam = np.zeros((2, 4, 4))
words = file.read().split()
# read extrinsic
for i in range(0, 4):
for j in range(0, 4):
extrinsic_index = 4 * i + j + 1
cam[0][i][j] = words[extrinsic_index]
# read intrinsic
for i in range(0, 3):
for j in range(0, 3):
intrinsic_index = 3 * i + j + 18
cam[1][i][j] = words[intrinsic_index]
if len(words) == 29:
cam[1][3][0] = words[27]
cam[1][3][1] = float(words[28]) * interval_scale
cam[1][3][2] = 1100
cam[1][3][3] = cam[1][3][0] + cam[1][3][1] * cam[1][3][2]
elif len(words) == 30:
cam[1][3][0] = words[27]
cam[1][3][1] = float(words[28]) * interval_scale
cam[1][3][2] = words[29]
cam[1][3][3] = cam[1][3][0] + cam[1][3][1] * cam[1][3][2]
elif len(words) == 31:
cam[1][3][0] = words[27]
cam[1][3][1] = float(words[28]) * interval_scale
cam[1][3][2] = words[29]
cam[1][3][3] = words[30]
else:
cam[1][3][0] = 0
cam[1][3][1] = 0
cam[1][3][2] = 0
cam[1][3][3] = 0
return cam
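# Sketch of the cam.txt layout that load_cam above assumes (a standard MVSNet-style file):
# the 'extrinsic' keyword, a 4x4 matrix, the 'intrinsic' keyword, a 3x3 matrix, then
# depth_min and depth_interval (optionally followed by depth_num / depth_max). All numbers
# below are made up for illustration only.
def _demo_load_cam():
    from io import StringIO
    sample = (
        "extrinsic\n"
        "1 0 0 0\n0 1 0 0\n0 0 1 0\n0 0 0 1\n"
        "intrinsic\n"
        "1000 0 640\n0 1000 512\n0 0 1\n"
        "425.0 2.5\n"
    )
    cam = load_cam(StringIO(sample))
    return cam   # cam[0] is the extrinsic, cam[1][:3, :3] the intrinsic, cam[1][3] depth params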
def load_pfm(file):
color = None
width = None
height = None
scale = None
data_type = None
header = file.readline().decode('UTF-8').rstrip()
if header == 'PF':
color = True
elif header == 'Pf':
color = False
else:
raise Exception('Not a PFM file.')
dim_match = re.match(r'^(\d+)\s(\d+)\s$', file.readline().decode('UTF-8'))
if dim_match:
width, height = map(int, dim_match.groups())
else:
raise Exception('Malformed PFM header.')
# scale = float(file.readline().rstrip())
scale = float((file.readline()).decode('UTF-8').rstrip())
if scale < 0: # little-endian
data_type = '<f'
else:
data_type = '>f' # big-endian
data_string = file.read()
data = np.fromstring(data_string, data_type)
shape = (height, width, 3) if color else (height, width)
data = np.reshape(data, shape)
data = cv2.flip(data, 0)
return data
def fake_gipuma_normal(in_depth_path, out_normal_path):
depth_image = read_gipuma_dmb(in_depth_path)
image_shape = np.shape(depth_image)
normal_image = np.ones_like(depth_image)
normal_image = np.reshape(normal_image, (image_shape[0], image_shape[1], 1))
normal_image = np.tile(normal_image, [1, 1, 3])
normal_image = normal_image / 1.732050808
mask_image = np.squeeze(np.where(depth_image > 0, 1, 0))
mask_image = np.reshape(mask_image, (image_shape[0], image_shape[1], 1))
mask_image = np.tile(mask_image, [1, 1, 3])
mask_image = np.float32(mask_image)
normal_image = np.multiply(normal_image, mask_image)
normal_image = np.float32(normal_image)
write_gipuma_dmb(out_normal_path, normal_image)
return
def mvsnet_to_gipuma(scan_folder, scan, root_dir, gipuma_point_folder):
image_folder = os.path.join(root_dir, 'Rectified', scan)
cam_folder = os.path.join(root_dir, 'Cameras')
depth_folder = os.path.join(scan_folder, 'depth_est')
gipuma_cam_folder = os.path.join(gipuma_point_folder, 'cams')
gipuma_image_folder = os.path.join(gipuma_point_folder, 'images')
if not os.path.isdir(gipuma_point_folder):
os.mkdir(gipuma_point_folder)
if not os.path.isdir(gipuma_cam_folder):
os.mkdir(gipuma_cam_folder)
if not os.path.isdir(gipuma_image_folder):
os.mkdir(gipuma_image_folder)
# convert cameras
for view in range(0,49):
in_cam_file = os.path.join(cam_folder, "{:08d}_cam.txt".format(view))
out_cam_file = os.path.join(gipuma_cam_folder, "{:08d}.png.P".format(view))
mvsnet_to_gipuma_cam(in_cam_file, out_cam_file)
# copy images to gipuma image folder
for view in range(0,49):
in_image_file = os.path.join(image_folder, "rect_{:03d}_3_r5000.png".format(view+1))  # rectified image filenames start from 1
out_image_file = os.path.join(gipuma_image_folder, "{:08d}.png".format(view))
# shutil.copy(in_image_file, out_image_file)
in_image = cv2.imread(in_image_file)
out_image = cv2.resize(in_image, None, fx=1.0/args.downsample_factor, fy=1.0/args.downsample_factor, interpolation=cv2.INTER_LINEAR)
cv2.imwrite(out_image_file, out_image)
# convert depth maps and fake normal maps
gipuma_prefix = '2333__'
for view in range(0,49):
sub_depth_folder = os.path.join(gipuma_point_folder, gipuma_prefix+"{:08d}".format(view))
if not os.path.isdir(sub_depth_folder):
os.mkdir(sub_depth_folder)
in_depth_pfm = os.path.join(depth_folder, "{:08d}_prob_filtered.pfm".format(view))
out_depth_dmb = os.path.join(sub_depth_folder, 'disp.dmb')
fake_normal_dmb = os.path.join(sub_depth_folder, 'normals.dmb')
mvsnet_to_gipuma_dmb(in_depth_pfm, out_depth_dmb)
fake_gipuma_normal(out_depth_dmb, fake_normal_dmb)
def probability_filter(scan_folder, prob_threshold):
depth_folder = os.path.join(scan_folder, 'depth_est')
prob_folder = os.path.join(scan_folder, 'confidence')
# convert cameras
for view in range(0,49):
init_depth_map_path = os.path.join(depth_folder, "{:08d}.pfm".format(view)) # depth map outputs are indexed from 0
prob_map_path = os.path.join(prob_folder, "{:08d}.pfm".format(view)) # same indexing as the depth maps
out_depth_map_path = os.path.join(depth_folder, "{:08d}_prob_filtered.pfm".format(view)) # Gipuma inputs are indexed from 0
depth_map = load_pfm(open(init_depth_map_path))
prob_map = load_pfm(open(prob_map_path))
depth_map[prob_map < prob_threshold] = 0
write_pfm(out_depth_map_path, depth_map)
def depth_map_fusion(point_folder, fusibile_exe_path, disp_thresh, num_consistent):
cam_folder = os.path.join(point_folder, 'cams')
image_folder = os.path.join(point_folder, 'images')
depth_min = 0.001
depth_max = 100000
normal_thresh = 360
cmd = fusibile_exe_path
cmd = cmd + ' -input_folder ' + point_folder + '/'
cmd = cmd + ' -p_folder ' + cam_folder + '/'
cmd = cmd + ' -images_folder ' + image_folder + '/'
cmd = cmd + ' --depth_min=' + str(depth_min)
cmd = cmd + ' --depth_max=' + str(depth_max)
cmd = cmd + ' --normal_thresh=' + str(normal_thresh)
cmd = cmd + ' --disp_thresh=' + str(disp_thresh)
cmd = cmd + ' --num_consistent=' + str(num_consistent)
print (cmd)
os.system(cmd)
return
def collectPly(point_folder, scan_id):
model_name = 'final3d_model.ply'
model_dir = [item for item in os.listdir(point_folder) if item.startswith("consistencyCheck")][-1]
old = os.path.join(point_folder, model_dir, model_name)
fresh = os.path.join(args.plydir, "mvsnet") + scan_id.zfill(3) + ".ply"
shutil.move(old, fresh)
if __name__ == '__main__':
root_dir = args.root_dir
depth_folder = args.depth_folder
out_folder = args.out_folder
fusibile_exe_path = args.fusibile_exe_path
prob_threshold = args.prob_threshold
disp_threshold = args.disp_threshold
num_consistent = args.num_consistent
# Read test list
testlist = args.list_file
with open(testlist) as f:
scans = f.readlines()
scans = [line.rstrip() for line in scans]
print("Start Gipuma(GPU) fusion!")
if not os.path.isdir(args.plydir):
os.mkdir(args.plydir)
# Fusion
for i, scan in enumerate(scans):
print("{}/{} {}:".format(i, len(scans), scan), '------------------------')
scan_folder = os.path.join(depth_folder, scan)
fusibile_workspace = os.path.join(depth_folder, out_folder, scan)
if not os.path.isdir(os.path.join(depth_folder, out_folder)):
os.mkdir(os.path.join(depth_folder, out_folder))
if not os.path.isdir(fusibile_workspace):
os.mkdir(fusibile_workspace)
# probability filtering
print ('filter depth map with probability map')
probability_filter(scan_folder, prob_threshold)
# convert to gipuma format
print ('Convert mvsnet output to gipuma input')
mvsnet_to_gipuma(scan_folder, scan, root_dir, fusibile_workspace)
# depth map fusion with gipuma
print ('Run depth map fusion & filter')
depth_map_fusion(fusibile_workspace, fusibile_exe_path, disp_threshold, num_consistent)
# collect .ply results to summary folder
print('Collect {} ply'.format(scan))
collectPly(fusibile_workspace, scan[4:])
print("Gipuma(GPU) fusion done!")
shutil.rmtree(os.path.join(depth_folder, out_folder))
print("fusibile_fused remove done!")
================================================
FILE: fusions/dtu/pcd.py
================================================
# -*- coding: utf-8 -*-
# @Description: Point cloud fusion strategy for DTU dataset: Basic PCD.
# Refer to: https://github.com/xy-guo/MVSNet_pytorch/blob/master/eval.py
# @Author: Zhe Zhang (doublez@stu.pku.edu.cn)
# @Affiliation: Peking University (PKU)
# @LastEditDate: 2023-09-07
import argparse, os, sys, cv2, re, logging, time
import numpy as np
from plyfile import PlyData, PlyElement
from PIL import Image
from multiprocessing import Pool
from functools import partial
import signal
parser = argparse.ArgumentParser(description='filter, and fuse')
parser.add_argument('--testpath', default='[/path/to]/dtu-test-1200', help='testing data dir for some scenes')
parser.add_argument('--testlist', default="datasets/lists/dtu/test.txt", help='testing scene list')
parser.add_argument('--outdir', default='./outputs/[exp_name]', help='output dir')
parser.add_argument('--logdir', default='./checkpoints/debug', help='the directory to save checkpoints/logs')
parser.add_argument('--nolog', action='store_true', help='do not logging into .log file')
parser.add_argument('--plydir', default='./outputs/[exp_name]/pcd_fusion_plys/', help='output dir')
parser.add_argument('--num_worker', type=int, default=4, help='depth_filer worker')
parser.add_argument('--conf', type=float, default=0.9, help='prob confidence')
parser.add_argument('--thres_view', type=int, default=5, help='threshold of num view')
args = parser.parse_args()
def read_pfm(filename):
file = open(filename, 'rb')
color = None
width = None
height = None
scale = None
endian = None
header = file.readline().decode('utf-8').rstrip()
if header == 'PF':
color = True
elif header == 'Pf':
color = False
else:
raise Exception('Not a PFM file.')
dim_match = re.match(r'^(\d+)\s(\d+)\s$', file.readline().decode('utf-8'))
if dim_match:
width, height = map(int, dim_match.groups())
else:
raise Exception('Malformed PFM header.')
scale = float(file.readline().rstrip())
if scale < 0: # little-endian
endian = '<'
scale = -scale
else:
endian = '>' # big-endian
data = np.fromfile(file, endian + 'f')
shape = (height, width, 3) if color else (height, width)
data = np.reshape(data, shape)
data = np.flipud(data)
file.close()
return data, scale
def read_camera_parameters(filename):
with open(filename) as f:
lines = f.readlines()
lines = [line.rstrip() for line in lines]
# extrinsics: line [1,5), 4x4 matrix
extrinsics = np.fromstring(' '.join(lines[1:5]), dtype=np.float32, sep=' ').reshape((4, 4))
# intrinsics: line [7,10), 3x3 matrix
intrinsics = np.fromstring(' '.join(lines[7:10]), dtype=np.float32, sep=' ').reshape((3, 3))
return intrinsics, extrinsics
def read_img(filename):
img = Image.open(filename)
# scale 0~255 to 0~1
np_img = np.array(img, dtype=np.float32) / 255.
return np_img
def read_mask(filename):
return read_img(filename) > 0.5
def save_mask(filename, mask):
assert mask.dtype == np.bool_  # np.bool_ is the portable spelling; np.bool was removed in newer numpy
mask = mask.astype(np.uint8) * 255
Image.fromarray(mask).save(filename)
def read_pair_file(filename):
data = []
with open(filename) as f:
num_viewpoint = int(f.readline())
# 49 viewpoints
for view_idx in range(num_viewpoint):
ref_view = int(f.readline().rstrip())
src_views = [int(x) for x in f.readline().rstrip().split()[1::2]]
if len(src_views) > 0:
data.append((ref_view, src_views))
return data
# project the reference point cloud into the source view, then project back
def reproject_with_depth(depth_ref, intrinsics_ref, extrinsics_ref, depth_src, intrinsics_src, extrinsics_src):
width, height = depth_ref.shape[1], depth_ref.shape[0]
## step1. project reference pixels to the source view
# reference view x, y
x_ref, y_ref = np.meshgrid(np.arange(0, width), np.arange(0, height))
x_ref, y_ref = x_ref.reshape([-1]), y_ref.reshape([-1])
# reference 3D space
xyz_ref = np.matmul(np.linalg.inv(intrinsics_ref),
np.vstack((x_ref, y_ref, np.ones_like(x_ref))) * depth_ref.reshape([-1]))
# source 3D space
xyz_src = np.matmul(np.matmul(extrinsics_src, np.linalg.inv(extrinsics_ref)),
np.vstack((xyz_ref, np.ones_like(x_ref))))[:3]
# source view x, y
K_xyz_src = np.matmul(intrinsics_src, xyz_src)
xy_src = K_xyz_src[:2] / K_xyz_src[2:3]
## step2. reproject the source view points with source view depth estimation
# find the depth estimation of the source view
x_src = xy_src[0].reshape([height, width]).astype(np.float32)
y_src = xy_src[1].reshape([height, width]).astype(np.float32)
sampled_depth_src = cv2.remap(depth_src, x_src, y_src, interpolation=cv2.INTER_LINEAR)
# mask = sampled_depth_src > 0
# source 3D space
# NOTE that we should use sampled source-view depth_here to project back
xyz_src = np.matmul(np.linalg.inv(intrinsics_src),
np.vstack((xy_src, np.ones_like(x_ref))) * sampled_depth_src.reshape([-1]))
# reference 3D space
xyz_reprojected = np.matmul(np.matmul(extrinsics_ref, np.linalg.inv(extrinsics_src)),
np.vstack((xyz_src, np.ones_like(x_ref))))[:3]
# source view x, y, depth
depth_reprojected = xyz_reprojected[2].reshape([height, width]).astype(np.float32)
K_xyz_reprojected = np.matmul(intrinsics_ref, xyz_reprojected)
xy_reprojected = K_xyz_reprojected[:2] / K_xyz_reprojected[2:3]
x_reprojected = xy_reprojected[0].reshape([height, width]).astype(np.float32)
y_reprojected = xy_reprojected[1].reshape([height, width]).astype(np.float32)
return depth_reprojected, x_reprojected, y_reprojected, x_src, y_src
def check_geometric_consistency(depth_ref, intrinsics_ref, extrinsics_ref, depth_src, intrinsics_src, extrinsics_src):
width, height = depth_ref.shape[1], depth_ref.shape[0]
x_ref, y_ref = np.meshgrid(np.arange(0, width), np.arange(0, height))
depth_reprojected, x2d_reprojected, y2d_reprojected, x2d_src, y2d_src = reproject_with_depth(depth_ref, intrinsics_ref, extrinsics_ref,
depth_src, intrinsics_src, extrinsics_src)
# check |p_reproj-p_1| < 1
dist = np.sqrt((x2d_reprojected - x_ref) ** 2 + (y2d_reprojected - y_ref) ** 2)
# check |d_reproj-d_1| / d_1 < 0.01
depth_diff = np.abs(depth_reprojected - depth_ref)
relative_depth_diff = depth_diff / depth_ref
mask = np.logical_and(dist < 1, relative_depth_diff < 0.01)
depth_reprojected[~mask] = 0
return mask, depth_reprojected, x2d_src, y2d_src
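# Sanity-check sketch for check_geometric_consistency: with identical (synthetic) cameras
# and identical constant depth maps every pixel reprojects onto itself, so the reprojection
# error and the relative depth difference are ~0 and the mask should be all True. The
# intrinsics and the depth value below are illustrative assumptions.
def _demo_check_geometric_consistency():
    h, w = 32, 48
    depth = np.full((h, w), 10.0, dtype=np.float32)
    intrinsics = np.array([[100., 0., w / 2.], [0., 100., h / 2.], [0., 0., 1.]], dtype=np.float32)
    extrinsics = np.eye(4, dtype=np.float32)
    mask, depth_reproj, _, _ = check_geometric_consistency(depth, intrinsics, extrinsics,
                                                           depth, intrinsics, extrinsics)
    assert mask.all()
    return depth_reproj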
def filter_depth(pair_folder, scan_folder, out_folder, plyfilename):
# the pair file
pair_file = os.path.join(pair_folder, "pair.txt")
# for the final point cloud
vertexs = []
vertex_colors = []
pair_data = read_pair_file(pair_file)
# for each reference view and the corresponding source views
for ref_view, src_views in pair_data:
# src_views = src_views[:args.num_view]
# load the camera parameters
ref_intrinsics, ref_extrinsics = read_camera_parameters(
os.path.join(scan_folder, 'cams/{:0>8}_cam.txt'.format(ref_view)))
# load the reference image
ref_img = read_img(os.path.join(scan_folder, 'images/{:0>8}.jpg'.format(ref_view)))
# load the estimated depth of the reference view
ref_depth_est = read_pfm(os.path.join(out_folder, 'depth_est/{:0>8}.pfm'.format(ref_view)))[0]
# load the photometric mask of the reference view
confidence = read_pfm(os.path.join(out_folder, 'confidence/{:0>8}.pfm'.format(ref_view)))[0]
photo_mask = confidence > args.conf
all_srcview_depth_ests = []
all_srcview_x = []
all_srcview_y = []
all_srcview_geomask = []
# compute the geometric mask
geo_mask_sum = 0
for src_view in src_views:
# camera parameters of the source view
src_intrinsics, src_extrinsics = read_camera_parameters(
os.path.join(scan_folder, 'cams/{:0>8}_cam.txt'.format(src_view)))
# the estimated depth of the source view
src_depth_est = read_pfm(os.path.join(out_folder, 'depth_est/{:0>8}.pfm'.format(src_view)))[0]
geo_mask, depth_reprojected, x2d_src, y2d_src = check_geometric_consistency(ref_depth_est, ref_intrinsics, ref_extrinsics,
src_depth_est,
src_intrinsics, src_extrinsics)
geo_mask_sum += geo_mask.astype(np.int32)
all_srcview_depth_ests.append(depth_reprojected)
all_srcview_x.append(x2d_src)
all_srcview_y.append(y2d_src)
all_srcview_geomask.append(geo_mask)
depth_est_averaged = (sum(all_srcview_depth_ests) + ref_depth_est) / (geo_mask_sum + 1)
# at least args.thres_view source views matched
geo_mask = geo_mask_sum >= args.thres_view
final_mask = np.logical_and(photo_mask, geo_mask)
os.makedirs(os.path.join(out_folder, "mask"), exist_ok=True)
save_mask(os.path.join(out_folder, "mask/{:0>8}_photo.png".format(ref_view)), photo_mask)
save_mask(os.path.join(out_folder, "mask/{:0>8}_geo.png".format(ref_view)), geo_mask)
save_mask(os.path.join(out_folder, "mask/{:0>8}_final.png".format(ref_view)), final_mask)
logger.info("processing {}, ref-view{:0>2}, photo/geo/final-mask:{:.3f}/{:.3f}/{:.3f}".format(scan_folder, ref_view,
photo_mask.mean(),
geo_mask.mean(), final_mask.mean()))
height, width = depth_est_averaged.shape[:2]
x, y = np.meshgrid(np.arange(0, width), np.arange(0, height))
# valid_points = np.logical_and(final_mask, ~used_mask[ref_view])
valid_points = final_mask
logger.info("valid_points: {}".format(valid_points.mean()))
x, y, depth = x[valid_points], y[valid_points], depth_est_averaged[valid_points]
#color = ref_img[1:-16:4, 1::4, :][valid_points] # hardcoded for DTU dataset
color = ref_img[valid_points]
xyz_ref = np.matmul(np.linalg.inv(ref_intrinsics),
np.vstack((x, y, np.ones_like(x))) * depth)
xyz_world = np.matmul(np.linalg.inv(ref_extrinsics),
np.vstack((xyz_ref, np.ones_like(x))))[:3]
vertexs.append(xyz_world.transpose((1, 0)))
vertex_colors.append((color * 255).astype(np.uint8))
vertexs = np.concatenate(vertexs, axis=0)
vertex_colors = np.concatenate(vertex_colors, axis=0)
vertexs = np.array([tuple(v) for v in vertexs], dtype=[('x', 'f4'), ('y', 'f4'), ('z', 'f4')])
vertex_colors = np.array([tuple(v) for v in vertex_colors], dtype=[('red', 'u1'), ('green', 'u1'), ('blue', 'u1')])
vertex_all = np.empty(len(vertexs), vertexs.dtype.descr + vertex_colors.dtype.descr)
for prop in vertexs.dtype.names:
vertex_all[prop] = vertexs[prop]
for prop in vertex_colors.dtype.names:
vertex_all[prop] = vertex_colors[prop]
el = PlyElement.describe(vertex_all, 'vertex')
PlyData([el]).write(plyfilename)
logger.info("saving the final model to " + plyfilename)
def init_worker():
'''
Catch Ctrl+C signal to terminate workers
'''
signal.signal(signal.SIGINT, signal.SIG_IGN)
def pcd_filter_worker(scan):
scan_id = int(scan[4:])
save_name = 'mvsnet{:0>3}.ply'.format(scan_id)
pair_folder = os.path.join(args.testpath, "Cameras")
scan_folder = os.path.join(args.outdir, scan)
out_folder = os.path.join(args.outdir, scan)
filter_depth(pair_folder, scan_folder, out_folder, os.path.join(args.plydir, save_name))
def pcd_filter(testlist, number_worker):
partial_func = partial(pcd_filter_worker)
p = Pool(number_worker, init_worker)
try:
p.map(partial_func, testlist)
except KeyboardInterrupt:
logger.info("....\nCaught KeyboardInterrupt, terminating workers")
p.terminate()
else:
p.close()
p.join()
def initLogger():
logger = logging.getLogger()
logger.setLevel(logging.INFO)
curTime = time.strftime('%Y%m%d-%H%M', time.localtime(time.time()))
if not os.path.isdir(args.logdir):
os.mkdir(args.logdir)
logfile = os.path.join(args.logdir, 'fusion-' + curTime + '.log')
formatter = logging.Formatter("%(asctime)s - %(filename)s[line:%(lineno)d] - %(levelname)s: %(message)s")
if not args.nolog:
fileHandler = logging.FileHandler(logfile, mode='a')
fileHandler.setFormatter(formatter)
logger.addHandler(fileHandler)
consoleHandler = logging.StreamHandler(sys.stdout)
consoleHandler.setFormatter(formatter)
logger.addHandler(consoleHandler)
logger.info("Logger initialized.")
logger.info("Writing logs to file: {}".format(logfile))
logger.info("Current time: {}".format(curTime))
return logger
if __name__ == '__main__':
logger = initLogger()
if not os.path.isdir(args.plydir):
os.mkdir(args.plydir)
with open(args.testlist) as f:
content = f.readlines()
testlist = [line.rstrip() for line in content]
pcd_filter(testlist, args.num_worker)
================================================
FILE: fusions/tnt/dypcd.py
================================================
# -*- coding: utf-8 -*-
# @Description: Point cloud fusion strategy for Tanks and Temples dataset: DYnamic PCD.
# Refer to: https://github.com/yhw-yhw/D2HC-RMVSNet/blob/master/fusion.py
# @Author: Zhe Zhang (doublez@stu.pku.edu.cn)
# @Affiliation: Peking University (PKU)
# @LastEditDate: 2023-09-07
import os
import cv2
import signal
import numpy as np
from PIL import Image
from functools import partial
from multiprocessing import Pool
from plyfile import PlyData, PlyElement
import argparse
import re, json
from sklearn.preprocessing import scale
parser = argparse.ArgumentParser()
parser.add_argument("--root_dir", type=str, default="[/path/to/]tankandtemples/")
parser.add_argument('--out_dir', type=str, default='outputs/[exp_name]')
parser.add_argument('--ply_path', type=str, default='outputs/[exp_name]/dypcd_fusion_plys')
parser.add_argument('--split', type=str, default='intermediate', choices=['intermediate', 'advanced'])
parser.add_argument('--list_file', type=str, default='datasets/lists/tnt/intermediate.txt')
parser.add_argument('--num_workers', type=int, default=1)
parser.add_argument('--single_processor', action='store_true')
parser.add_argument('--rescale', action='store_true')
parser.add_argument('--max_w', type=int)
parser.add_argument('--max_h', type=int)
parser.add_argument('--cam_mode', type=str, default='origin', choices=['origin', 'short_range'])
parser.add_argument('--img_mode', type=str, default='resize', choices=['resize', 'crop'])
parser.add_argument('--dist_base', type=float, default=1 / 4)
parser.add_argument('--rel_diff_base', type=float, default=1 / 1300)
args = parser.parse_args()
tnt_fusion_exps = [
{
"ply_path": "dypcd_fusion_plys_mean",
"param_strategy": "mean",
},
{
"ply_path": "dypcd_fusion_plys",
"param_strategy": "hyper_param",
"hyper_param_table": { # -1 -> mean()
'Family': 0.6,
'Francis': 0.6,
'Horse': 0.2,
'Lighthouse': 0.7,
'M60': 0.6,
'Panther': 0.6,
'Playground': 0.7,
'Train': 0.6,
'Auditorium': 0.1,
'Ballroom': 0.4,
'Courtroom': 0.4,
'Museum': 0.5,
'Palace': 0.5,
'Temple': 0.4
}
},
]
def read_pfm(filename):
file = open(filename, 'rb')
color = None
width = None
height = None
scale = None
endian = None
header = file.readline().decode('utf-8').rstrip()
if header == 'PF':
color = True
elif header == 'Pf':
color = False
else:
raise Exception('Not a PFM file.')
dim_match = re.match(r'^(\d+)\s(\d+)\s$', file.readline().decode('utf-8'))
if dim_match:
width, height = map(int, dim_match.groups())
else:
raise Exception('Malformed PFM header.')
scale = float(file.readline().rstrip())
if scale < 0: # little-endian
endian = '<'
scale = -scale
else:
endian = '>' # big-endian
data = np.fromfile(file, endian + 'f')
shape = (height, width, 3) if color else (height, width)
data = np.reshape(data, shape)
data = np.flipud(data)
file.close()
return data, scale
# save a binary mask
def save_mask(filename, mask):
assert mask.dtype == np.bool_  # np.bool_ is the portable spelling; np.bool was removed in newer numpy
mask = mask.astype(np.uint8) * 255
Image.fromarray(mask).save(filename)
# read an image
def read_img(filename):
img = Image.open(filename)
# scale 0~255 to 0~1
np_img = np.array(img, dtype=np.float32) / 255.
return np_img
# read intrinsics and extrinsics
def read_camera_parameters(filename):
with open(filename) as f:
lines = f.readlines()
lines = [line.rstrip() for line in lines]
# extrinsics: line [1,5), 4x4 matrix
extrinsics = np.fromstring(' '.join(lines[1:5]), dtype=np.float32, sep=' ').reshape((4, 4))
# intrinsics: line [7,10), 3x3 matrix
intrinsics = np.fromstring(' '.join(lines[7:10]), dtype=np.float32, sep=' ').reshape((3, 3))
# TODO: assume the feature is 1/4 of the original image size
# intrinsics[:2, :] /= 4
return intrinsics, extrinsics
# read a pair file, [(ref_view1, [src_view1-1, ...]), (ref_view2, [src_view2-1, ...]), ...]
def read_pair_file(filename):
data = []
with open(filename) as f:
num_viewpoint = int(f.readline())
# one entry per viewpoint listed in pair.txt
for view_idx in range(num_viewpoint):
ref_view = int(f.readline().rstrip())
src_views = [int(x) for x in f.readline().rstrip().split()[1::2]]
if len(src_views) > 0:
data.append((ref_view, src_views))
return data
# project the reference point cloud into the source view, then project back
def reproject_with_depth(depth_ref, intrinsics_ref, extrinsics_ref, depth_src, intrinsics_src, extrinsics_src):
width, height = depth_ref.shape[1], depth_ref.shape[0]
## step1. project reference pixels to the source view
# reference view x, y
x_ref, y_ref = np.meshgrid(np.arange(0, width), np.arange(0, height))
x_ref, y_ref = x_ref.reshape([-1]), y_ref.reshape([-1])
# reference 3D space
xyz_ref = np.matmul(np.linalg.inv(intrinsics_ref),
np.vstack((x_ref, y_ref, np.ones_like(x_ref))) * depth_ref.reshape([-1]))
# source 3D space
xyz_src = np.matmul(np.matmul(extrinsics_src, np.linalg.inv(extrinsics_ref)),
np.vstack((xyz_ref, np.ones_like(x_ref))))[:3]
# source view x, y
K_xyz_src = np.matmul(intrinsics_src, xyz_src)
xy_src = K_xyz_src[:2] / K_xyz_src[2:3]
## step2. reproject the source view points with source view depth estimation
# find the depth estimation of the source view
x_src = xy_src[0].reshape([height, width]).astype(np.float32)
y_src = xy_src[1].reshape([height, width]).astype(np.float32)
sampled_depth_src = cv2.remap(depth_src, x_src, y_src, interpolation=cv2.INTER_LINEAR)
# mask = sampled_depth_src > 0
# source 3D space
# NOTE that we should use sampled source-view depth_here to project back
xyz_src = np.matmul(np.linalg.inv(intrinsics_src),
np.vstack((xy_src, np.ones_like(x_ref))) * sampled_depth_src.reshape([-1]))
# reference 3D space
xyz_reprojected = np.matmul(np.matmul(extrinsics_ref, np.linalg.inv(extrinsics_src)),
np.vstack((xyz_src, np.ones_like(x_ref))))[:3]
# source view x, y, depth
depth_reprojected = xyz_reprojected[2].reshape([height, width]).astype(np.float32)
K_xyz_reprojected = np.matmul(intrinsics_ref, xyz_reprojected)
K_xyz_reprojected[2:3][K_xyz_reprojected[2:3]==0] += 0.00001
xy_reprojected = K_xyz_reprojected[:2] / K_xyz_reprojected[2:3]
x_reprojected = xy_reprojected[0].reshape([height, width]).astype(np.float32)
y_reprojected = xy_reprojected[1].reshape([height, width]).astype(np.float32)
return depth_reprojected, x_reprojected, y_reprojected, x_src, y_src
def check_geometric_consistency(depth_ref, intrinsics_ref, extrinsics_ref, depth_src, intrinsics_src, extrinsics_src):
width, height = depth_ref.shape[1], depth_ref.shape[0]
x_ref, y_ref = np.meshgrid(np.arange(0, width), np.arange(0, height))
depth_reprojected, x2d_reprojected, y2d_reprojected, x2d_src, y2d_src = reproject_with_depth(depth_ref, intrinsics_ref, extrinsics_ref,
depth_src, intrinsics_src, extrinsics_src)
# check |p_reproj-p_1| < 1
dist = np.sqrt((x2d_reprojected - x_ref) ** 2 + (y2d_reprojected - y_ref) ** 2)
# check |d_reproj-d_1| / d_1 < 0.01
depth_diff = np.abs(depth_reprojected - depth_ref)
relative_depth_diff = depth_diff / depth_ref
mask = None
masks = []
for i in range(2, 11):
# mask = np.logical_and(dist < i / 4, relative_depth_diff < i / 1300)
mask = np.logical_and(dist < i * args.dist_base, relative_depth_diff < i * args.rel_diff_base)
masks.append(mask)
depth_reprojected[~mask] = 0
return masks, mask, depth_reprojected, x2d_src, y2d_src
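# Sketch of the dynamic threshold schedule used above: each candidate number of consistent
# views i is paired with a looser pixel / relative-depth threshold (i * dist_base,
# i * rel_diff_base). With the defaults dist_base=1/4 and rel_diff_base=1/1300 this is the
# (i/4 px, i/1300) schedule from D2HC-RMVSNet; the helper below is illustrative only.
def _dynamic_thresholds(dist_base=1 / 4, rel_diff_base=1 / 1300, dy_range=11):
    return [(i, i * dist_base, i * rel_diff_base) for i in range(2, dy_range)]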
def scale_input(intrinsics, img):
if args.img_mode == "crop":
intrinsics[1,2] = intrinsics[1,2] - 28 # 1080 -> 1024
img = img[28:1080-28, :, :]
elif args.img_mode == "resize":
height, width = img.shape[:2]
img = cv2.resize(img, (width, 1024))
scale_h = 1.0 * 1024 / height
intrinsics[1, :] *= scale_h
return intrinsics, img
def filter_depth(scene, root_dir, split, out_dir, plyfilename, fusion_exp):
# num_stage = len(args.ndepths)
# the pair file
pair_file = os.path.join(root_dir, split, scene, "pair.txt")
# for the final point cloud
vertexs = []
vertex_colors = []
pair_data = read_pair_file(pair_file)
nviews = len(pair_data)
# for each reference view and the corresponding source views
for ref_view, src_views in pair_data:
# src_views = src_views[:args.num_view]
# load the camera parameters
if args.cam_mode == 'short_range':
ref_intrinsics, ref_extrinsics = read_camera_parameters(
os.path.join(root_dir, split, scene, 'cams_{}/{:0>8}_cam.txt'.format(scene.lower(), ref_view)))
elif args.cam_mode == 'origin':
ref_intrinsics, ref_extrinsics = read_camera_parameters(
os.path.join(root_dir, split, scene, 'cams/{:0>8}_cam.txt'.format(ref_view)))
ref_img = read_img(os.path.join(root_dir, split, scene, 'images/{:0>8}.jpg'.format(ref_view)))
ref_depth_est = read_pfm(os.path.join(out_dir, scene, 'depth_est/{:0>8}.pfm'.format(ref_view)))[0]
confidence = read_pfm(os.path.join(out_dir, scene, 'confidence/{:0>8}.pfm'.format(ref_view)))[0]
if fusion_exp['param_strategy'] == 'mean':
if ref_view % 50 == 0: print("-- thresh: {}".format(confidence.mean()))
photo_mask = confidence > confidence.mean()
elif fusion_exp['param_strategy'] == 'hyper_param':
conf_thresh = fusion_exp['hyper_param_table'][scene]
if conf_thresh == -1:
photo_mask = confidence > confidence.mean()
if ref_view % 50 == 0: print("-- thresh: mean() {}".format(confidence.mean()))
else:
photo_mask = confidence > conf_thresh
if ref_view % 50 == 0: print("-- thresh: {}".format(conf_thresh))
flag_img = ref_img
ref_intrinsics, _ = scale_input(ref_intrinsics, flag_img)
all_srcview_depth_ests = []
all_srcview_x = []
all_srcview_y = []
all_srcview_geomask = []
# compute the geometric mask
geo_mask_sum = 0
dy_range = len(src_views) + 1
geo_mask_sums = [0] * (dy_range - 2)
for src_view in src_views:
# camera parameters of the source view
if args.cam_mode == 'short_range':
src_intrinsics, src_extrinsics = read_camera_parameters(
os.path.join(root_dir, split, scene, 'cams_{}/{:0>8}_cam.txt'.format(scene.lower(), src_view)))
elif args.cam_mode == 'origin':
src_intrinsics, src_extrinsics = read_camera_parameters(
os.path.join(root_dir, split, scene, 'cams/{:0>8}_cam.txt'.format(src_view)))
# the estimated depth of the source view
src_depth_est = read_pfm(os.path.join(out_dir, scene, 'depth_est/{:0>8}.pfm'.format(src_view)))[0]
src_intrinsics, _ = scale_input(src_intrinsics, flag_img)
masks, geo_mask, depth_reprojected, x2d_src, y2d_src = check_geometric_consistency(ref_depth_est, ref_intrinsics,
ref_extrinsics, src_depth_est,
src_intrinsics, src_extrinsics)
geo_mask_sum += geo_mask.astype(np.int32)
for i in range(2, dy_range):
geo_mask_sums[i - 2] += masks[i - 2].astype(np.int32)
all_srcview_depth_ests.append(depth_reprojected)
all_srcview_x.append(x2d_src)
all_srcview_y.append(y2d_src)
all_srcview_geomask.append(geo_mask)
depth_est_averaged = (sum(all_srcview_depth_ests) + ref_depth_est) / (geo_mask_sum + 1)
# dynamic consistency: accept a pixel if at least i source views agree under the i-th (looser) threshold pair
geo_mask = geo_mask_sum >= dy_range
for i in range(2, dy_range):
geo_mask = np.logical_or(geo_mask, geo_mask_sums[i - 2] >= i)
final_mask = np.logical_and(photo_mask, geo_mask)
if ref_view < 3:
os.makedirs(os.path.join(out_dir, scene, "mask"), exist_ok=True)
save_mask(os.path.join(out_dir, scene, "mask/{:0>8}_photo.png".format(ref_view)), photo_mask)
save_mask(os.path.join(out_dir, scene, "mask/{:0>8}_geo.png".format(ref_view)), geo_mask)
save_mask(os.path.join(out_dir, scene, "mask/{:0>8}_final.png".format(ref_view)), final_mask)
print("processing {}, ref-view{:0>2}, photo/geo/final-mask:{:.3f}/{:.3f}/{:.3f}".format(os.path.join(out_dir, scene), ref_view,
photo_mask.mean(),
geo_mask.mean(), final_mask.mean()))
height, width = depth_est_averaged.shape[:2]
x, y = np.meshgrid(np.arange(0, width), np.arange(0, height))
# valid_points = np.logical_and(final_mask, ~used_mask[ref_view])
valid_points = final_mask
print("valid_points {:.3f}".format(valid_points.mean()))
x, y, depth = x[valid_points], y[valid_points], depth_est_averaged[valid_points]
# color = ref_img[:-24, :, :][valid_points]
color = ref_img[28:1080-28, :, :][valid_points]
xyz_ref = np.matmul(np.linalg.inv(ref_intrinsics),
np.vstack((x, y, np.ones_like(x))) * depth)
xyz_world = np.matmul(np.linalg.inv(ref_extrinsics),
np.vstack((xyz_ref, np.ones_like(x))))[:3]
vertexs.append(xyz_world.transpose((1, 0)))
vertex_colors.append((color * 255).astype(np.uint8))
vertexs = np.concatenate(vertexs, axis=0)
vertex_colors = np.concatenate(vertex_colors, axis=0)
vertexs = np.array([tuple(v) for v in vertexs], dtype=[('x', 'f4'), ('y', 'f4'), ('z', 'f4')])
vertex_colors = np.array([tuple(v) for v in vertex_colors], dtype=[('red', 'u1'), ('green', 'u1'), ('blue', 'u1')])
vertex_all = np.empty(len(vertexs), vertexs.dtype.descr + vertex_colors.dtype.descr)
for prop in vertexs.dtype.names:
vertex_all[prop] = vertexs[prop]
for prop in vertex_colors.dtype.names:
vertex_all[prop] = vertex_colors[prop]
el = PlyElement.describe(vertex_all, 'vertex')
PlyData([el]).write(plyfilename)
print("saving the final model to", plyfilename)
def dypcd_filter_worker(scene):
save_name = '{}.ply'.format(scene)
filter_depth(scene, args.root_dir, args.split, args.out_dir, os.path.join(args.out_dir, fusion_exp['ply_path'], save_name), fusion_exp)
def init_worker():
signal.signal(signal.SIGINT, signal.SIG_IGN)
if __name__ == '__main__':
with open(os.path.join(args.list_file)) as f:
testlist = [line.rstrip() for line in f.readlines()]
for fusion_exp in tnt_fusion_exps:
if not os.path.isdir(os.path.join(args.out_dir, fusion_exp['ply_path'])):
os.mkdir(os.path.join(args.out_dir, fusion_exp['ply_path']))
if args.single_processor:
for scene in testlist:
save_name = '{}.ply'.format(scene)
filter_depth(scene, args.root_dir, args.split, args.out_dir, os.path.join(args.out_dir, fusion_exp['ply_path'], save_name), fusion_exp)
else:
partial_func = partial(dypcd_filter_worker)
p = Pool(args.num_workers, init_worker)
try:
p.map(partial_func, testlist)
except KeyboardInterrupt:
print("....\nCaught KeyboardInterrupt, terminating workers")
p.terminate()
else:
p.close()
p.join()
================================================
FILE: models/__init__.py
================================================
from models.geomvsnet import GeoMVSNet
from models.loss import geomvsnet_loss
================================================
FILE: models/filter.py
================================================
# -*- coding: utf-8 -*-
# @Description: Basic implementation of Frequency Domain Filtering strategy (Sec 3.2 in the paper).
# @Author: Zhe Zhang (doublez@stu.pku.edu.cn)
# @Affiliation: Peking University (PKU)
# @LastEditDate: 2023-09-07
import torch
import numpy as np
import matplotlib.pyplot as plt
def frequency_domain_filter(depth, rho_ratio):
"""
large rho_ratio -> more information filtered
"""
f = torch.fft.fft2(depth)
fshift = torch.fft.fftshift(f)
b, h, w = depth.shape
k_h, k_w = h/rho_ratio, w/rho_ratio
fshift[:,:int(h/2-k_h/2),:] = 0
fshift[:,int(h/2+k_h/2):,:] = 0
fshift[:,:,:int(w/2-k_w/2)] = 0
fshift[:,:,int(w/2+k_w/2):] = 0
ishift = torch.fft.ifftshift(fshift)
idepth = torch.fft.ifft2(ishift)
depth_filtered = torch.abs(idepth)
return depth_filtered
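# Editor's note: a minimal usage sketch (not part of the original pipeline).
# The batch size, resolution and rho_ratio below are arbitrary assumptions;
# GeoMVSNet applies frequency_domain_filter per stage with its curriculum ratios.
def _demo_frequency_domain_filter():
    depth = torch.rand(2, 128, 160)                        # B x H x W depth maps
    smoothed = frequency_domain_filter(depth, rho_ratio=4)
    return depth.shape, smoothed.shape                     # shapes are unchanged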
def visual_fft_fig(fshift):
fft_fig = 20 * torch.log(torch.abs(fshift))  # log-magnitude spectrum
plt.figure(figsize=(10, 10))
plt.subplot(121)
plt.imshow(fft_fig[0,:,:], cmap = 'gray')
================================================
FILE: models/geometry.py
================================================
# -*- coding: utf-8 -*-
# @Description: Geometric Prior Guided Feature Fusion & Probability Volume Geometry Embedding (Sec 3.1 in the paper).
# @Author: Zhe Zhang (doublez@stu.pku.edu.cn)
# @Affiliation: Peking University (PKU)
# @LastEditDate: 2023-09-07
import math
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
from models.submodules import ConvBnReLU3D
class GeoFeatureFusion(nn.Module):
def __init__(self, convolutional_layer_encoding="z", mask_type="basic", add_origin_feat_flag=True):
super(GeoFeatureFusion, self).__init__()
self.convolutional_layer_encoding = convolutional_layer_encoding # std / uv / z / xyz
self.mask_type = mask_type # basic / mean
self.add_origin_feat_flag = add_origin_feat_flag # True / False
if self.convolutional_layer_encoding == "std":
self.geoplanes = 0
elif self.convolutional_layer_encoding == "uv":
self.geoplanes = 2
elif self.convolutional_layer_encoding == "z":
self.geoplanes = 1
elif self.convolutional_layer_encoding == "xyz":
self.geoplanes = 3
self.geofeature = GeometryFeature()
# rgb encoder
self.rgb_conv_init = convbnrelu(in_channels=4, out_channels=8, kernel_size=5, stride=1, padding=2)
self.rgb_encoder_layer1 = BasicBlockGeo(inplanes=8, planes=16, stride=2, geoplanes=self.geoplanes)
self.rgb_encoder_layer2 = BasicBlockGeo(inplanes=16, planes=32, stride=1, geoplanes=self.geoplanes)
self.rgb_encoder_layer3 = BasicBlockGeo(inplanes=32, planes=64, stride=2, geoplanes=self.geoplanes)
self.rgb_encoder_layer4 = BasicBlockGeo(inplanes=64, planes=128, stride=1, geoplanes=self.geoplanes)
self.rgb_encoder_layer5 = BasicBlockGeo(inplanes=128, planes=256, stride=2, geoplanes=self.geoplanes)
self.rgb_decoder_layer4 = deconvbnrelu(in_channels=256, out_channels=128, kernel_size=5, stride=2, padding=2, output_padding=1)
self.rgb_decoder_layer2 = deconvbnrelu(in_channels=128, out_channels=32, kernel_size=5, stride=2, padding=2, output_padding=1)
self.rgb_decoder_layer0 = deconvbnrelu(in_channels=32, out_channels=16, kernel_size=3, stride=1, padding=1, output_padding=0)
self.rgb_decoder_layer = deconvbnrelu(in_channels=16, out_channels=8, kernel_size=5, stride=2, padding=2, output_padding=1)
self.rgb_decoder_output = deconvbnrelu(in_channels=8, out_channels=2, kernel_size=3, stride=1, padding=1, output_padding=0)
# depth encoder
self.depth_conv_init = convbnrelu(in_channels=2, out_channels=8, kernel_size=5, stride=1, padding=2)
self.depth_layer1 = BasicBlockGeo(inplanes=8, planes=16, stride=2, geoplanes=self.geoplanes)
self.depth_layer2 = BasicBlockGeo(inplanes=16, planes=32, stride=1, geoplanes=self.geoplanes)
self.depth_layer3 = BasicBlockGeo(inplanes=64, planes=64, stride=2, geoplanes=self.geoplanes)
self.depth_layer4 = BasicBlockGeo(inplanes=64, planes=128, stride=1, geoplanes=self.geoplanes)
self.depth_layer5 = BasicBlockGeo(inplanes=256, planes=256, stride=2, geoplanes=self.geoplanes)
self.decoder_layer3 = deconvbnrelu(in_channels=256, out_channels=128, kernel_size=5, stride=2, padding=2, output_padding=1)
self.decoder_layer4 = deconvbnrelu(in_channels=128, out_channels=64, kernel_size=3, stride=1, padding=1, output_padding=0)
self.decoder_layer5 = deconvbnrelu(in_channels=64, out_channels=32, kernel_size=5, stride=2, padding=2, output_padding=1)
self.decoder_layer6 = deconvbnrelu(in_channels=32, out_channels=16, kernel_size=3, stride=1, padding=1, output_padding=0)
self.decoder_layer7 = deconvbnrelu(in_channels=16, out_channels=8, kernel_size=5, stride=2, padding=2, output_padding=1)
# output
self.rgbdepth_decoder_stage1 = deconvbnrelu(in_channels=32, out_channels=32, kernel_size=5, stride=2, padding=2, output_padding=1)
self.rgbdepth_decoder_stage2 = deconvbnrelu(in_channels=16, out_channels=16, kernel_size=5, stride=2, padding=2, output_padding=1)
self.rgbdepth_decoder_stage3 = deconvbnrelu(in_channels=8, out_channels=8, kernel_size=3, stride=1, padding=1, output_padding=0)
self.final_decoder_stage1 = deconvbnrelu(in_channels=32, out_channels=32, kernel_size=3, stride=1, padding=1, output_padding=0)
self.final_decoder_stage2 = deconvbnrelu(in_channels=16, out_channels=16, kernel_size=3, stride=1, padding=1, output_padding=0)
self.final_decoder_stage3 = deconvbnrelu(in_channels=8, out_channels=8, kernel_size=3, stride=1, padding=1, output_padding=0)
self.softmax = nn.Softmax(dim=1)
self.pooling = nn.AvgPool2d(kernel_size=2)
self.sparsepooling = SparseDownSampleClose(stride=2)
weights_init(self)
def forward(self, rgb, depth, confidence, depth_values, stage_idx, origin_feat, intrinsics_matrices_stage):
rgb = rgb
depth_min, depth_max = depth_values[:,0,None,None,None], depth_values[:,-1,None,None,None]
d = (depth - depth_min) / (depth_max - depth_min)
if self.mask_type == "basic":
valid_mask = torch.where(d>0, torch.full_like(d, 1.0), torch.full_like(d, 0.0))
elif self.mask_type == "mean":
valid_mask = torch.where(torch.logical_and(d>0, confidence>confidence.mean()), torch.full_like(d, 1.0), torch.full_like(d, 0.0))
# pre-data preparation
if self.convolutional_layer_encoding in ["uv", "xyz"]:
B, _, W, H = rgb.shape
position = AddCoordsNp(H, W)
position = position.call()
position = torch.from_numpy(position).to(rgb.device).repeat(B, 1, 1, 1).transpose(-1, 1)
unorm = position[:, 0:1, :, :]
vnorm = position[:, 1:2, :, :]
vnorm_s2 = self.pooling(vnorm)
vnorm_s3 = self.pooling(vnorm_s2)
vnorm_s4 = self.pooling(vnorm_s3)
unorm_s2 = self.pooling(unorm)
unorm_s3 = self.pooling(unorm_s2)
unorm_s4 = self.pooling(unorm_s3)
if self.convolutional_layer_encoding in ["z", "xyz"]:
d_s2, vm_s2 = self.sparsepooling(d, valid_mask)
d_s3, vm_s3 = self.sparsepooling(d_s2, vm_s2)
d_s4, vm_s4 = self.sparsepooling(d_s3, vm_s3)
if self.convolutional_layer_encoding == "xyz":
K = intrinsics_matrices_stage
# per-sample focal lengths and principal point, reshaped to B x 1 x 1 x 1
f352 = K[:, 1, 1].unsqueeze(1).unsqueeze(2).unsqueeze(3)
c352 = K[:, 1, 2].unsqueeze(1).unsqueeze(2).unsqueeze(3)
f1216 = K[:, 0, 0].unsqueeze(1).unsqueeze(2).unsqueeze(3)
c1216 = K[:, 0, 2].unsqueeze(1).unsqueeze(2).unsqueeze(3)
# geometric info
if self.convolutional_layer_encoding == "std":
geo_s1 = None
geo_s2 = None
geo_s3 = None
geo_s4 = None
elif self.convolutional_layer_encoding == "uv":
geo_s1 = torch.cat((vnorm, unorm), dim=1)
geo_s2 = torch.cat((vnorm_s2, unorm_s2), dim=1)
geo_s3 = torch.cat((vnorm_s3, unorm_s3), dim=1)
geo_s4 = torch.cat((vnorm_s4, unorm_s4), dim=1)
elif self.convolutional_layer_encoding == "z":
geo_s1 = d
geo_s2 = d_s2
geo_s3 = d_s3
geo_s4 = d_s4
elif self.convolutional_layer_encoding == "xyz":
geo_s1 = self.geofeature(d, vnorm, unorm, H, W, c352, c1216, f352, f1216)
geo_s2 = self.geofeature(d_s2, vnorm_s2, unorm_s2, H / 2, W / 2, c352, c1216, f352, f1216)
geo_s3 = self.geofeature(d_s3, vnorm_s3, unorm_s3, H / 4, W / 4, c352, c1216, f352, f1216)
geo_s4 = self.geofeature(d_s4, vnorm_s4, unorm_s4, H / 8, W / 8, c352, c1216, f352, f1216)
# -----------------------------------------------------------------------------------------
# 128*160 -> 256*320 -> 512*640
rgb_feature = self.rgb_conv_init(torch.cat((rgb, d), dim=1)) # b 8 h w
rgb_feature1 = self.rgb_encoder_layer1(rgb_feature, geo_s1, geo_s2) # b 16 h/2 w/2
rgb_feature2 = self.rgb_encoder_layer2(rgb_feature1, geo_s2, geo_s2) # b 32 h/2 w/2
rgb_feature3 = self.rgb_encoder_layer3(rgb_feature2, geo_s2, geo_s3) # b 64 h/4 w/4
rgb_feature4 = self.rgb_encoder_layer4(rgb_feature3, geo_s3, geo_s3) # b 128 h/4 w/4
rgb_feature5 = self.rgb_encoder_layer5(rgb_feature4, geo_s3, geo_s4) # b 256 h/8 w/8
rgb_feature_decoder4 = self.rgb_decoder_layer4(rgb_feature5)
rgb_feature4_plus = rgb_feature_decoder4 + rgb_feature4 # b 128 h/4 w/4
rgb_feature_decoder2 = self.rgb_decoder_layer2(rgb_feature4_plus)
rgb_feature2_plus = rgb_feature_decoder2 + rgb_feature2 # b 32 h/2 w/2
rgb_feature_decoder0 = self.rgb_decoder_layer0(rgb_feature2_plus)
rgb_feature0_plus = rgb_feature_decoder0 + rgb_feature1 # b 16 h/2 w/2
rgb_feature_decoder = self.rgb_decoder_layer(rgb_feature0_plus)
rgb_feature_plus = rgb_feature_decoder + rgb_feature # b 8 h w
rgb_output = self.rgb_decoder_output(rgb_feature_plus) # b 2 h w
rgb_depth = rgb_output[:, 0:1, :, :]
rgb_conf = rgb_output[:, 1:2, :, :]
# -----------------------------------------------------------------------------------------
sparsed_feature = self.depth_conv_init(torch.cat((d, rgb_depth), dim=1)) # b 8 h w
sparsed_feature1 = self.depth_layer1(sparsed_feature, geo_s1, geo_s2) # b 16 h/2 w/2
sparsed_feature2 = self.depth_layer2(sparsed_feature1, geo_s2, geo_s2) # b 32 h/2 w/2
sparsed_feature2_plus = torch.cat([rgb_feature2_plus, sparsed_feature2], 1)
sparsed_feature3 = self.depth_layer3(sparsed_feature2_plus, geo_s2, geo_s3) # b 64 h/4 w/4
sparsed_feature4 = self.depth_layer4(sparsed_feature3, geo_s3, geo_s3) # b 128 h/4 w/4
sparsed_feature4_plus = torch.cat([rgb_feature4_plus, sparsed_feature4], 1)
sparsed_feature5 = self.depth_layer5(sparsed_feature4_plus, geo_s3, geo_s4) # b 256 h/8 w/8
# -----------------------------------------------------------------------------------------
fusion3 = rgb_feature5 + sparsed_feature5
decoder_feature3 = self.decoder_layer3(fusion3) # b 128 h/4 w/4
fusion4 = sparsed_feature4 + decoder_feature3
decoder_feature4 = self.decoder_layer4(fusion4) # b 64 h/4 w/4
if stage_idx >= 1:
decoder_feature5 = self.decoder_layer5(decoder_feature4)
fusion5 = sparsed_feature2 + decoder_feature5 # b 32 h/2 w/2
if stage_idx == 1:
rgbdepth_feature = self.rgbdepth_decoder_stage1(fusion5)
if self.add_origin_feat_flag:
final_feature = self.final_decoder_stage1(rgbdepth_feature + origin_feat)
else:
final_feature = self.final_decoder_stage1(rgbdepth_feature)
if stage_idx >= 2:
decoder_feature6 = self.decoder_layer6(decoder_feature5)
fusion6 = sparsed_feature1 + decoder_feature6 # b 16 h/2 w/2
if stage_idx == 2:
rgbdepth_feature = self.rgbdepth_decoder_stage2(fusion6)
if self.add_origin_feat_flag:
final_feature = self.final_decoder_stage2(rgbdepth_feature + origin_feat)
else:
final_feature = self.final_decoder_stage2(rgbdepth_feature)
if stage_idx >= 3:
decoder_feature7 = self.decoder_layer7(decoder_feature6)
fusion7 = sparsed_feature + decoder_feature7 # b 8 h w
if stage_idx == 3:
rgbdepth_feature = self.rgbdepth_decoder_stage3(fusion7)
if self.add_origin_feat_flag:
final_feature = self.final_decoder_stage3(rgbdepth_feature + origin_feat)
else:
final_feature = self.final_decoder_stage3(rgbdepth_feature)
return final_feature
class GeoRegNet2d(nn.Module):
def __init__(self, input_channel=128, base_channel=32, convolutional_layer_encoding="std"):
super(GeoRegNet2d, self).__init__()
self.convolutional_layer_encoding = convolutional_layer_encoding # std / uv / z / xyz
self.mask_type = "basic" # basic / mean
if self.convolutional_layer_encoding == "std":
self.geoplanes = 0
elif self.convolutional_layer_encoding == "z":
self.geoplanes = 1
self.conv_init = ConvBnReLU3D(input_channel, out_channels=8, kernel_size=(1,3,3), pad=(0,1,1))
self.encoder_layer1 = Reg_BasicBlockGeo(inplanes=8, planes=16, kernel_size=(1,3,3), stride=(1,2,2), padding=(0,1,1), geoplanes=self.geoplanes)
self.encoder_layer2 = Reg_BasicBlockGeo(inplanes=16, planes=32, kernel_size=(1,3,3), stride=1, padding=(0,1,1), geoplanes=self.geoplanes)
self.encoder_layer3 = Reg_BasicBlockGeo(inplanes=32, planes=64, kernel_size=(1,3,3), stride=(1,2,2), padding=(0,1,1), geoplanes=self.geoplanes)
self.encoder_layer4 = Reg_BasicBlockGeo(inplanes=64, planes=128, kernel_size=(1,3,3), stride=1, padding=(0,1,1), geoplanes=self.geoplanes)
self.encoder_layer5 = Reg_BasicBlockGeo(inplanes=128, planes=256, kernel_size=(1,3,3), stride=(1,2,2), padding=(0,1,1), geoplanes=self.geoplanes)
self.decoder_layer4 = reg_deconvbnrelu(in_channels=256, out_channels=128, kernel_size=(1,5,5), stride=(1,2,2), padding=(0,2,2), output_padding=(0,1,1))
self.decoder_layer3 = reg_deconvbnrelu(in_channels=128, out_channels=64, kernel_size=(1,3,3), stride=1, padding=(0,1,1), output_padding=0)
self.decoder_layer2 = reg_deconvbnrelu(in_channels=64, out_channels=32, kernel_size=(1,5,5), stride=(1,2,2), padding=(0,2,2), output_padding=(0,1,1))
self.decoder_layer1 = reg_deconvbnrelu(in_channels=32, out_channels=16, kernel_size=(1,3,3), stride=1, padding=(0,1,1), output_padding=0)
self.decoder_layer = reg_deconvbnrelu(in_channels=16, out_channels=8, kernel_size=(1,5,5), stride=(1,2,2), padding=(0,2,2), output_padding=(0,1,1))
self.prob = reg_deconvbnrelu(in_channels=8, out_channels=1, kernel_size=(1,3,3), stride=1, padding=(0,1,1), output_padding=0)
self.depthpooling = nn.MaxPool3d((2,1,1),(2,1,1))
self.basicpooling = nn.MaxPool3d((1,2,2), (1,2,2))
weights_init(self)
def forward(self, x, stage_idx, geo_reg_data=None):
B, C, D, W, H = x.shape
if stage_idx >= 1 and self.convolutional_layer_encoding == "z":
prob_volume = geo_reg_data["prob_volume_last"].unsqueeze(1) # B 1 D H W
else:
assert self.convolutional_layer_encoding == "std"
# geometric info
if self.convolutional_layer_encoding == "std":
geo_s1 = None
geo_s2 = None
geo_s3 = None
geo_s4 = None
elif self.convolutional_layer_encoding == "z":
if stage_idx == 2:
geo_s1 = self.depthpooling(prob_volume)
else:
geo_s1 = prob_volume # B 1 D H W
geo_s2 = self.basicpooling(geo_s1)
geo_s3 = self.basicpooling(geo_s2)
feature = self.conv_init(x) # B 8 D H W
feature1 = self.encoder_layer1(feature, geo_s1, geo_s1) # B 16 D H/2 W/2
feature2 = self.encoder_layer2(feature1, geo_s2, geo_s2) # B 32 D H/2 W/2
feature3 = self.encoder_layer3(feature2, geo_s2, geo_s2) # B 64 D H/4 W/4
feature4 = self.encoder_layer4(feature3, geo_s3, geo_s3) # B 128 D H/4 W/4
feature5 = self.encoder_layer5(feature4, geo_s3, geo_s3) # B 256 D H/8 W/8
feature_decoder4 = self.decoder_layer4(feature5)
feature4_plus = feature_decoder4 + feature4 # B 128 D H/4 W/4
feature_decoder3 = self.decoder_layer3(feature4_plus)
feature3_plus = feature_decoder3 + feature3 # B 64 D H/4 W/4
feature_decoder2 = self.decoder_layer2(feature3_plus)
feature2_plus = feature_decoder2 + feature2 # B 32 D H/2 W/2
feature_decoder1 = self.decoder_layer1(feature2_plus)
feature1_plus = feature_decoder1 + feature1 # B 16 D H/2 W/2
feature_decoder = self.decoder_layer(feature1_plus)
feature_plus = feature_decoder + feature # B 8 D H W
x = self.prob(feature_plus)
return x.squeeze(1)
# --------------------------------------------------------------
class BasicBlockGeo(nn.Module):
expansion = 1
__constants__ = ['downsample']
def __init__(self, inplanes, planes, stride=1, downsample=None, groups=1,
base_width=64, dilation=1, norm_layer=None, geoplanes=3):
super(BasicBlockGeo, self).__init__()
if norm_layer is None:
norm_layer = nn.BatchNorm2d
if groups != 1 or base_width != 64:
raise ValueError('BasicBlock only supports groups=1 and base_width=64')
if dilation > 1:
raise NotImplementedError("Dilation > 1 not supported in BasicBlock")
self.conv1 = conv3x3(inplanes + geoplanes, planes, stride)
self.bn1 = norm_layer(planes)
self.relu = nn.ReLU(inplace=True)
self.conv2 = conv3x3(planes+geoplanes, planes)
self.bn2 = norm_layer(planes)
if stride != 1 or inplanes != planes:
downsample = nn.Sequential(
conv1x1(inplanes+geoplanes, planes, stride),
norm_layer(planes),
)
self.downsample = downsample
self.stride = stride
def forward(self, x, g1=None, g2=None):
identity = x
if g1 is not None:
x = torch.cat((x, g1), 1)
out = self.conv1(x)
out = self.bn1(out)
out = self.relu(out)
if g2 is not None:
out = torch.cat((g2,out), 1)
out = self.conv2(out)
out = self.bn2(out)
if self.downsample is not None:
identity = self.downsample(x)
out += identity
out = self.relu(out)
return out
class GeometryFeature(nn.Module):
def __init__(self):
super(GeometryFeature, self).__init__()
def forward(self, z, vnorm, unorm, h, w, ch, cw, fh, fw):
x = z*(0.5*h*(vnorm+1)-ch)/fh
y = z*(0.5*w*(unorm+1)-cw)/fw
return torch.cat((x, y, z),1)
class SparseDownSampleClose(nn.Module):
def __init__(self, stride):
super(SparseDownSampleClose, self).__init__()
self.pooling = nn.MaxPool2d(stride, stride)
self.large_number = 600
def forward(self, d, mask):
encode_d = - (1-mask)*self.large_number - d
d = - self.pooling(encode_d)
mask_result = self.pooling(mask)
d_result = d - (1-mask_result)*self.large_number
return d_result, mask_result
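# Editor's note: an illustrative sketch with toy values (not used by the network).
# It shows that SparseDownSampleClose keeps the closest (smallest) *valid* depth
# in each pooling window and propagates the validity mask alongside it.
def _demo_sparse_downsample_close():
    pool = SparseDownSampleClose(stride=2)
    d = torch.tensor([[[[0.2, 0.0], [0.9, 0.4]]]])      # 1 x 1 x 2 x 2 normalized depth
    mask = torch.tensor([[[[1.0, 0.0], [1.0, 1.0]]]])   # 0 marks an invalid pixel
    d_down, mask_down = pool(d, mask)
    return d_down, mask_down                            # -> 0.2 (closest valid depth), mask 1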
def convbnrelu(in_channels, out_channels, kernel_size=3,stride=1, padding=1):
return nn.Sequential(
nn.Conv2d(in_channels, out_channels, kernel_size=kernel_size, stride=stride, padding=padding, bias=False),
nn.BatchNorm2d(out_channels),
nn.ReLU(inplace=True)
)
def deconvbnrelu(in_channels, out_channels, kernel_size=5, stride=2, padding=2, output_padding=1):
return nn.Sequential(
nn.ConvTranspose2d(in_channels, out_channels, kernel_size=kernel_size, stride=stride, padding=padding, output_padding=output_padding, bias=False),
nn.BatchNorm2d(out_channels),
nn.ReLU(inplace=True)
)
def weights_init(m):
"""Initialize filters with Gaussian random weights"""
if isinstance(m, nn.Conv2d):
n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
m.weight.data.normal_(0, math.sqrt(2. / n))
if m.bias is not None:
m.bias.data.zero_()
elif isinstance(m, nn.ConvTranspose2d):
n = m.kernel_size[0] * m.kernel_size[1] * m.in_channels
m.weight.data.normal_(0, math.sqrt(2. / n))
if m.bias is not None:
m.bias.data.zero_()
elif isinstance(m, nn.BatchNorm2d):
m.weight.data.fill_(1)
m.bias.data.zero_()
def conv3x3(in_planes, out_planes, stride=1, groups=1, dilation=1, bias=False, padding=1):
"""3x3 convolution with padding"""
if padding >= 1:
padding = dilation
return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride,
padding=padding, groups=groups, bias=bias, dilation=dilation)
def conv1x1(in_planes, out_planes, stride=1, groups=1, bias=False):
"""1x1 convolution"""
return nn.Conv2d(in_planes, out_planes, kernel_size=1, stride=stride, groups=groups, bias=bias)
class AddCoordsNp():
"""Add coords to a tensor"""
def __init__(self, x_dim=64, y_dim=64, with_r=False):
self.x_dim = x_dim
self.y_dim = y_dim
self.with_r = with_r
def call(self):
"""
input_tensor: (batch, x_dim, y_dim, c)
"""
xx_ones = np.ones([self.x_dim], dtype=np.int32)
xx_ones = np.expand_dims(xx_ones, 1)
xx_range = np.expand_dims(np.arange(self.y_dim), 0)
xx_channel = np.matmul(xx_ones, xx_range)
xx_channel = np.expand_dims(xx_channel, -1)
yy_ones = np.ones([self.y_dim], dtype=np.int32)
yy_ones = np.expand_dims(yy_ones, 0)
yy_range = np.expand_dims(np.arange(self.x_dim), 1)
yy_channel = np.matmul(yy_range, yy_ones)
yy_channel = np.expand_dims(yy_channel, -1)
xx_channel = xx_channel.astype('float32') / (self.y_dim - 1)
yy_channel = yy_channel.astype('float32') / (self.x_dim - 1)
xx_channel = xx_channel*2 - 1
yy_channel = yy_channel*2 - 1
ret = np.concatenate([xx_channel, yy_channel], axis=-1)
if self.with_r:
rr = np.sqrt( np.square(xx_channel-0.5) + np.square(yy_channel-0.5))
ret = np.concatenate([ret, rr], axis=-1)
return ret
# --------------------------------------------------------------
class Reg_BasicBlockGeo(nn.Module):
def __init__(self, inplanes, planes, kernel_size, stride, padding, downsample=None, groups=1,
base_width=64, dilation=1, norm_layer=nn.BatchNorm3d, geoplanes=3):
super(Reg_BasicBlockGeo, self).__init__()
self.conv1 = regconv3D(inplanes + geoplanes, planes, kernel_size=(1,3,3), stride=1, padding=(0,1,1))
self.bn1 = norm_layer(planes)
self.relu = nn.ReLU(inplace=True)
self.conv2 = regconv3D(planes+geoplanes, planes, kernel_size, stride, padding)
self.bn2 = norm_layer(planes)
if stride != 1 or inplanes != planes:
downsample = nn.Sequential(
regconv1x1(inplanes+geoplanes, planes, kernel_size, stride, padding),
norm_layer(planes),
)
self.downsample = downsample
self.stride = stride
def forward(self, x, g1=None, g2=None):
identity = x
if g1 is not None:
x = torch.cat((x, g1), 1)
out = self.conv1(x)
out = self.bn1(out)
out = self.relu(out)
if g2 is not None:
out = torch.cat((g2,out), 1)
out = self.conv2(out)
out = self.bn2(out)
if self.downsample is not None:
identity = self.downsample(x)
out += identity
out = self.relu(out)
return out
def regconv3D(in_planes, out_planes, kernel_size, stride, padding, groups=1, dilation=1, bias=False):
return nn.Conv3d(in_planes, out_planes, kernel_size=kernel_size, stride=stride,
padding=padding, groups=groups, bias=bias, dilation=dilation)
def regconv1x1(in_planes, out_planes, kernel_size, stride, padding, groups=1, bias=False):
return nn.Conv3d(in_planes, out_planes, kernel_size=kernel_size, stride=stride, padding=padding, groups=groups, bias=bias)
def reg_deconvbnrelu(in_channels, out_channels, kernel_size, stride, padding, output_padding):
return nn.Sequential(
nn.ConvTranspose3d(in_channels, out_channels, kernel_size=kernel_size, stride=stride, padding=padding, output_padding=output_padding, bias=False),
nn.BatchNorm3d(out_channels),
nn.ReLU(inplace=True)
)
================================================
FILE: models/geomvsnet.py
================================================
# -*- coding: utf-8 -*-
# @Description: Main network architecture for GeoMVSNet.
# @Author: Zhe Zhang (doublez@stu.pku.edu.cn)
# @Affiliation: Peking University (PKU)
# @LastEditDate: 2023-09-07
import math
import torch
import torch.nn as nn
import torch.nn.functional as F
from models.submodules import homo_warping, init_inverse_range, schedule_inverse_range, FPN, Reg2d
from models.geometry import GeoFeatureFusion, GeoRegNet2d
from models.filter import frequency_domain_filter
class GeoMVSNet(nn.Module):
def __init__(self, levels, hypo_plane_num_stages, depth_interal_ratio_stages,
feat_base_channel, reg_base_channel, group_cor_dim_stages):
super(GeoMVSNet, self).__init__()
self.levels = levels
self.hypo_plane_num_stages = hypo_plane_num_stages
self.depth_interal_ratio_stages = depth_interal_ratio_stages
self.StageNet = StageNet()
# feature settings
self.FeatureNet = FPN(base_channels=feat_base_channel)
self.coarest_separate_flag = True
if self.coarest_separate_flag:
self.CoarestFeatureNet = FPN(base_channels=feat_base_channel)
self.GeoFeatureFusionNet = GeoFeatureFusion(
convolutional_layer_encoding="z", mask_type="basic", add_origin_feat_flag=True)
# cost regularization settings
self.RegNet_stages = nn.ModuleList()
self.group_cor_dim_stages = group_cor_dim_stages
self.geo_reg_flag = True
self.geo_reg_encodings = ['std', 'z', 'z', 'z'] # must use std in idx-0
for stage_idx in range(self.levels):
in_dim = group_cor_dim_stages[stage_idx]
if self.geo_reg_flag:
self.RegNet_stages.append(GeoRegNet2d(input_channel=in_dim, base_channel=reg_base_channel, convolutional_layer_encoding=self.geo_reg_encodings[stage_idx]))
else:
self.RegNet_stages.append(Reg2d(input_channel=in_dim, base_channel=reg_base_channel))
# frequency domain filter settings
self.curriculum_learning_rho_ratios = [9, 4, 2, 1]
def forward(self, imgs, proj_matrices, intrinsics_matrices, depth_values, filename=None):
features = []
if self.coarest_separate_flag:
coarsest_features = []
for nview_idx in range(len(imgs)):
img = imgs[nview_idx]
features.append(self.FeatureNet(img)) # B C H W
if self.coarest_separate_flag:
coarsest_features.append(self.CoarestFeatureNet(img))
# coarse-to-fine
outputs = {}
for stage_idx in range(self.levels):
stage_name = "stage{}".format(stage_idx + 1)
B, C, H, W = features[0][stage_name].shape
proj_matrices_stage = proj_matrices[stage_name]
intrinsics_matrices_stage = intrinsics_matrices[stage_name]
# @Note features
if stage_idx == 0:
if self.coarest_separate_flag:
features_stage = [feat[stage_name] for feat in coarsest_features]
else:
features_stage = [feat[stage_name] for feat in features]
elif stage_idx >= 1:
features_stage = [feat[stage_name] for feat in features]
ref_img_stage = F.interpolate(imgs[0], size=None, scale_factor=1./2**(3-stage_idx), mode="bilinear", align_corners=False)
depth_last = F.interpolate(depth_last.unsqueeze(1), size=None, scale_factor=2, mode="bilinear", align_corners=False)
confidence_last = F.interpolate(confidence_last.unsqueeze(1), size=None, scale_factor=2, mode="bilinear", align_corners=False)
# reference feature
features_stage[0] = self.GeoFeatureFusionNet(
ref_img_stage, depth_last, confidence_last, depth_values,
stage_idx, features_stage[0], intrinsics_matrices_stage
)
# @Note depth hypos
if stage_idx == 0:
depth_hypo = init_inverse_range(depth_values, self.hypo_plane_num_stages[stage_idx], img[0].device, img[0].dtype, H, W)
else:
inverse_min_depth, inverse_max_depth = outputs_stage['inverse_min_depth'].detach(), outputs_stage['inverse_max_depth'].detach()
depth_hypo = schedule_inverse_range(inverse_min_depth, inverse_max_depth, self.hypo_plane_num_stages[stage_idx], H, W) # B D H W
# @Note cost regularization
geo_reg_data = {}
if self.geo_reg_flag:
geo_reg_data['depth_values'] = depth_values
if stage_idx >= 1 and self.geo_reg_encodings[stage_idx] == 'z':
prob_volume_last = F.interpolate(prob_volume_last, size=None, scale_factor=2, mode="bilinear", align_corners=False)
geo_reg_data["prob_volume_last"] = prob_volume_last
outputs_stage = self.StageNet(
stage_idx, features_stage, proj_matrices_stage, depth_hypo=depth_hypo,
regnet=self.RegNet_stages[stage_idx], group_cor_dim=self.group_cor_dim_stages[stage_idx],
depth_interal_ratio=self.depth_interal_ratio_stages[stage_idx],
geo_reg_data=geo_reg_data
)
# @Note frequency domain filter
depth_est = outputs_stage['depth']
depth_est_filtered = frequency_domain_filter(depth_est, rho_ratio=self.curriculum_learning_rho_ratios[stage_idx])
outputs_stage['depth_filtered'] = depth_est_filtered
depth_last = depth_est_filtered
confidence_last = outputs_stage['photometric_confidence']
prob_volume_last = outputs_stage['prob_volume']
outputs[stage_name] = outputs_stage
outputs.update(outputs_stage)
return outputs
class StageNet(nn.Module):
def __init__(self, attn_temp=2):
super(StageNet, self).__init__()
self.attn_temp = attn_temp
def forward(self, stage_idx, features, proj_matrices, depth_hypo, regnet,
group_cor_dim, depth_interal_ratio, geo_reg_data=None):
# @Note step1: feature extraction
proj_matrices = torch.unbind(proj_matrices, 1)
ref_feature, src_features = features[0], features[1:]
ref_proj, src_projs = proj_matrices[0], proj_matrices[1:]
B, D, H, W = depth_hypo.shape
C = ref_feature.shape[1]
# @Note step2: cost aggregation
ref_volume = ref_feature.unsqueeze(2).repeat(1, 1, D, 1, 1)
cor_weight_sum = 1e-8
cor_feats = 0
for src_idx, (src_fea, src_proj) in enumerate(zip(src_features, src_projs)):
save_fn = None
src_proj_new = src_proj[:, 0].clone()
src_proj_new[:, :3, :4] = torch.matmul(src_proj[:, 1, :3, :3], src_proj[:, 0, :3, :4])
ref_proj_new = ref_proj[:, 0].clone()
ref_proj_new[:, :3, :4] = torch.matmul(ref_proj[:, 1, :3, :3], ref_proj[:, 0, :3, :4])
warped_src = homo_warping(src_fea, src_proj_new, ref_proj_new, depth_hypo) # B C D H W
warped_src = warped_src.reshape(B, group_cor_dim, C//group_cor_dim, D, H, W)
ref_volume = ref_volume.reshape(B, group_cor_dim, C//group_cor_dim, D, H, W)
cor_feat = (warped_src * ref_volume).mean(2) # B G D H W
del warped_src, src_proj, src_fea
cor_weight = torch.softmax(cor_feat.sum(1) / self.attn_temp, 1) / math.sqrt(C) # B D H W
cor_weight_sum += cor_weight # B D H W
cor_feats += cor_weight.unsqueeze(1) * cor_feat # B C D H W
del cor_weight, cor_feat
cost_volume = cor_feats / cor_weight_sum.unsqueeze(1) # B C D H W
del cor_weight_sum, src_features
# @Note step3: cost regularization
if geo_reg_data == {}:
# basic
cost_reg = regnet(cost_volume)
else:
# probability volume geometry embedding
cost_reg = regnet(cost_volume, stage_idx, geo_reg_data)
del cost_volume
prob_volume = F.softmax(cost_reg, dim=1) # B D H W
# @Note step4: depth regression
prob_max_indices = prob_volume.max(1, keepdim=True)[1] # B 1 H W
depth = torch.gather(depth_hypo, 1, prob_max_indices).squeeze(1) # B H W
with torch.no_grad():
photometric_confidence = prob_volume.max(1)[0] # B H W
photometric_confidence = F.interpolate(photometric_confidence.unsqueeze(1), scale_factor=1, mode='bilinear', align_corners=True).squeeze(1)
last_depth_itv = 1./depth_hypo[:,2,:,:] - 1./depth_hypo[:,1,:,:]
inverse_min_depth = 1/depth + depth_interal_ratio * last_depth_itv # B H W
inverse_max_depth = 1/depth - depth_interal_ratio * last_depth_itv # B H W
output_stage = {
"depth": depth,
"photometric_confidence": photometric_confidence,
"depth_hypo": depth_hypo,
"prob_volume": prob_volume,
"inverse_min_depth": inverse_min_depth,
"inverse_max_depth": inverse_max_depth,
}
return output_stage
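# Editor's note: a hedged construction sketch only (no forward pass). The
# per-stage lists mirror the defaults in models/utils/opts.py; this is not an
# official entry point, and the real model is wired up in train.py / test.py.
def _demo_build_geomvsnet():
    return GeoMVSNet(
        levels=4,
        hypo_plane_num_stages=[8, 8, 4, 4],
        depth_interal_ratio_stages=[0.5, 0.5, 0.5, 1],
        feat_base_channel=8,
        reg_base_channel=8,
        group_cor_dim_stages=[8, 8, 4, 4],
    )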
================================================
FILE: models/loss.py
================================================
# -*- coding: utf-8 -*-
# @Description: Loss Functions (Sec 3.4 in the paper).
# @Author: Zhe Zhang (doublez@stu.pku.edu.cn)
# @Affiliation: Peking University (PKU)
# @LastEditDate: 2023-09-07
import torch
def geomvsnet_loss(inputs, depth_gt_ms, mask_ms, **kwargs):
stage_lw = kwargs.get("stage_lw", [1, 1, 1, 1])
depth_values = kwargs.get("depth_values")
depth_min, depth_max = depth_values[:,0], depth_values[:,-1]
total_loss = torch.tensor(0.0, dtype=torch.float32, device=mask_ms["stage1"].device, requires_grad=False)
pw_loss_stages = []
dds_loss_stages = []
for stage_idx, (stage_inputs, stage_key) in enumerate([(inputs[k], k) for k in inputs.keys() if "stage" in k]):
depth = stage_inputs['depth_filtered']
prob_volume = stage_inputs['prob_volume']
depth_value = stage_inputs['depth_hypo']
depth_gt = depth_gt_ms[stage_key]
mask = mask_ms[stage_key] > 0.5
# pw loss
pw_loss = pixel_wise_loss(prob_volume, depth_gt, mask, depth_value)
pw_loss_stages.append(pw_loss)
# dds loss
dds_loss = depth_distribution_similarity_loss(depth, depth_gt, mask, depth_min, depth_max)
dds_loss_stages.append(dds_loss)
# total loss
lam1, lam2 = 0.8, 0.2
total_loss = total_loss + stage_lw[stage_idx] * (lam1 * pw_loss + lam2 * dds_loss)
depth_pred = stage_inputs['depth']
depth_gt = depth_gt_ms[stage_key]
epe = cal_metrics(depth_pred, depth_gt, mask, depth_min, depth_max)
return total_loss, epe, pw_loss_stages, dds_loss_stages
def pixel_wise_loss(prob_volume, depth_gt, mask, depth_value):
mask_true = mask
valid_pixel_num = torch.sum(mask_true, dim=[1,2])+1e-12
shape = depth_gt.shape
depth_num = depth_value.shape[1]
depth_value_mat = depth_value
gt_index_image = torch.argmin(torch.abs(depth_value_mat-depth_gt.unsqueeze(1)), dim=1)
gt_index_image = torch.mul(mask_true, gt_index_image.type(torch.float))
gt_index_image = torch.round(gt_index_image).type(torch.long).unsqueeze(1)
gt_index_volume = torch.zeros(shape[0], depth_num, shape[1], shape[2]).type(mask_true.type()).scatter_(1, gt_index_image, 1)
cross_entropy_image = -torch.sum(gt_index_volume * torch.log(prob_volume+1e-12), dim=1).squeeze(1)
masked_cross_entropy_image = torch.mul(mask_true, cross_entropy_image)
masked_cross_entropy = torch.sum(masked_cross_entropy_image, dim=[1, 2])
masked_cross_entropy = torch.mean(masked_cross_entropy / valid_pixel_num)
pw_loss = masked_cross_entropy
return pw_loss
def depth_distribution_similarity_loss(depth, depth_gt, mask, depth_min, depth_max):
depth_norm = depth * 128 / (depth_max - depth_min)[:,None,None]
depth_gt_norm = depth_gt * 128 / (depth_max - depth_min)[:,None,None]
M_bins = 48
kl_min = torch.min(torch.min(depth_gt), depth.mean()-3.*depth.std())
kl_max = torch.max(torch.max(depth_gt), depth.mean()+3.*depth.std())
bins = torch.linspace(kl_min, kl_max, steps=M_bins)
kl_divs = []
for i in range(len(bins) - 1):
bin_mask = (depth_gt >= bins[i]) & (depth_gt < bins[i+1])
merged_mask = mask & bin_mask
if merged_mask.sum() > 0:
p = depth_norm[merged_mask]
q = depth_gt_norm[merged_mask]
kl_div = torch.nn.functional.kl_div(torch.log(p)-torch.log(q), p, reduction='batchmean')
kl_div = torch.log(kl_div)
kl_divs.append(kl_div)
dds_loss = sum(kl_divs)
return dds_loss
def cal_metrics(depth_pred, depth_gt, mask, depth_min, depth_max):
depth_pred_norm = depth_pred * 128 / (depth_max - depth_min)[:,None,None]
depth_gt_norm = depth_gt * 128 / (depth_max - depth_min)[:,None,None]
abs_err = torch.abs(depth_pred_norm[mask] - depth_gt_norm[mask])
epe = abs_err.mean()
err1= (abs_err<=1).float().mean()*100
err3 = (abs_err<=3).float().mean()*100
return epe # err1, err3
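# Editor's note: a toy sanity-check sketch, not used anywhere in the code base.
# The depth values and the 425-937 range are arbitrary assumptions; they only
# illustrate the x128 / (depth range) normalization applied by cal_metrics.
def _demo_cal_metrics():
    depth_pred = torch.tensor([[[500.0, 510.0]]])
    depth_gt = torch.tensor([[[505.0, 505.0]]])
    mask = torch.ones_like(depth_gt, dtype=torch.bool)
    depth_min, depth_max = torch.tensor([425.0]), torch.tensor([937.0])
    return cal_metrics(depth_pred, depth_gt, mask, depth_min, depth_max)   # EPE = 1.25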
================================================
FILE: models/submodules.py
================================================
# -*- coding: utf-8 -*-
# @Description: Some sub-modules for the network.
# @Author: Zhe Zhang (doublez@stu.pku.edu.cn)
# @Affiliation: Peking University (PKU)
# @LastEditDate: 2023-09-07
import torch
import torch.nn as nn
import torch.nn.functional as F
class FPN(nn.Module):
"""FPN aligncorners downsample 4x"""
def __init__(self, base_channels, gn=False):
super(FPN, self).__init__()
self.base_channels = base_channels
self.conv0 = nn.Sequential(
Conv2d(3, base_channels, 3, 1, padding=1, gn=gn),
Conv2d(base_channels, base_channels, 3, 1, padding=1, gn=gn),
)
self.conv1 = nn.Sequential(
Conv2d(base_channels, base_channels * 2, 5, stride=2, padding=2, gn=gn),
Conv2d(base_channels * 2, base_channels * 2, 3, 1, padding=1, gn=gn),
Conv2d(base_channels * 2, base_channels * 2, 3, 1, padding=1, gn=gn),
)
self.conv2 = nn.Sequential(
Conv2d(base_channels * 2, base_channels * 4, 5, stride=2, padding=2, gn=gn),
Conv2d(base_channels * 4, base_channels * 4, 3, 1, padding=1, gn=gn),
Conv2d(base_channels * 4, base_channels * 4, 3, 1, padding=1, gn=gn),
)
self.conv3 = nn.Sequential(
Conv2d(base_channels * 4, base_channels * 8, 5, stride=2, padding=2, gn=gn),
Conv2d(base_channels * 8, base_channels * 8, 3, 1, padding=1, gn=gn),
Conv2d(base_channels * 8, base_channels * 8, 3, 1, padding=1, gn=gn),
)
self.out_channels = [8 * base_channels]
final_chs = base_channels * 8
self.inner1 = nn.Conv2d(base_channels * 4, final_chs, 1, bias=True)
self.inner2 = nn.Conv2d(base_channels * 2, final_chs, 1, bias=True)
self.inner3 = nn.Conv2d(base_channels * 1, final_chs, 1, bias=True)
self.out1 = nn.Conv2d(final_chs, base_channels * 8, 1, bias=False)
self.out2 = nn.Conv2d(final_chs, base_channels * 4, 3, padding=1, bias=False)
self.out3 = nn.Conv2d(final_chs, base_channels * 2, 3, padding=1, bias=False)
self.out4 = nn.Conv2d(final_chs, base_channels, 3, padding=1, bias=False)
self.out_channels.append(base_channels * 4)
self.out_channels.append(base_channels * 2)
self.out_channels.append(base_channels)
def forward(self, x):
conv0 = self.conv0(x)
conv1 = self.conv1(conv0)
conv2 = self.conv2(conv1)
conv3 = self.conv3(conv2)
intra_feat = conv3
outputs = {}
out1 = self.out1(intra_feat)
intra_feat = F.interpolate(intra_feat, scale_factor=2, mode="bilinear", align_corners=True) + self.inner1(conv2)
out2 = self.out2(intra_feat)
intra_feat = F.interpolate(intra_feat, scale_factor=2, mode="bilinear", align_corners=True) + self.inner2(conv1)
out3 = self.out3(intra_feat)
intra_feat = F.interpolate(intra_feat, scale_factor=2, mode="bilinear", align_corners=True) + self.inner3(conv0)
out4 = self.out4(intra_feat)
outputs["stage1"] = out1
outputs["stage2"] = out2
outputs["stage3"] = out3
outputs["stage4"] = out4
return outputs
class Reg2d(nn.Module):
def __init__(self, input_channel=128, base_channel=32):
super(Reg2d, self).__init__()
self.conv0 = ConvBnReLU3D(input_channel, base_channel, kernel_size=(1,3,3), pad=(0,1,1))
self.conv1 = ConvBnReLU3D(base_channel, base_channel*2, kernel_size=(1,3,3), stride=(1,2,2), pad=(0,1,1))
self.conv2 = ConvBnReLU3D(base_channel*2, base_channel*2)
self.conv3 = ConvBnReLU3D(base_channel*2, base_channel*4, kernel_size=(1,3,3), stride=(1,2,2), pad=(0,1,1))
self.conv4 = ConvBnReLU3D(base_channel*4, base_channel*4)
self.conv5 = ConvBnReLU3D(base_channel*4, base_channel*8, kernel_size=(1,3,3), stride=(1,2,2), pad=(0,1,1))
self.conv6 = ConvBnReLU3D(base_channel*8, base_channel*8)
self.conv7 = nn.Sequential(
nn.ConvTranspose3d(base_channel*8, base_channel*4, kernel_size=(1,3,3), padding=(0,1,1), output_padding=(0,1,1), stride=(1,2,2), bias=False),
nn.BatchNorm3d(base_channel*4),
nn.ReLU(inplace=True))
self.conv9 = nn.Sequential(
nn.ConvTranspose3d(base_channel*4, base_channel*2, kernel_size=(1,3,3), padding=(0,1,1), output_padding=(0,1,1), stride=(1,2,2), bias=False),
nn.BatchNorm3d(base_channel*2),
nn.ReLU(inplace=True))
self.conv11 = nn.Sequential(
nn.ConvTranspose3d(base_channel*2, base_channel, kernel_size=(1,3,3), padding=(0,1,1), output_padding=(0,1,1), stride=(1,2,2), bias=False),
nn.BatchNorm3d(base_channel),
nn.ReLU(inplace=True))
self.prob = nn.Conv3d(8, 1, 1, stride=1, padding=0)
def forward(self, x):
conv0 = self.conv0(x)
conv2 = self.conv2(self.conv1(conv0))
conv4 = self.conv4(self.conv3(conv2))
x = self.conv6(self.conv5(conv4))
x = conv4 + self.conv7(x)
x = conv2 + self.conv9(x)
x = conv0 + self.conv11(x)
x = self.prob(x)
return x.squeeze(1)
def homo_warping(src_fea, src_proj, ref_proj, depth_values):
# src_fea: [B, C, H, W]
# src_proj: [B, 4, 4]
# ref_proj: [B, 4, 4]
# depth_values: [B, Ndepth] o [B, Ndepth, H, W]
# out: [B, C, Ndepth, H, W]
C = src_fea.shape[1]
Hs,Ws = src_fea.shape[-2:]
B,num_depth,Hr,Wr = depth_values.shape
with torch.no_grad():
proj = torch.matmul(src_proj, torch.inverse(ref_proj))
rot = proj[:, :3, :3] # [B,3,3]
trans = proj[:, :3, 3:4] # [B,3,1]
y, x = torch.meshgrid([torch.arange(0, Hr, dtype=torch.float32, device=src_fea.device),
torch.arange(0, Wr, dtype=torch.float32, device=src_fea.device)])
y = y.reshape(Hr*Wr)
x = x.reshape(Hr*Wr)
xyz = torch.stack((x, y, torch.ones_like(x))) # [3, H*W]
xyz = torch.unsqueeze(xyz, 0).repeat(B, 1, 1) # [B, 3, H*W]
rot_xyz = torch.matmul(rot, xyz) # [B, 3, H*W]
rot_depth_xyz = rot_xyz.unsqueeze(2).repeat(1, 1, num_depth, 1) * depth_values.reshape(B, 1, num_depth, -1) # [B, 3, Ndepth, H*W]
proj_xyz = rot_depth_xyz + trans.reshape(B, 3, 1, 1) # [B, 3, Ndepth, H*W]
# FIXME divide 0
temp = proj_xyz[:, 2:3, :, :]
temp[temp==0] = 1e-9
proj_xy = proj_xyz[:, :2, :, :] / temp # [B, 2, Ndepth, H*W]
# proj_xy = proj_xyz[:, :2, :, :] / proj_xyz[:, 2:3, :, :] # [B, 2, Ndepth, H*W]
proj_x_normalized = proj_xy[:, 0, :, :] / ((Ws - 1) / 2) - 1
proj_y_normalized = proj_xy[:, 1, :, :] / ((Hs - 1) / 2) - 1
proj_xy = torch.stack((proj_x_normalized, proj_y_normalized), dim=3) # [B, Ndepth, H*W, 2]
grid = proj_xy
if len(src_fea.shape)==4:
warped_src_fea = F.grid_sample(src_fea, grid.reshape(B, num_depth * Hr, Wr, 2), mode='bilinear', padding_mode='zeros', align_corners=True)
warped_src_fea = warped_src_fea.reshape(B, C, num_depth, Hr, Wr)
elif len(src_fea.shape)==5:
warped_src_fea = []
for d in range(src_fea.shape[2]):
warped_src_fea.append(F.grid_sample(src_fea[:,:,d], grid.reshape(B, num_depth, Hr, Wr, 2)[:,d], mode='bilinear', padding_mode='zeros', align_corners=True))
warped_src_fea = torch.stack(warped_src_fea, dim=2)
return warped_src_fea
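# Editor's note: a minimal sketch with assumed toy shapes. With identity
# projection matrices for both views, the warped volume should simply replicate
# the source features at every depth hypothesis (up to bilinear interpolation).
def _demo_homo_warping():
    src_fea = torch.rand(1, 4, 8, 8)                     # B x C x H x W source features
    eye = torch.eye(4).unsqueeze(0)                      # identity camera projections
    depth_hypos = torch.full((1, 2, 8, 8), 5.0)          # B x Ndepth x H x W hypotheses
    warped = homo_warping(src_fea, eye, eye, depth_hypos)
    return warped.shape                                  # torch.Size([1, 4, 2, 8, 8])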
def init_inverse_range(cur_depth, ndepths, device, dtype, H, W):
inverse_depth_min = 1. / cur_depth[:, 0] # (B,)
inverse_depth_max = 1. / cur_depth[:, -1]
itv = torch.arange(0, ndepths, device=device, dtype=dtype, requires_grad=False).reshape(1, -1,1,1).repeat(1, 1, H, W) / (ndepths - 1) # 1 D H W
inverse_depth_hypo = inverse_depth_max[:,None, None, None] + (inverse_depth_min - inverse_depth_max)[:,None, None, None] * itv
return 1./inverse_depth_hypo
def schedule_inverse_range(inverse_min_depth, inverse_max_depth, ndepths, H, W):
# inverse_min_depth: (B, H, W)
# inverse_max_depth: (B, H, W)
itv = torch.arange(0, ndepths, device=inverse_min_depth.device, dtype=inverse_min_depth.dtype, requires_grad=False).reshape(1, -1,1,1).repeat(1, 1, H//2, W//2) / (ndepths - 1) # 1 D H W
inverse_depth_hypo = inverse_max_depth[:,None, :, :] + (inverse_min_depth - inverse_max_depth)[:,None, :, :] * itv # B D H W
inverse_depth_hypo = F.interpolate(inverse_depth_hypo.unsqueeze(1), [ndepths, H, W], mode='trilinear', align_corners=True).squeeze(1)
return 1./inverse_depth_hypo
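# Editor's note: a small illustrative sketch (the depth bounds are assumptions).
# The hypotheses are spaced uniformly in inverse depth between the given bounds,
# so planes are denser near the camera and sparser far away.
def _demo_init_inverse_range():
    depth_values = torch.tensor([[425.0, 935.0]])        # per-sample [min, max] depth
    hypos = init_inverse_range(depth_values, 8, depth_values.device, depth_values.dtype, 4, 4)
    return hypos[0, :, 0, 0]                             # 8 depths from 935 down to 425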
# --------------------------------------------------------------
def init_bn(module):
if module.weight is not None:
nn.init.ones_(module.weight)
if module.bias is not None:
nn.init.zeros_(module.bias)
return
def init_uniform(module, init_method):
if module.weight is not None:
if init_method == "kaiming":
nn.init.kaiming_uniform_(module.weight)
elif init_method == "xavier":
nn.init.xavier_uniform_(module.weight)
return
class ConvBnReLU3D(nn.Module):
def __init__(self, in_channels, out_channels, kernel_size=3, stride=1, pad=1):
super(ConvBnReLU3D, self).__init__()
self.conv = nn.Conv3d(in_channels, out_channels, kernel_size, stride=stride, padding=pad, bias=False)
self.bn = nn.BatchNorm3d(out_channels)
def forward(self, x):
return F.relu(self.bn(self.conv(x)), inplace=True)
class Conv2d(nn.Module):
def __init__(self, in_channels, out_channels, kernel_size, stride=1,
relu=True, bn_momentum=0.1, init_method="xavier", gn=False, group_channel=8, **kwargs):
super(Conv2d, self).__init__()
bn = not gn
self.conv = nn.Conv2d(in_channels, out_channels, kernel_size, stride=stride,
bias=(not bn), **kwargs)
self.kernel_size = kernel_size
self.stride = stride
self.bn = nn.BatchNorm2d(out_channels, momentum=bn_momentum) if bn else None
self.gn = nn.GroupNorm(int(max(1, out_channels / group_channel)), out_channels) if gn else None
self.relu = relu
def forward(self, x):
x = self.conv(x)
if self.bn is not None:
x = self.bn(x)
else:
x = self.gn(x)
if self.relu:
x = F.relu(x, inplace=True)
return x
def init_weights(self, init_method):
init_uniform(self.conv, init_method)
if self.bn is not None:
init_bn(self.bn)
================================================
FILE: models/utils/__init__.py
================================================
from models.utils.utils import *
================================================
FILE: models/utils/opts.py
================================================
# -*- coding: utf-8 -*-
# @Description: Options settings & configurations for GeoMVSNet.
# @Author: Zhe Zhang (doublez@stu.pku.edu.cn)
# @Affiliation: Peking University (PKU)
# @LastEditDate: 2023-09-07
import argparse
def get_opts():
parser = argparse.ArgumentParser(description="args")
# global settings
parser.add_argument('--mode', default='train', help='train or test', choices=['train', 'test', 'val'])
parser.add_argument('--which_dataset', default='dtu', choices=['dtu', 'tnt', 'blendedmvs'], help='which dataset to use')
parser.add_argument('--n_views', type=int, default=5, help='number of views')
parser.add_argument('--levels', type=int, default=4, help='number of stages')
parser.add_argument('--hypo_plane_num_stages', type=str, default="8,8,4,4", help='number of hypothesis planes for each stage')
parser.add_argument('--depth_interal_ratio_stages', type=str, default="0.5,0.5,0.5,1", help='depth interval ratios for each stage')
parser.add_argument("--feat_base_channel", type=int, default=8, help='channel num for base feature')
parser.add_argument("--reg_base_channel", type=int, default=8, help='channel num for regularization')
parser.add_argument('--group_cor_dim_stages', type=str, default="8,8,4,4", help='group correlation dim')
parser.add_argument('--batch_size', type=int, default=1, help='batch size for training')
parser.add_argument('--data_scale', type=str, choices=['mid', 'raw'], help='use mid or raw resolution')
parser.add_argument('--trainpath', help='data path for training')
parser.add_argument('--testpath', help='data path for testing')
parser.add_argument('--trainlist', help='data list for training')
parser.add_argument('--testlist', help='data list for testing')
# training config
parser.add_argument('--stage_lw', type=str, default="1,1,1,1", help='loss weight for different stages')
parser.add_argument('--epochs', type=int, default=10, help='number of epochs to train')
parser.add_argument('--lr_scheduler', type=str, default='MS', help='scheduler for learning rate')
parser.add_argument('--lr', type=float, default=0.001, help='learning rate')
parser.add_argument('--lrepochs', type=str, default="1,3,5,7,9,11,13,15:1.5", help='epoch ids to downscale lr and the downscale rate')
parser.add_argument('--wd', type=float, default=0.0, help='weight decay')
parser.add_argument('--summary_freq', type=int, default=100, help='print and summary frequency')
parser.add_argument('--save_freq', type=int, default=1, help='save checkpoint frequency')
parser.add_argument('--eval_freq', type=int, default=1, help='eval frequency')
parser.add_argument('--robust_train', action='store_true',help='robust training')
# testing config
parser.add_argument('--split', type=str, choices=['intermediate', 'advanced'], help='intermediate|advanced for tanksandtemples')
parser.add_argument('--img_mode', type=str, default='resize', choices=['resize', 'crop'], help='image resolution matching strategy for TNT dataset')
parser.add_argument('--cam_mode', type=str, default='origin', choices=['origin', 'short_range'], help='camera parameter strategy for TNT dataset')
parser.add_argument('--loadckpt', default=None, help='load a specific checkpoint')
parser.add_argument('--logdir', default='./checkpoints/debug', help='the directory to save checkpoints/logs')
parser.add_argument('--nolog', action='store_true', help='do not log into .log file')
parser.add_argument('--notensorboard', action='store_true', help='do not log into tensorboard')
parser.add_argument('--save_conf_all_stages', action='store_true', help='save confidence maps for all stages')
parser.add_argument('--outdir', default='./outputs', help='output dir')
parser.add_argument('--resume', action='store_true', help='continue to train the model')
# pytorch config
parser.add_argument('--device', default='cuda', help='device to use')
parser.add_argument('--seed', type=int, default=1, metavar='S', help='random seed')
parser.add_argument('--pin_m', action='store_true', help='data loader pin memory')
parser.add_argument("--local_rank", type=int, default=0)
return parser.parse_args()
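# Editor's note: a hedged sketch of one way the comma-separated per-stage
# arguments above could be split into Python lists; the actual parsing lives in
# train.py / test.py and is not reproduced here.
def _demo_split_stage_arg(raw="8,8,4,4"):
    return [int(n) for n in raw.split(",")]   # -> [8, 8, 4, 4]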
================================================
FILE: models/utils/utils.py
================================================
# -*- coding: utf-8 -*-
# @Description: Some useful utils.
# @Author: Zhe Zhang (doublez@stu.pku.edu.cn)
# @Affiliation: Peking University (PKU)
# @LastEditDate: 2023-09-07
import random
import numpy as np
import torch
import torchvision.utils as vutils
# torch.no_grad wrapper for functions
def make_nograd_func(func):
def wrapper(*f_args, **f_kwargs):
with torch.no_grad():
ret = func(*f_args, **f_kwargs)
return ret
return wrapper
# convert a function into recursive style to handle nested dict/list/tuple variables
def make_recursive_func(func):
def wrapper(vars):
if isinstance(vars, list):
return [wrapper(x) for x in vars]
elif isinstance(vars, tuple):
return tuple([wrapper(x) for x in vars])
elif isinstance(vars, dict):
return {k: wrapper(v) for k, v in vars.items()}
else:
return func(vars)
return wrapper
@make_recursive_func
def tensor2float(vars):
if isinstance(vars, float):
return vars
elif isinstance(vars, torch.Tensor):
return vars.data.item()
else:
raise NotImplementedError("invalid input type {} for tensor2float".format(type(vars)))
@make_recursive_func
def tensor2numpy(vars):
if isinstance(vars, np.ndarray):
return vars
elif isinstance(vars, torch.Tensor):
return vars.detach().cpu().numpy().copy()
else:
raise NotImplementedError("invalid input type {} for tensor2numpy".format(type(vars)))
@make_recursive_func
def tocuda(vars):
if isinstance(vars, torch.Tensor):
return vars.to(torch.device("cuda"))
elif isinstance(vars, str):
return vars
else:
raise NotImplementedError("invalid input type {} for tocuda".format(type(vars)))
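# Editor's note: a toy example (values are arbitrary) showing how the recursive
# wrappers above map a conversion over nested dict/list containers, which is how
# scalar outputs are flattened before being written to tensorboard.
def _demo_tensor2float():
    nested = {"loss": torch.tensor(0.5), "stage_losses": [torch.tensor(1.0), torch.tensor(2.0)]}
    return tensor2float(nested)   # {"loss": 0.5, "stage_losses": [1.0, 2.0]}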
def tb_save_scalars(logger, mode, scalar_dict, global_step):
scalar_dict = tensor2float(scalar_dict)
for key, value in scalar_dict.items():
if not isinstance(value, (list, tuple)):
name = '{}/{}'.format(mode, key)
logger.add_scalar(name, value, global_step)
else:
for idx in range(len(value)):
name = '{}/{}_{}'.format(mode, key, idx)
logger.add_scalar(name, value[idx], global_step)
def tb_save_images(logger, mode, images_dict, global_step):
images_dict = tensor2numpy(images_dict)
def preprocess(name, img):
if not (len(img.shape) == 3 or len(img.shape) == 4):
raise NotImplementedError("invalid img shape {}:{} in save_images".format(name, img.shape))
if len(img.shape) == 3:
img = img[:, np.newaxis, :, :]
img = torch.from_numpy(img[:1])
return vutils.make_grid(img, padding=0, nrow=1, normalize=True, scale_each=True)
for key, value in images_dict.items():
if not isinstance(value, (list, tuple)):
name = '{}/{}'.format(mode, key)
logger.add_image(name, preprocess(name, value), global_step)
else:
for idx in range(len(value)):
name = '{}/{}_{}'.format(mode, key, idx)
logger.add_image(name, preprocess(name, value[idx]), global_step)
class DictAverageMeter(object):
def __init__(self):
self.data = {}
self.count = 0
def update(self, new_input):
self.count += 1
if len(self.data) == 0:
for k, v in new_input.items():
if not isinstance(v, float):
raise NotImplementedError("invalid data {}: {}".format(k, type(v)))
self.data[k] = v
else:
for k, v in new_input.items():
if not isinstance(v, float):
raise NotImplementedError("invalid data {}: {}".format(k, type(v)))
self.data[k] += v
def mean(self):
return {k: v / self.count for k, v in self.data.items()}
# a wrapper to compute metrics for each image individually
def compute_metrics_for_each_image(metric_func):
def wrapper(depth_est, depth_gt, mask, *args):
batch_size = depth_gt.shape[0]
results = []
# compute result one by one
for idx in range(batch_size):
ret = metric_func(depth_est[idx], depth_gt[idx], mask[idx], *args)
results.append(ret)
return torch.stack(results).mean()
return wrapper
@make_nograd_func
@compute_metrics_for_each_image
def Thres_metrics(depth_est, depth_gt, mask, thres):
assert isinstance(thres, (int, float))
depth_est, depth_gt = depth_est[mask], depth_gt[mask]
errors = torch.abs(depth_est - depth_gt)
err_mask = errors > thres
return torch.mean(err_mask.float())
# NOTE: please do not use this to build up training loss
@make_nograd_func
@compute_metrics_for_each_image
def AbsDepthError_metrics(depth_est, depth_gt, mask, thres=None):
depth_est, depth_gt = depth_est[mask], depth_gt[mask]
error = (depth_est - depth_gt).abs()
if thres is not None:
error = error[(error >= float(thres[0])) & (error <= float(thres[1]))]
if error.shape[0] == 0:
return torch.tensor(0, device=error.device, dtype=error.dtype)
return torch.mean(error)
import torch.distributed as dist
def synchronize():
"""
Helper function to synchronize (barrier) among all processes when
using distributed training
"""
if not dist.is_available():
return
if not dist.is_initialized():
return
world_size = dist.get_world_size()
if world_size == 1:
return
dist.barrier()
def get_world_size():
if not dist.is_available():
return 1
if not dist.is_initialized():
return 1
return dist.get_world_size()
def reduce_scalar_outputs(scalar_outputs):
world_size = get_world_size()
if world_size < 2:
return scalar_outputs
with torch.no_grad():
names = []
scalars = []
for k in sorted(scalar_outputs.keys()):
names.append(k)
scalars.append(scalar_outputs[k])
scalars = torch.stack(scalars, dim=0)
dist.reduce(scalars, dst=0)
if dist.get_rank() == 0:
# only main process gets accumulated, so only divide by
# world_size in this case
scalars /= world_size
reduced_scalars = {k: v for k, v in zip(names, scalars)}
return reduced_scalars
import torch
from bisect import bisect_right
class WarmupMultiStepLR(torch.optim.lr_scheduler._LRScheduler):
def __init__(
self,
optimizer,
milestones,
gamma=0.1,
warmup_factor=1.0 / 3,
warmup_iters=500,
warmup_method="linear",
last_epoch=-1,
):
if not list(milestones) == sorted(milestones):
raise ValueError(
"Milestones should be a list of increasing integers. Got {}".format(milestones)
)
if warmup_method not in ("constant", "linear"):
raise ValueError(
"Only 'constant' or 'linear' warmup_method accepted, "
"got {}".format(warmup_method)
)
self.milestones = milestones
self.gamma = gamma
self.warmup_factor = warmup_factor
self.warmup_iters = warmup_iters
self.warmup_method = warmup_method
super(WarmupMultiStepLR, self).__init__(optimizer, last_epoch)
def get_lr(self):
warmup_factor = 1
if self.last_epoch < self.warmup_iters:
if self.warmup_method == "constant":
warmup_factor = self.warmup_factor
elif self.warmup_method == "linear":
alpha = float(self.last_epoch) / self.warmup_iters
warmup_factor = self.warmup_factor * (1 - alpha) + alpha
return [
base_lr
* warmup_factor
* self.gamma ** bisect_right(self.milestones, self.last_epoch)
for base_lr in self.base_lrs
]
def set_random_seed(seed):
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)
torch.cuda.manual_seed_all(seed)
================================================
FILE: outputs/visual.ipynb
================================================
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"- @Description: Juputer notebook for visualizing depth maps.\n",
"- @Author: Zhe Zhang (doublez@stu.pku.edu.cn)\n",
"- @Affiliation: Peking University (PKU)\n",
"- @LastEditDate: 2023-09-07"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"ExecutionIndicator": {
"show": true
},
"tags": []
},
"outputs": [],
"source": [
"import sys, os\n",
"sys.path.append('../')\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"import re\n",
"\n",
"\n",
"def read_pfm(filename):\n",
" file = open(filename, 'rb')\n",
" color = None\n",
" width = None\n",
" height = None\n",
" scale = None\n",
" endian = None\n",
"\n",
" header = file.readline().decode('utf-8').rstrip()\n",
" if header == 'PF':\n",
" color = True\n",
" elif header == 'Pf':\n",
" color = False\n",
" else:\n",
" raise Exception('Not a PFM file.')\n",
"\n",
" dim_match = re.match(r'^(\\d+)\\s(\\d+)\\s$', file.readline().decode('utf-8'))\n",
" if dim_match:\n",
" width, height = map(int, dim_match.groups())\n",
" else:\n",
" raise Exception('Malformed PFM header.')\n",
"\n",
" scale = float(file.readline().rstrip())\n",
" if scale < 0: # little-endian\n",
" endian = '<'\n",
" scale = -scale\n",
" else:\n",
" endian = '>' # big-endian\n",
"\n",
" data = np.fromfile(file, endian + 'f')\n",
" shape = (height, width, 3) if color else (height, width)\n",
"\n",
" data = np.reshape(data, shape)\n",
" data = np.flipud(data)\n",
" file.close()\n",
" return data, scale\n",
"\n",
"\n",
"def read_depth(filename):\n",
" depth = read_pfm(filename)[0]\n",
" return np.array(depth, dtype=np.float32)\n",
"\n",
"\n",
"assert False"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## DTU"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"ExecutionIndicator": {
"show": true
},
"tags": []
},
"outputs": [],
"source": [
"exp_name = 'dtu/geomvsnet'\n",
"depth_name = \"00000009.pfm\"\n",
"\n",
"scans = os.listdir(os.path.join(exp_name))\n",
"scans = list(filter(lambda x: x.startswith(\"scan\"), scans))\n",
"scans.sort(key=lambda x: int(x[4:]))\n",
"for scan in scans:\n",
" depth_filename = os.path.join(exp_name, scan, \"depth_est\", depth_name)\n",
" if not os.path.exists(depth_filename): continue\n",
" depth = read_depth(depth_filename)\n",
"\n",
" confidence_filename = os.path.join(exp_name, scan, \"confidence\", depth_name)\n",
" confidence = read_depth(confidence_filename)\n",
"\n",
" print(scan, depth_name)\n",
"\n",
" plt.figure(figsize=(12, 12))\n",
" plt.subplot(1, 2, 1)\n",
" plt.xticks([]), plt.yticks([]), plt.axis('off')\n",
" plt.imshow(depth, 'viridis', vmin=500, vmax=830)\n",
"\n",
" plt.subplot(1, 2, 2)\n",
" plt.xticks([]), plt.yticks([]), plt.axis('off')\n",
" plt.imshow(confidence, 'viridis')\n",
" plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## TNT"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"exp_name = './tnt/blend/geomvsnet/'\n",
"depth_name = \"00000009.pfm\"\n",
"\n",
"with open(\"../datasets/lists/tnt/intermediate.txt\") as f:\n",
" scans_i = [line.rstrip() for line in f.readlines()]\n",
"\n",
"with open(\"../datasets/lists/tnt/advanced.txt\") as f:\n",
" scans_a = [line.rstrip() for line in f.readlines()]\n",
"\n",
"scans = scans_i + scans_a\n",
"\n",
"for scan in scans:\n",
"\n",
" depth_filename = os.path.join(exp_name, scan, \"depth_est\", depth_name)\n",
" if not os.path.exists(depth_filename): continue\n",
" depth = read_depth(depth_filename)\n",
"\n",
" print(scan, depth_name, depth.shape)\n",
"\n",
" plt.figure(figsize=(12, 12))\n",
" plt.xticks([]), plt.yticks([]), plt.axis('off')\n",
" plt.imshow(depth, 'viridis', vmin=0, vmax=10)\n",
"\n",
" plt.show()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.12"
},
"vscode": {
"interpreter": {
"hash": "d253918f84404206ad3cf9c22ee3709ef6e34cbea610b0ac9787033d60da5e03"
}
}
},
"nbformat": 4,
"nbformat_minor": 4
}
================================================
FILE: requirements.txt
================================================
torch==1.10.0
torchvision
opencv-python
numpy==1.18.1
pillow
scipy
tensorboardX
plyfile
open3d
jupyter
notebook
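# Not pip-installable but needed by some scripts: MATLAB for the DTU quantitative
# evaluation (scripts/dtu/matlab_quan_dtu.sh), and Python 2 plus a compiled fusibile
# binary for the optional gipuma fusion (scripts/dtu/fusion_dtu.sh).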
================================================
FILE: scripts/blend/train_blend.sh
================================================
#!/usr/bin/env bash
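# BlendedMVS training; the resulting checkpoints under ./checkpoints/blend/geomvsnet
# are the ones loaded by scripts/tnt/test_tnt.sh for the Tanks and Temples benchmark.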
source scripts/data_path.sh
THISNAME="geomvsnet"
LOG_DIR="./checkpoints/blend/"$THISNAME
if [ ! -d $LOG_DIR ]; then
mkdir -p $LOG_DIR
fi
CUDA_VISIBLE_DEVICES=0,1,2,3 python3 -m torch.distributed.launch --nproc_per_node=4 train.py ${@} \
--which_dataset="blendedmvs" --epochs=16 --logdir=$LOG_DIR \
--trainpath=$BLENDEDMVS_ROOT --testpath=$BLENDEDMVS_ROOT \
--trainlist="datasets/lists/blendedmvs/low_res_all.txt" --testlist="datasets/lists/blendedmvs/val.txt" \
\
--n_views="7" --batch_size=2 --lr=0.001 --robust_train \
--lr_scheduler="onecycle"
================================================
FILE: scripts/data_path.sh
================================================
#!/usr/bin/env bash
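# Replace each "[/path/to/]" placeholder below with the absolute path to the
# corresponding dataset root before running any train / test / fusion script.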
# DTU
DTU_TRAIN_ROOT="[/path/to/]dtu"
DTU_TEST_ROOT="[/path/to/]dtu-test"
DTU_QUANTITATIVE_ROOT="[/path/to/]dtu-evaluation"
# Tanks and Temples
TNT_ROOT="[/path/to/]tnt"
# BlendedMVS
BLENDEDMVS_ROOT="[/path/to/]blendmvs"
================================================
FILE: scripts/dtu/fusion_dtu.sh
================================================
#!/usr/bin/env bash
source scripts/data_path.sh
THISNAME="geomvsnet"
FUSION_METHOD="open3d"
LOG_DIR="./checkpoints/dtu/"$THISNAME
DTU_OUT_DIR="./outputs/dtu/"$THISNAME
if [ $FUSION_METHOD = "pcd" ] ; then
python3 fusions/dtu/pcd.py ${@} \
--testpath=$DTU_TEST_ROOT --testlist="datasets/lists/dtu/test.txt" \
--outdir=$DTU_OUT_DIR --logdir=$LOG_DIR --nolog \
--num_worker=1 \
\
--thres_view=4 --conf=0.5 \
\
--plydir=$DTU_OUT_DIR"/pcd_fusion_plys/"
elif [ $FUSION_METHOD = "gipuma" ] ; then
# source [/path/to/]anaconda3/etc/profile.d/conda.sh
# conda activate fusibile
CUDA_VISIBLE_DEVICES=0 python2 fusions/dtu/gipuma.py \
--root_dir=$DTU_TEST_ROOT --list_file="datasets/lists/dtu/test.txt" \
--fusibile_exe_path="fusions/fusibile" --out_folder="fusibile_fused" \
--depth_folder=$DTU_OUT_DIR \
--downsample_factor=1 \
\
--prob_threshold=0.5 --disp_threshold=0.25 --num_consistent=3 \
\
--plydir=$DTU_OUT_DIR"/gipuma_fusion_plys/"
elif [ $FUSION_METHOD = "open3d" ] ; then
CUDA_VISIBLE_DEVICES=0 python fusions/dtu/_open3d.py --device="cuda" \
--root_path=$DTU_TEST_ROOT \
--depth_path=$DTU_OUT_DIR \
--data_list="datasets/lists/dtu/test.txt" \
\
--prob_thresh=0.3 --dist_thresh=0.2 --num_consist=4 \
\
--ply_path=$DTU_OUT_DIR"/open3d_fusion_plys/"
fi
================================================
FILE: scripts/dtu/matlab_quan_dtu.sh
================================================
#!/usr/bin/env bash
source scripts/data_path.sh
OUTNAME="geomvsnet"
FUSIONMETHOD="open3d"
# Evaluation
echo "<<<<<<<<<< start parallel evaluation"
METHOD='mvsnet'
PLYPATH='../../../outputs/dtu/'$OUTNAME'/'$FUSIONMETHOD'_fusion_plys/'
RESULTPATH='../../../outputs/dtu/'$OUTNAME'/'$FUSIONMETHOD'_quantitative/'
LOGPATH='outputs/dtu/'$OUTNAME'/'$FUSIONMETHOD'_quantitative/'$OUTNAME'.log'
mkdir -p 'outputs/dtu/'$OUTNAME'/'$FUSIONMETHOD'_quantitative/'
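# Note: PLYPATH / RESULTPATH are resolved after "cd datasets/evaluations/dtu_parallel"
# inside MATLAB (hence the ../../../ prefix), while LOGPATH and the mkdir above are
# relative to the repository root.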
set_array=(1 4 9 10 11 12 13 15 23 24 29 32 33 34 48 49 62 75 77 110 114 118)
num_at_once=2 # number of scans evaluated concurrently per batch (22 scans in total); e.g. 1 2 4 5 7 11 22
times=`expr $((${#set_array[*]} / $num_at_once))`
remain=`expr $((${#set_array[*]} - $num_at_once * $times))`
this_group_num=0
pos=0
for ((t=0; t<$times; t++))
do
if [ "$t" -ge `expr $(($times-$remain))` ] ; then
this_group_num=`expr $(($num_at_once + 1))`
else
this_group_num=$num_at_once
fi
for set in "${set_array[@]:pos:this_group_num}"
do
matlab -nodesktop -nosplash -r "cd datasets/evaluations/dtu_parallel; dataPath='$DTU_QUANTITATIVE_ROOT'; plyPath='$PLYPATH'; resultsPath='$RESULTPATH'; method_string='$METHOD'; thisset='$set'; BaseEvalMain_web" &
done
wait
pos=`expr $(($pos + $this_group_num))`
done
wait
SET=[1,4,9,10,11,12,13,15,23,24,29,32,33,34,48,49,62,75,77,110,114,118]
matlab -nodesktop -nosplash -r "cd datasets/evaluations/dtu_parallel; resultsPath='$RESULTPATH'; method_string='$METHOD'; set='$SET'; ComputeStat_web" > $LOGPATH
================================================
FILE: scripts/dtu/test_dtu.sh
================================================
#!/usr/bin/env bash
source scripts/data_path.sh
THISNAME="geomvsnet"
BESTEPOCH="geomvsnet_release"
LOG_DIR="./checkpoints/dtu/"$THISNAME
DTU_CKPT_FILE=$LOG_DIR"/model_"$BESTEPOCH".ckpt"
DTU_OUT_DIR="./outputs/dtu/"$THISNAME
CUDA_VISIBLE_DEVICES=0 python3 test.py ${@} \
--which_dataset="dtu" --loadckpt=$DTU_CKPT_FILE --batch_size=1 \
--outdir=$DTU_OUT_DIR --logdir=$LOG_DIR --nolog \
--testpath=$DTU_TEST_ROOT --testlist="datasets/lists/dtu/test.txt" \
\
--data_scale="raw" --n_views="5"
================================================
FILE: scripts/dtu/train_dtu.sh
================================================
#!/usr/bin/env bash
source scripts/data_path.sh
THISNAME="geomvsnet"
LOG_DIR="./checkpoints/dtu/"$THISNAME
if [ ! -d $LOG_DIR ]; then
mkdir -p $LOG_DIR
fi
CUDA_VISIBLE_DEVICES=0,1,2,3 python3 -m torch.distributed.launch --nproc_per_node=4 train.py ${@} \
--which_dataset="dtu" --epochs=16 --logdir=$LOG_DIR \
--trainpath=$DTU_TRAIN_ROOT --testpath=$DTU_TRAIN_ROOT \
--trainlist="datasets/lists/dtu/train.txt" --testlist="datasets/lists/dtu/test.txt" \
\
--data_scale="mid" --n_views="5" --batch_size=4 --lr=0.002 --robust_train \
--lrepochs="1,3,5,7,9,11,13,15:1.5"
================================================
FILE: scripts/dtu/train_dtu_raw.sh
================================================
#!/usr/bin/env bash
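# Same recipe as scripts/dtu/train_dtu.sh but on full-resolution ("raw") DTU images,
# hence the smaller batch size and learning rate.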
source scripts/data_path.sh
THISNAME="geomvsnet_raw"
LOG_DIR="./checkpoints/dtu/"$THISNAME
if [ ! -d $LOG_DIR ]; then
mkdir -p $LOG_DIR
fi
CUDA_VISIBLE_DEVICES=0,1,2,3 python3 -m torch.distributed.launch --nproc_per_node=4 train.py ${@} \
--which_dataset="dtu" --epochs=16 --logdir=$LOG_DIR \
--trainpath=$DTU_TRAIN_ROOT --testpath=$DTU_TRAIN_ROOT \
--trainlist="datasets/lists/dtu/train.txt" --testlist="datasets/lists/dtu/test.txt" \
\
--data_scale="raw" --n_views="5" --batch_size=1 --lr=0.0005 --robust_train \
--lrepochs="1,3,5,7,9,11,13,15:1.5"
================================================
FILE: scripts/tnt/fusion_tnt.sh
================================================
#!/usr/bin/env bash
source scripts/data_path.sh
THISNAME="blend/geomvsnet"
LOG_DIR="./checkpoints/tnt/"$THISNAME
TNT_OUT_DIR="./outputs/tnt/"$THISNAME
# Intermediate
python3 fusions/tnt/dypcd.py ${@} \
--root_dir=$TNT_ROOT --list_file="datasets/lists/tnt/intermediate.txt" --split="intermediate" \
--out_dir=$TNT_OUT_DIR --ply_path=$TNT_OUT_DIR"/dypcd_fusion_plys" \
--img_mode="resize" --cam_mode="origin" --single_processor
# Advanced
python3 fusions/tnt/dypcd.py ${@} \
--root_dir=$TNT_ROOT --list_file="datasets/lists/tnt/advanced.txt" --split="advanced" \
--out_dir=$TNT_OUT_DIR --ply_path=$TNT_OUT_DIR"/dypcd_fusion_plys" \
--img_mode="resize" --cam_mode="origin" --single_processor
================================================
FILE: scripts/tnt/test_tnt.sh
================================================
#!/usr/bin/env bash
source scripts/data_path.sh
THISNAME="blend/geomvsnet"
BESTEPOCH="15"
LOG_DIR="./checkpoints/"$THISNAME
CKPT_FILE=$LOG_DIR"/model_"$BESTEPOCH".ckpt"
TNT_OUT_DIR="./outputs/tnt/"$THISNAME
# Intermediate
CUDA_VISIBLE_DEVICES=0 python3 test.py ${@} \
--which_dataset="tnt" --loadckpt=$CKPT_FILE --batch_size=1 \
--outdir=$TNT_OUT_DIR --logdir=$LOG_DIR --nolog \
--testpath=$TNT_ROOT --testlist="datasets/lists/tnt/intermediate.txt" --split="intermediate" \
\
--n_views="11" --img_mode="resize" --cam_mode="origin"
# Advanced
CUDA_VISIBLE_DEVICES=0 python3 test.py ${@} \
--which_dataset="tnt" --loadckpt=$CKPT_FILE --batch_size=1 \
--outdir=$TNT_OUT_DIR --logdir=$LOG_DIR --nolog \
--testpath=$TNT_ROOT --testlist="datasets/lists/tnt/advanced.txt" --split="advanced" \
\
--n_views="11" --img_mode="resize" --cam_mode="origin"
================================================
FILE: test.py
================================================
# -*- coding: utf-8 -*-
# @Description: Main process of network testing.
# @Author: Zhe Zhang (doublez@stu.pku.edu.cn)
# @Affiliation: Peking University (PKU)
# @LastEditDate: 2023-09-07
import os, time, sys, gc, cv2, logging
import numpy as np
import torch
import torch.nn as nn
import torch.nn.parallel
import torch.backends.cudnn as cudnn
from torch.utils.data import DataLoader
from datasets.data_io import *
from datasets.dtu import DTUDataset
from datasets.tnt import TNTDataset
from models.geomvsnet import GeoMVSNet
from models.utils import *
from models.utils.opts import get_opts
cudnn.benchmark = True
args = get_opts()
def test():
total_time = 0
with torch.no_grad():
for batch_idx, sample in enumerate(TestImgLoader):
sample_cuda = tocuda(sample)
start_time = time.time()
# @Note GeoMVSNet main
outputs = model(
sample_cuda["imgs"],
sample_cuda["proj_matrices"], sample_cuda["intrinsics_matrices"],
sample_cuda["depth_values"],
sample["filename"]
)
end_time = time.time()
total_time += end_time - start_time
outputs = tensor2numpy(outputs)
del sample_cuda
filenames = sample["filename"]
cams = sample["proj_matrices"]["stage{}".format(args.levels)].numpy()
imgs = sample["imgs"]
logger.info('Iter {}/{}, Time:{:.3f} Res:{}'.format(batch_idx, len(TestImgLoader), end_time - start_time, imgs[0].shape))
for filename, cam, img, depth_est, photometric_confidence in zip(filenames, cams, imgs, outputs["depth"], outputs["photometric_confidence"]):
img = img[0].numpy() # ref view
cam = cam[0] # ref cam
depth_filename = os.path.join(args.outdir, filename.format('depth_est', '.pfm'))
confidence_filename = os.path.join(args.outdir, filename.format('confidence', '.pfm'))
cam_filename = os.path.join(args.outdir, filename.format('cams', '_cam.txt'))
img_filename = os.path.join(args.outdir, filename.format('images', '.jpg'))
os.makedirs(depth_filename.rsplit('/', 1)[0], exist_ok=True)
os.makedirs(confidence_filename.rsplit('/', 1)[0], exist_ok=True)
if args.which_dataset == 'dtu':
os.makedirs(cam_filename.rsplit('/', 1)[0], exist_ok=True)
os.makedirs(img_filename.rsplit('/', 1)[0], exist_ok=True)
# save depth maps
save_pfm(depth_filename, depth_est)
# save confidence maps
confidence_list = [outputs['stage{}'.format(i)]['photometric_confidence'].squeeze(0) for i in range(1,5)]
photometric_confidence = confidence_list[-1]
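                # by default only the finest-stage confidence map is written;
                # --save_conf_all_stages additionally writes one map per stage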
if not args.save_conf_all_stages:
save_pfm(confidence_filename, photometric_confidence)
else:
for stage_idx, photometric_confidence in enumerate(confidence_list):
if stage_idx != args.levels - 1:
confidence_filename = os.path.join(args.outdir, filename.format('confidence', "_stage"+str(stage_idx)+'.pfm'))
else:
confidence_filename = os.path.join(args.outdir, filename.format('confidence', '.pfm'))
save_pfm(confidence_filename, photometric_confidence)
# save cams, img
if args.which_dataset == 'dtu':
write_cam(cam_filename, cam)
img = np.clip(np.transpose(img, (1, 2, 0)) * 255, 0, 255).astype(np.uint8)
img_bgr = cv2.cvtColor(img, cv2.COLOR_RGB2BGR)
cv2.imwrite(img_filename, img_bgr)
torch.cuda.empty_cache()
gc.collect()
return total_time, len(TestImgLoader)
def initLogger():
logger = logging.getLogger()
logger.setLevel(logging.INFO)
curTime = time.strftime('%Y%m%d-%H%M', time.localtime(time.time()))
if args.which_dataset == 'tnt':
logfile = os.path.join(args.logdir, 'TNT-test-' + curTime + '.log')
else:
logfile = os.path.join(args.logdir, 'test-' + curTime + '.log')
formatter = logging.Formatter("%(asctime)s - %(filename)s[line:%(lineno)d] - %(levelname)s: %(message)s")
if not args.nolog:
fileHandler = logging.FileHandler(logfile, mode='a')
fileHandler.setFormatter(formatter)
logger.addHandler(fileHandler)
consoleHandler = logging.StreamHandler(sys.stdout)
consoleHandler.setFormatter(formatter)
logger.addHandler(consoleHandler)
logger.info("Logger initialized.")
logger.info("Writing logs to file: {}".format(logfile))
logger.info("Current time: {}".format(curTime))
settings_str = "All settings:\n"
for k,v in vars(args).items():
settings_str += '{0}: {1}\n'.format(k,v)
logger.info(settings_str)
return logger
if __name__ == '__main__':
logger = initLogger()
# dataset, dataloader
if args.which_dataset == 'dtu':
test_dataset = DTUDataset(args.testpath, args.testlist, "test", args.n_views, max_wh=(1600, 1200))
elif args.which_dataset == 'tnt':
test_dataset = TNTDataset(args.testpath, args.testlist, split=args.split, n_views=args.n_views, img_wh=(-1, 1024), cam_mode=args.cam_mode, img_mode=args.img_mode)
TestImgLoader = DataLoader(test_dataset, args.batch_size, shuffle=False, num_workers=4, drop_last=False)
# @Note GeoMVSNet model
model = GeoMVSNet(
levels=args.levels,
hypo_plane_num_stages=[int(n) for n in args.hypo_plane_num_stages.split(",")],
depth_interal_ratio_stages=[float(ir) for ir in args.depth_interal_ratio_stages.split(",")],
feat_base_channel=args.feat_base_channel,
reg_base_channel=args.reg_base_channel,
group_cor_dim_stages=[int(n) for n in args.group_cor_dim_stages.split(",")],
)
logger.info("loading model {}".format(args.loadckpt))
state_dict = torch.load(args.loadckpt, map_location=torch.device("cpu"))
model.load_state_dict(state_dict['model'], strict=False)
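    # strict=False: missing/unexpected keys in the checkpoint are ignored instead of raising an error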
model.cuda()
model.eval()
test()
================================================
FILE: train.py
================================================
# -*- coding: utf-8 -*-
# @Description: Main process of network training & evaluation.
# @Author: Zhe Zhang (doublez@stu.pku.edu.cn)
# @Affiliation: Peking University (PKU)
# @LastEditDate: 2023-09-07
import os, sys, time, gc, datetime, logging, json
import torch
import torch.nn as nn
import torch.nn.parallel
import torch.backends.cudnn as cudnn
import torch.optim as optim
import torch.distributed as dist
from torch.utils.data import DataLoader
from tensorboardX import SummaryWriter
from datasets.dtu import DTUDataset
from datasets.blendedmvs import BlendedMVSDataset
from models.geomvsnet import GeoMVSNet
from models.loss import geomvsnet_loss
from models.utils import *
from models.utils.opts import get_opts
cudnn.benchmark = True
num_gpus = int(os.environ["WORLD_SIZE"]) if "WORLD_SIZE" in os.environ else 1
is_distributed = num_gpus > 1
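# WORLD_SIZE is set by torch.distributed.launch (see the multi-GPU training scripts),
# so single-GPU runs fall back to the non-distributed code path below.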
args = get_opts()
def train(model, model_loss, optimizer, TrainImgLoader, TestImgLoader, start_epoch, args):
if args.lr_scheduler == 'MS':
milestones = [len(TrainImgLoader) * int(epoch_idx) for epoch_idx in args.lrepochs.split(':')[0].split(',')]
lr_gamma = 1 / float(args.lrepochs.split(':')[1])
lr_scheduler = WarmupMultiStepLR(optimizer, milestones, gamma=lr_gamma, warmup_factor=1.0/3, warmup_iters=500, last_epoch=len(TrainImgLoader) * start_epoch - 1)
elif args.lr_scheduler == 'cos':
lr_scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=int(args.epochs*len(TrainImgLoader)), eta_min=0)
elif args.lr_scheduler == 'onecycle':
lr_scheduler = torch.optim.lr_scheduler.OneCycleLR(optimizer, max_lr=args.lr, total_steps=int(args.epochs*len(TrainImgLoader)))
elif args.lr_scheduler == 'lambda':
lr_scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lambda epoch: 0.9 ** ((epoch-1) / len(TrainImgLoader)), last_epoch=len(TrainImgLoader)*start_epoch-1)
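    # 'MS' (WarmupMultiStepLR) consumes --lrepochs, e.g. "1,3,5,7,9,11,13,15:1.5" in the
    # DTU scripts; the BlendedMVS script selects --lr_scheduler=onecycle instead.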
for epoch_idx in range(start_epoch, args.epochs):
logger.info('Epoch {}:'.format(epoch_idx))
global_step = len(TrainImgLoader) * epoch_idx
# training
for batch_idx, sample in enumerate(TrainImgLoader):
start_time = time.time()
global_step = len(TrainImgLoader) * epoch_idx + batch_idx
do_summary = global_step % args.summary_freq == 0
loss, scalar_outputs, image_outputs = train_sample(model, model_loss, optimizer, sample, args)
lr_scheduler.step()
if (not is_distributed) or (dist.get_rank() == 0):
if do_summary:
if not args.notensorboard:
tb_save_scalars(tb_writer, 'train', scalar_outputs, global_step)
tb_save_images(tb_writer, 'train', image_outputs, global_step)
logger.info("Epoch {}/{}, Iter {}/{}, 2mm_err={:.3f} | lr={:.6f}, train_loss={:.3f}, abs_err={:.3f}, pw_loss={:.3f}, dds_loss={:.3f}, time={:.3f}".format(
epoch_idx, args.epochs, batch_idx, len(TrainImgLoader),
scalar_outputs["thres2mm_error"],
optimizer.param_groups[0]["lr"],
loss,
scalar_outputs["abs_depth_error"],
scalar_outputs["s3_pw_loss"],
scalar_outputs["s3_dds_loss"],
time.time() - start_time))
del scalar_outputs, image_outputs
# save checkpoint
if (not is_distributed) or (dist.get_rank() == 0):
if ((epoch_idx + 1) % args.save_freq == 0) or (epoch_idx == args.epochs-1):
torch.save({
'epoch': epoch_idx,
'model': model.module.state_dict(),
'optimizer': optimizer.state_dict()},
"{}/model_{:0>2}.ckpt".format(args.logdir, epoch_idx))
gc.collect()
# testing
if (epoch_idx % args.eval_freq == 0) or (epoch_idx == args.epochs - 1):
avg_test_scalars = DictAverageMeter()
for batch_idx, sample in enumerate(TestImgLoader):
start_time = time.time()
global_step = len(TrainImgLoader) * epoch_idx + batch_idx
do_summary = global_step % args.summary_freq == 0
loss, scalar_outputs, image_outputs = test_sample_depth(model, model_loss, sample, args)
if (not is_distributed) or (dist.get_rank() == 0):
if do_summary:
if not args.notensorboard:
tb_save_scalars(tb_writer, 'test', scalar_outputs, global_step)
tb_save_images(tb_writer, 'test', image_outputs, global_step)
logger.info(
"Epoch {}/{}, Iter {}/{}, 2mm_err={:.3f} | lr={:.6f}, test_loss={:.3f}, abs_err={:.3f}, pw_loss={:.3f}, dds_loss={:.3f}, time={:.3f}".format(
epoch_idx, args.epochs, batch_idx, len(TestImgLoader),
scalar_outputs["thres2mm_error"],
optimizer.param_groups[0]["lr"],
loss,
scalar_outputs["abs_depth_error"],
scalar_outputs["s3_pw_loss"],
scalar_outputs["s3_dds_loss"],
time.time() - start_time))
avg_test_scalars.update(scalar_outputs)
del scalar_outputs, image_outputs
if (not is_distributed) or (dist.get_rank() == 0):
if not args.notensorboard:
tb_save_scalars(tb_writer, 'fulltest', avg_test_scalars.mean(), global_step)
logger.info("avg_test_scalars: " + json.dumps(avg_test_scalars.mean()))
gc.collect()
def train_sample(model, model_loss, optimizer, sample, args):
model.train()
optimizer.zero_grad()
sample_cuda = tocuda(sample)
depth_gt_ms, mask_ms = sample_cuda["depth"], sample_cuda["mask"]
depth_gt, mask = depth_gt_ms["stage{}".format(args.levels)], mask_ms["stage{}".format(args.levels)]
# @Note GeoMVSNet main
outputs = model(
sample_cuda["imgs"],
sample_cuda["proj_matrices"], sample_cuda["intrinsics_matrices"],
sample_cuda["depth_values"]
)
depth_est = outputs["depth"]
loss, epe, pw_loss_stages, dds_loss_stages = model_loss(
outputs, depth_gt_ms, mask_ms,
stage_lw=[float(e) for e in args.stage_lw.split(",") if e], depth_values=sample_cuda["depth_values"]
)
loss.backward()
optimizer.step()
scalar_outputs = {
"loss": loss,
"epe": epe,
"s0_pw_loss": pw_loss_stages[0],
"s1_pw_loss": pw_loss_stages[1],
"s2_pw_loss": pw_loss_stages[2],
"s3_pw_loss": pw_loss_stages[3],
"s0_dds_loss": dds_loss_stages[0],
"s1_dds_loss": dds_loss_stages[1],
"s2_dds_loss": dds_loss_stages[2],
"s3_dds_loss": dds_loss_stages[3],
"abs_depth_error": AbsDepthError_metrics(depth_est, depth_gt, mask > 0.5),
"thres2mm_error": Thres_metrics(depth_est, depth_gt, mask > 0.5, 2),
"thres4mm_error": Thres_metrics(depth_est, depth_gt, mask > 0.5, 4),
"thres8mm_error": Thres_metrics(depth_est, depth_gt, mask > 0.5, 8),
}
image_outputs = {
"depth_est": depth_est * mask,
"depth_est_nomask": depth_est,
"depth_gt": sample["depth"]["stage1"],
"ref_img": sample["imgs"][0],
"mask": sample["mask"]["stage1"],
"errormap": (depth_est - depth_gt).abs() * mask,
}
if is_distributed:
scalar_outputs = reduce_scalar_outputs(scalar_outputs)
return tensor2float(scalar_outputs["loss"]), tensor2float(scalar_outputs), tensor2numpy(image_outputs)
@make_nograd_func
def test_sample_depth(model, model_loss, sample, args):
if is_distributed:
model_eval = model.module
else:
model_eval = model
model_eval.eval()
sample_cuda = tocuda(sample)
depth_gt_ms, mask_ms = sample_cuda["depth"], sample_cuda["mask"]
depth_gt, mask = depth_gt_ms["stage{}".format(args.levels)], mask_ms["stage{}".format(args.levels)]
outputs = model_eval(
sample_cuda["imgs"],
sample_cuda["proj_matrices"], sample_cuda["intrinsics_matrices"],
sample_cuda["depth_values"]
)
depth_est = outputs["depth"]
loss, epe, pw_loss_stages, dds_loss_stages = model_loss(
outputs, depth_gt_ms, mask_ms,
stage_lw=[float(e) for e in args.stage_lw.split(",") if e], depth_values=sample_cuda["depth_values"]
)
scalar_outputs = {
"loss": loss,
"epe": epe,
"s0_pw_loss": pw_loss_stages[0],
"s1_pw_loss": pw_loss_stages[1],
"s2_pw_loss": pw_loss_stages[2],
"s3_pw_loss": pw_loss_stages[3],
"s0_dds_loss": dds_loss_stages[0],
"s1_dds_loss": dds_loss_stages[1],
"s2_dds_loss": dds_loss_stages[2],
"s3_dds_loss": dds_loss_stages[3],
"abs_depth_error": AbsDepthError_metrics(depth_est, depth_gt, mask > 0.5),
"thres2mm_error": Thres_metrics(depth_est, depth_gt, mask > 0.5, 2),
"thres4mm_error": Thres_metrics(depth_est, depth_gt, mask > 0.5, 4),
"thres8mm_error": Thres_metrics(depth_est, depth_gt, mask > 0.5, 8),
}
image_outputs = {
"depth_est": depth_est * mask,
"depth_est_nomask": depth_est,
"depth_gt": sample["depth"]["stage1"],
"ref_img": sample["imgs"][0],
"mask": sample["mask"]["stage1"],
"errormap": (depth_est - depth_gt).abs() * mask
}
if is_distributed:
scalar_outputs = reduce_scalar_outputs(scalar_outputs)
return tensor2float(scalar_outputs["loss"]), tensor2float(scalar_outputs), tensor2numpy(image_outputs)
def initLogger():
logger = logging.getLogger()
logger.setLevel(logging.INFO)
curTime = time.strftime('%Y%m%d-%H%M', time.localtime(time.time()))
logfile = os.path.join(args.logdir, 'train-' + curTime + '.log')
formatter = logging.Formatter("%(asctime)s - %(filename)s[line:%(lineno)d] - %(levelname)s: %(message)s")
fileHandler = logging.FileHandler(logfile, mode='a')
fileHandler.setFormatter(formatter)
logger.addHandler(fileHandler)
consoleHandler = logging.StreamHandler(sys.stdout)
consoleHandler.setFormatter(formatter)
logger.addHandler(consoleHandler)
logger.info("Logger initialized.")
logger.info("Writing logs to file: {}".format(logfile))
logger.info("Current time: {}".format(curTime))
settings_str = "All settings:\n"
for k,v in vars(args).items():
settings_str += '{0}: {1}\n'.format(k,v)
logger.info(settings_str)
return logger
if __name__ == '__main__':
logger = initLogger()
if args.resume:
assert args.mode == "train"
assert args.loadckpt is None
if is_distributed:
torch.cuda.set_device(args.local_rank)
torch.distributed.init_process_group(backend="nccl", init_method="env://")
synchronize()
set_random_seed(args.seed)
device = torch.device(args.device)
# tensorboard
if (not is_distributed) or (dist.get_rank() == 0):
if not os.path.isdir(args.logdir):
os.makedirs(args.logdir)
current_time_str = str(datetime.datetime.now().strftime('%Y%m%d_%H%M%S'))
logger.info("current time " + current_time_str)
logger.info("creating new summary file")
if not args.notensorboard:
tb_writer = SummaryWriter(args.logdir)
# @Note GeoMVSNet model
model = GeoMVSNet(
levels=args.levels,
hypo_plane_num_stages=[int(n) for n in args.hypo_plane_num_stages.split(",")],
depth_interal_ratio_stages=[float(ir) for ir in args.depth_interal_ratio_stages.split(",")],
feat_base_channel=args.feat_base_channel,
reg_base_channel=args.reg_base_channel,
group_cor_dim_stages=[int(n) for n in args.group_cor_dim_stages.split(",")],
)
model.to(device)
model_loss = geomvsnet_loss
# optimizer
optimizer = optim.Adam(filter(lambda p: p.requires_grad, model.parameters()), lr=args.lr, betas=(0.9, 0.999), weight_decay=args.wd)
# load parameters
start_epoch = 0
if args.resume:
saved_models = [fn for fn in os.listdir(args.logdir) if fn.endswith(".ckpt")]
saved_models = sorted(saved_models, key=lambda x: int(x.split('_')[-1].split('.')[0]))
loadckpt = os.path.join(args.logdir, saved_models[-1])
logger.info("resuming: " + loadckpt)
state_dict = torch.load(loadckpt, map_location=torch.device("cpu"))
model.load_state_dict(state_dict['model'])
optimizer.load_state_dict(state_dict['optimizer'])
start_epoch = state_dict['epoch'] + 1
# distributed
if (not is_distributed) or (dist.get_rank() == 0):
logger.info("start at epoch {}".format(start_epoch))
logger.info('Number of model parameters: {}'.format(sum([p.data.nelement() for p in model.parameters()])))
if is_distributed:
if dist.get_rank() == 0:
logger.info("Let's use {} GPUs in distributed mode!".format(torch.cuda.device_count()))
model = torch.nn.parallel.DistributedDataParallel(
model, device_ids=[args.local_rank], output_device=args.local_rank,
find_unused_parameters=True,
)
else:
if torch.cuda.is_available():
logger.info("Let's use {} GPUs in parallel mode.".format(torch.cuda.device_count()))
model = nn.DataParallel(model)
# dataset, dataloader
if args.which_dataset == "dtu":
train_dataset = DTUDataset(args.trainpath, args.trainlist, "train", args.n_views, data_scale=args.data_scale, robust_train=args.robust_train)
test_dataset = DTUDataset(args.testpath, args.testlist, "val", args.n_views, data_scale=args.data_scale)
elif args.which_dataset == "blendedmvs":
train_dataset = BlendedMVSDataset(args.trainpath, args.trainlist, "train", args.n_views, img_wh=(768, 576), robust_train=args.robust_train, augment=False)
test_dataset = BlendedMVSDataset(args.testpath, args.testlist, "val", args.n_views, img_wh=(768, 576))
if is_distributed:
train_sampler = torch.utils.data.DistributedSampler(train_dataset, num_replicas=dist.get_world_size(), rank=dist.get_rank())
test_sampler = torch.utils.data.DistributedSampler(test_dataset, num_replicas=dist.get_world_size(), rank=dist.get_rank())
TrainImgLoader = DataLoader(train_dataset, args.batch_size, sampler=train_sampler, num_workers=8, drop_last=True, pin_memory=args.pin_m)
TestImgLoader = DataLoader(test_dataset, args.batch_size, sampler=test_sampler, num_workers=8, drop_last=False, pin_memory=args.pin_m)
else:
TrainImgLoader = DataLoader(train_dataset, args.batch_size, shuffle=True, num_workers=8, drop_last=True, pin_memory=args.pin_m)
TestImgLoader = DataLoader(test_dataset, args.batch_size, shuffle=False, num_workers=8, drop_last=False, pin_memory=args.pin_m)
train(model, model_loss, optimizer, TrainImgLoader, TestImgLoader, start_epoch, args)