Repository: HLinChen/VCR-GauS
Branch: main
Commit: aa715d19bfac
Files: 99
Total size: 474.8 KB
Directory structure:
gitextract_oa0m8dnw/
├── .gitmodules
├── LICENSE.md
├── README.md
├── arguments/
│ └── __init__.py
├── bash_scripts/
│ ├── 0_train.sh
│ ├── 1_preprocess_tnt.sh
│ ├── 2_extract_normal_dsine.sh
│ ├── 3_extract_mask.sh
│ ├── 4_extract_normal_geow.sh
│ ├── convert.sh
│ └── install.sh
├── configs/
│ ├── 360_v2/
│ │ └── base.yaml
│ ├── config.py
│ ├── config_base.yaml
│ ├── dtu/
│ │ ├── base.yaml
│ │ └── dtu_scan24.yaml
│ ├── reconstruct.yaml
│ ├── scannetpp/
│ │ └── base.yaml
│ └── tnt/
│ ├── Barn.yaml
│ ├── Caterpillar.yaml
│ ├── Courthouse.yaml
│ ├── Ignatius.yaml
│ ├── Meetingroom.yaml
│ ├── Truck.yaml
│ └── base.yaml
├── environment.yml
├── evaluation/
│ ├── crop_mesh.py
│ ├── eval_dtu/
│ │ ├── eval.py
│ │ ├── evaluate_single_scene.py
│ │ └── render_utils.py
│ ├── eval_tnt.py
│ ├── full_eval.py
│ ├── lpipsPyTorch/
│ │ ├── __init__.py
│ │ └── modules/
│ │ ├── lpips.py
│ │ ├── networks.py
│ │ └── utils.py
│ ├── metrics.py
│ ├── render.py
│ └── tnt_eval/
│ ├── README.md
│ ├── config.py
│ ├── evaluation.py
│ ├── plot.py
│ ├── registration.py
│ ├── requirements.txt
│ ├── run.py
│ ├── trajectory_io.py
│ └── util.py
├── gaussian_renderer/
│ ├── __init__.py
│ └── network_gui.py
├── process_data/
│ ├── convert.py
│ ├── convert_360_to_json.py
│ ├── convert_data_to_json.py
│ ├── convert_dtu_to_json.py
│ ├── convert_tnt_to_json.py
│ ├── extract_mask.py
│ ├── extract_normal.py
│ ├── extract_normal_geo.py
│ ├── visualize_colmap.ipynb
│ └── visualize_transforms.ipynb
├── pyproject.toml
├── python_scripts/
│ ├── run_base.py
│ ├── run_dtu.py
│ ├── run_mipnerf360.py
│ ├── run_tnt.py
│ ├── show_360.py
│ ├── show_dtu.py
│ └── show_tnt.py
├── requirements.txt
├── scene/
│ ├── __init__.py
│ ├── appearance_network.py
│ ├── cameras.py
│ ├── colmap_loader.py
│ ├── dataset_readers.py
│ └── gaussian_model.py
├── tools/
│ ├── __init__.py
│ ├── camera.py
│ ├── camera_utils.py
│ ├── crop_mesh.py
│ ├── denoise_pcd.py
│ ├── depth2mesh.py
│ ├── distributed.py
│ ├── general_utils.py
│ ├── graphics_utils.py
│ ├── image_utils.py
│ ├── loss_utils.py
│ ├── math_utils.py
│ ├── mcube_utils.py
│ ├── mesh_utils.py
│ ├── normal_utils.py
│ ├── prune.py
│ ├── render_utils.py
│ ├── semantic_id.py
│ ├── sh_utils.py
│ ├── system_utils.py
│ ├── termcolor.py
│ ├── visualization.py
│ └── visualize.py
├── train.py
└── trainer.py
================================================
FILE CONTENTS
================================================
================================================
FILE: .gitmodules
================================================
[submodule "submodules/simple-knn"]
path = submodules/simple-knn
url = https://gitlab.inria.fr/bkerbl/simple-knn.git
[submodule "submodules/diff-gaussian-rasterization"]
path = submodules/diff-gaussian-rasterization
url = https://github.com/HLinChen/diff-gaussian-rasterization
[submodule "SIBR_viewers"]
path = SIBR_viewers
url = https://gitlab.inria.fr/sibr/sibr_core.git
[submodule "submodules/colmap"]
path = submodules/colmap
url = https://github.com/colmap/colmap.git
================================================
FILE: LICENSE.md
================================================
Gaussian-Splatting License
===========================
**Inria** and **the Max Planck Institut for Informatik (MPII)** hold all the ownership rights on the *Software* named **gaussian-splatting**.
The *Software* is in the process of being registered with the Agence pour la Protection des
Programmes (APP).
The *Software* is still being developed by the *Licensor*.
*Licensor*'s goal is to allow the research community to use, test and evaluate
the *Software*.
## 1. Definitions
*Licensee* means any person or entity that uses the *Software* and distributes
its *Work*.
*Licensor* means the owners of the *Software*, i.e Inria and MPII
*Software* means the original work of authorship made available under this
License ie gaussian-splatting.
*Work* means the *Software* and any additions to or derivative works of the
*Software* that are made available under this License.
## 2. Purpose
This license is intended to define the rights granted to the *Licensee* by
Licensors under the *Software*.
## 3. Rights granted
For the above reasons Licensors have decided to distribute the *Software*.
Licensors grant non-exclusive rights to use the *Software* for research purposes
to research users (both academic and industrial), free of charge, without right
to sublicense.. The *Software* may be used "non-commercially", i.e., for research
and/or evaluation purposes only.
Subject to the terms and conditions of this License, you are granted a
non-exclusive, royalty-free, license to reproduce, prepare derivative works of,
publicly display, publicly perform and distribute its *Work* and any resulting
derivative works in any form.
## 4. Limitations
**4.1 Redistribution.** You may reproduce or distribute the *Work* only if (a) you do
so under this License, (b) you include a complete copy of this License with
your distribution, and (c) you retain without modification any copyright,
patent, trademark, or attribution notices that are present in the *Work*.
**4.2 Derivative Works.** You may specify that additional or different terms apply
to the use, reproduction, and distribution of your derivative works of the *Work*
("Your Terms") only if (a) Your Terms provide that the use limitation in
Section 2 applies to your derivative works, and (b) you identify the specific
derivative works that are subject to Your Terms. Notwithstanding Your Terms,
this License (including the redistribution requirements in Section 3.1) will
continue to apply to the *Work* itself.
**4.3** Any other use without of prior consent of Licensors is prohibited. Research
users explicitly acknowledge having received from Licensors all information
allowing to appreciate the adequacy between of the *Software* and their needs and
to undertake all necessary precautions for its execution and use.
**4.4** The *Software* is provided both as a compiled library file and as source
code. In case of using the *Software* for a publication or other results obtained
through the use of the *Software*, users are strongly encouraged to cite the
corresponding publications as explained in the documentation of the *Software*.
## 5. Disclaimer
THE USER CANNOT USE, EXPLOIT OR DISTRIBUTE THE *SOFTWARE* FOR COMMERCIAL PURPOSES
WITHOUT PRIOR AND EXPLICIT CONSENT OF LICENSORS. YOU MUST CONTACT INRIA FOR ANY
UNAUTHORIZED USE: stip-sophia.transfert@inria.fr . ANY SUCH ACTION WILL
CONSTITUTE A FORGERY. THIS *SOFTWARE* IS PROVIDED "AS IS" WITHOUT ANY WARRANTIES
OF ANY NATURE AND ANY EXPRESS OR IMPLIED WARRANTIES, WITH REGARDS TO COMMERCIAL
USE, PROFESSIONNAL USE, LEGAL OR NOT, OR OTHER, OR COMMERCIALISATION OR
ADAPTATION. UNLESS EXPLICITLY PROVIDED BY LAW, IN NO EVENT, SHALL INRIA OR THE
AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE
GOODS OR SERVICES, LOSS OF USE, DATA, OR PROFITS OR BUSINESS INTERRUPTION)
HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING FROM, OUT OF OR
IN CONNECTION WITH THE *SOFTWARE* OR THE USE OR OTHER DEALINGS IN THE *SOFTWARE*.
================================================
FILE: README.md
================================================
VCR-GauS: View Consistent Depth-Normal Regularizer for Gaussian Surface Reconstruction
Hanlin Chen,
Fangyin Wei,
Chen Li,
Tianxin Huang,
Yunsong Wang,
Gim Hee Lee
NeurIPS 2024
VCR-GauS formulates a novel multi-view D-Normal regularizer that enables full optimization of the Gaussian geometric parameters to achieve better surface reconstruction. We further design a confidence term to weigh our D-Normal regularizer to mitigate inconsistencies of normal predictions across multiple views.
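The D-Normal regularizer is described here only at a high level. The sketch below illustrates the general idea and is not the repository's implementation. It works in three steps:

- Derive a normal from back-projected depth points, using finite differences to the right and lower neighbours plus a cross product.
- Compare that normal with a predicted normal.
- Combine the per-pixel `1 - cos` penalties as a confidence-weighted average.

Assumptions: a camera frame with x right, y down and z forward. The helper names are hypothetical, and the plain weighted average stands in for the paper's actual confidence term, which is not reproduced here.

```python
import math

Vec = tuple[float, float, float]

def sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def normalize(v: Vec) -> Vec:
    n = math.sqrt(sum(c * c for c in v))
    return (v[0] / n, v[1] / n, v[2] / n)

def depth_normal(p: Vec, p_right: Vec, p_down: Vec) -> Vec:
    """Normal from a pixel's back-projected point and its right/down neighbours.

    cross(dy, dx) is used so that a fronto-parallel plane gets a normal
    pointing back toward the camera (negative z in this frame).
    """
    dx = sub(p_right, p)
    dy = sub(p_down, p)
    return normalize(cross(dy, dx))

def weighted_normal_loss(depth_normals, pred_normals, confidences) -> float:
    """Confidence-weighted mean of (1 - cos) between two sets of unit normals."""
    num = 0.0
    den = 0.0
    for n_d, n_p, w in zip(depth_normals, pred_normals, confidences):
        cos = sum(a * b for a, b in zip(n_d, n_p))
        num += w * (1.0 - cos)
        den += w
    return num / den if den > 0 else 0.0
```

Pixels with low confidence contribute little to the average. This mirrors, in simplified form, the README's point that the confidence term down-weights normal predictions that disagree across views.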
# Updates
* **[2024.10.31]**: We uploaded a new version to arXiv, adding theoretical proofs and visualization results for the D-Normal Regularizer.
* **[2024.09.24]**: VCR-GauS is accepted to NeurIPS 2024.
# Installation
Clone the repository and create an Anaconda environment:
```
git clone https://github.com/HLinChen/VCR-GauS.git --recursive
cd VCR-GauS
git pull --recurse-submodules
env=vcr
conda create -n $env -y python=3.10
conda activate $env
pip install -e ".[train]"
# you can specify your own cuda path
export CUDA_HOME=/usr/local/cuda-11.8
pip install -r requirements.txt
```
We also provide a prebuilt Anaconda environment [here](https://huggingface.co/hanlin-chen/VCR-GauS/resolve/main/vcr.zip?download=true); download it, unzip it, and place it in `your_anaconda_path/envs/`.
To evaluate on TNT with the official scripts, build a separate environment with open3d==0.10:
```
env=f1eval
conda create -n $env -y python=3.8
conda activate $env
pip install -e ".[f1eval]"
```
To extract normal maps with [DSINE](https://baegwangbin.github.io/DSINE/), build another environment:
```
conda create --name dsine python=3.10
conda activate dsine
conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
python -m pip install geffnet
```
As in Gaussian Splatting, we use COLMAP to process data; see the [COLMAP website](https://colmap.github.io/) for installation instructions.
# Dataset
## Tanks and Temples dataset
You can download the preprocessed Tanks and Temples dataset from [here](https://huggingface.co/hanlin-chen/VCR-GauS/resolve/main/tnt.zip?download=true), or preprocess it yourself:
Download the data from [Tanks and Temples](https://tanksandtemples.org/download/) website.
You will also need to download additional [COLMAP/camera/alignment](https://drive.google.com/file/d/1jAr3IDvhVmmYeDWi0D_JfgiHcl70rzVE/view?resourcekey=) and the images of each scene.
The file structure should look like this (move the downloaded images into the `images_raw` folder):
```
tanks_and_temples
├─ Barn
│ ├─ Barn_COLMAP_SfM.log (camera poses)
│ ├─ Barn.json (cropfiles)
│ ├─ Barn.ply (ground-truth point cloud)
│ ├─ Barn_trans.txt (colmap-to-ground-truth transformation)
│ └─ images_raw (raw input images downloaded from Tanks and Temples website)
│ ├─ 000001.png
│ ├─ 000002.png
│ ...
├─ Caterpillar
│ ├─ ...
...
```
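Before running the preprocessing script, it can help to check that each scene matches the layout above. The sketch below does this with the standard library only. `missing_tnt_items` is a hypothetical helper, not part of the repository. It checks only the per-scene file names listed in the tree and whether `images_raw/` contains anything.

```python
from pathlib import Path

# Per-scene files from the layout above; {scene} is e.g. "Barn".
REQUIRED_FILES = (
    "{scene}_COLMAP_SfM.log",  # camera poses
    "{scene}.json",            # crop file
    "{scene}.ply",             # ground-truth point cloud
    "{scene}_trans.txt",       # COLMAP-to-ground-truth transformation
)

def missing_tnt_items(root: str, scene: str) -> list[str]:
    """Return the expected entries that are missing for one scene."""
    scene_dir = Path(root) / scene
    missing = [name.format(scene=scene) for name in REQUIRED_FILES
               if not (scene_dir / name.format(scene=scene)).is_file()]
    images = scene_dir / "images_raw"
    if not images.is_dir() or not any(images.iterdir()):
        missing.append("images_raw/")
    return missing
```

For example, `missing_tnt_items("/your/data/path", "Barn")` returns an empty list once the scene is complete.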
#### 1. Colmap and bounding box json
Run the following command to generate the JSON and COLMAP files:
```bash
# Modify --tnt_path to be the Tanks and Temples root directory.
sh bash_scripts/1_preprocess_tnt.sh
```
#### 2. Normal maps
First download the [code](https://github.com/baegwangbin/DSINE) and [model weights](https://drive.google.com/drive/folders/1t3LMJIIrSnCGwOEf53Cyg0lkSXd3M4Hm) of DSINE. Then, in the bash script, set **CODE_PATH** to the DSINE root directory, **CKPT** to the DSINE model path, and **DATADIR** to the TNT root directory.
Run the following command to generate normal maps:
```bash
sh bash_scripts/2_extract_normal_dsine.sh
```
#### 3. Semantic masks (optional)
If you don't want to use the semantic masks, you can set **optim.loss_weight.semantic=0** and skip the mask generation.
First download the [code](https://github.com/IDEA-Research/Grounded-Segment-Anything) and model of Grounded-SAM. Then install its environment following 'Install without Docker' on the [website](https://github.com/IDEA-Research/Grounded-Segment-Anything). Next, in the bash script, set **GSAM_PATH** to the GSAM root directory and **DATADIR** to the TNT root directory. Run the following command to generate semantic masks:
```bash
sh bash_scripts/3_extract_mask.sh
```
## Other datasets
Please download the Mip-NeRF 360 dataset from the official [website](https://jonbarron.info/mipnerf360/) and the preprocessed DTU dataset from [2DGS](https://drive.google.com/drive/folders/1SJFgt8qhQomHX55Q4xSvYE2C6-8tFll9), then extract normal maps with DSINE using the scripts above. Alternatively, you can extract normal maps with [GeoWizard](https://github.com/fuxiao0719/GeoWizard) using 'bash_scripts/4_extract_normal_geow.sh'; install the corresponding environment and download its code and model weights first.
# Training and Evaluation
## From scratch:
```
# you might need to update the data path in the script accordingly
# Tanks and Temples dataset
python python_scripts/run_tnt.py
# Mip-NeRF 360 dataset
python python_scripts/run_mipnerf360.py
```
## Evaluate metrics only
We have uploaded the extracted meshes; you can download and evaluate them yourself ([TNT](https://huggingface.co/hanlin-chen/VCR-GauS/resolve/main/tnt_mesh.zip?download=true) and [DTU](https://huggingface.co/Chiller3/VCR-GauS/resolve/main/dtu_mesh.zip?download=true)). You may need to update the **mesh and data path** in the script accordingly, and set **do_train** and **do_extract_mesh** to False.
```
# Tanks and Temples dataset
python python_scripts/run_tnt.py
# DTU dataset
python python_scripts/run_dtu.py
```
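The Tanks and Temples benchmark scores a reconstruction by precision, recall and F-score at a distance threshold τ:

- **Precision** is the fraction of predicted points within τ of the ground truth.
- **Recall** is the fraction of ground-truth points within τ of the prediction.
- **F-score** is their harmonic mean.

Below is a brute-force sketch of that definition. The official toolbox under `evaluation/tnt_eval/` also handles alignment, cropping and downsampling, which are omitted here. DTU instead reports mean nearest-neighbour distances (accuracy and completeness).

```python
import math

Point = tuple[float, float, float]

def nn_distances(src: list[Point], dst: list[Point]) -> list[float]:
    """Distance from each point in src to its nearest neighbour in dst (brute force)."""
    return [min(math.dist(p, q) for q in dst) for p in src]

def precision_recall_fscore(pred: list[Point], gt: list[Point], tau: float):
    """Precision: predicted points within tau of GT; recall: GT points within tau of pred."""
    precision = sum(d < tau for d in nn_distances(pred, gt)) / len(pred)
    recall = sum(d < tau for d in nn_distances(gt, pred)) / len(gt)
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)
```

The brute-force nearest-neighbour search is O(N·M), which is fine for illustration. Real point clouds need a KD-tree.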
## Additional regularizations:
We also include additional regularizations, such as the depth distortion loss and the normal consistency loss, following [2DGS](https://surfsplatting.github.io/) and [GOF](https://niujinshuchong.github.io/gaussian-opacity-fields/). To enable them:
- normal consistency loss: set optim.loss_weight.consistent_normal > 0;
- depth distortion loss:
    1. set optim.loss_weight.depth_var > 0;
    2. set NUM_DIST = 1 in submodules/diff-gaussian-rasterization/cuda_rasterizer/config.h, and reinstall diff-gaussian-rasterization.
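The 2DGS-style depth distortion loss penalizes blending weights spread out in depth along a ray. Its pairwise form is Σ_{i,j} w_i w_j |z_i - z_j|. Below is a per-ray O(n²) sketch of that formula. The CUDA rasterizer computes it differently and efficiently, and how the repository maps `depth_var` onto this term or normalizes it is not shown here, so treat this as illustrative only.

```python
def distortion_loss(weights: list[float], depths: list[float]) -> float:
    """Pairwise distortion along one ray: sum over i, j of w_i * w_j * |z_i - z_j|.

    weights: blending weights of the primitives hit by the ray.
    depths:  the corresponding intersection depths.
    """
    total = 0.0
    for w_i, z_i in zip(weights, depths):
        for w_j, z_j in zip(weights, depths):
            total += w_i * w_j * abs(z_i - z_j)
    return total
```

The loss is zero when all weight sits at a single depth, so minimizing it pushes each ray toward one sharp surface.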
# Custom Dataset
We use the same data format as 3DGS; please follow [here](https://github.com/graphdeco-inria/gaussian-splatting?tab=readme-ov-file#processing-your-own-scenes) to prepare your dataset. Then you can train your model and extract a mesh.
```
# Generate bounding box
python process_data/convert_data_to_json.py \
--scene_type outdoor \
--data_dir /your/data/path
# Extract normal maps
# Use DSINE:
python -W ignore process_data/extract_normal.py \
--dsine_path /your/dsine/code/path \
--ckpt /your/ckpt/path \
--img_path /your/data/path/images \
--intrins_path /your/data/path/ \
--output_path /your/data/path/normals
# Or use GeoWizard
python process_data/extract_normal_geo.py \
--code_path ${CODE_PATH} \
--input_dir /your/data/path/images/ \
--output_dir /your/data/path/ \
--ensemble_size 3 \
--denoise_steps 10 \
--seed 0 \
--domain ${DOMAIN_TYPE} # outdoor indoor object
# training
# --model.resolution=2 to use images downsampled by a factor of 2
# --model.use_decoupled_appearance=True to enable decoupled appearance modeling if your images have varying lighting conditions
python train.py \
--config=configs/reconstruct.yaml \
--logdir=/your/log/path/ \
--model.source_path=/your/data/path/ \
--model.data_device=cpu \
--model.resolution=2 \
--wandb \
--wandb_name vcr-gaus
# extract the mesh after training
python tools/depth2mesh.py \
--voxel_size 5e-3 \
--max_depth 8 \
--clean \
--cfg_path /your/gaussian/path/config.yaml
```
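Mesh extraction is driven by `tools/depth2mesh.py`, whose code is not shown in this section. Its `--voxel_size` and `--max_depth` flags are consistent with TSDF fusion of rendered depth maps. Under that assumption, the sketch below shows a generic truncated-signed-distance update for a single voxel. The truncation distance and the unit observation weight are illustrative choices, not values taken from the repository.

```python
def tsdf_update(tsdf: float, weight: float, voxel_depth: float,
                observed_depth: float, trunc: float, max_depth: float,
                obs_weight: float = 1.0) -> tuple[float, float]:
    """Fuse one depth observation into one voxel with a running weighted average.

    voxel_depth:    depth of the voxel centre along the camera's viewing axis.
    observed_depth: rendered depth at the pixel the voxel projects to.
    Returns the updated (tsdf, weight).
    """
    if observed_depth <= 0 or observed_depth > max_depth:
        return tsdf, weight  # invalid or too far away (the role --max_depth plays)
    sdf = observed_depth - voxel_depth  # positive in front of the surface
    if sdf < -trunc:
        return tsdf, weight  # voxel is well behind the surface: no update
    d = min(1.0, sdf / trunc)  # truncate and normalise to [-1, 1]
    new_weight = weight + obs_weight
    new_tsdf = (tsdf * weight + d * obs_weight) / new_weight
    return new_tsdf, new_weight
```

After all views are fused, the mesh is the zero level set of the TSDF grid, typically extracted with marching cubes.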
# Acknowledgements
This project is built upon [3DGS](https://github.com/graphdeco-inria/gaussian-splatting). The evaluation scripts for the DTU and Tanks and Temples datasets are taken from [DTUeval-python](https://github.com/jzhangbs/DTUeval-python) and [TanksAndTemples](https://github.com/isl-org/TanksAndTemples/tree/master/python_toolbox/evaluation), respectively. We also use the normal estimators [DSINE](https://github.com/baegwangbin/DSINE) and [GeoWizard](https://fuxiao0719.github.io/projects/geowizard/), and the semantic segmentation models [SAM](https://github.com/facebookresearch/segment-anything) and [Grounded-SAM](https://github.com/IDEA-Research/Grounded-Segment-Anything?tab=readme-ov-file#install-without-docker). In addition, we use the pruning method from [LightGaussian](https://lightgaussian.github.io/). We thank all the authors for their great work and repositories.
# Citation
If you find our code or paper useful, please cite
```bibtex
@article{chen2024vcr,
author = {Chen, Hanlin and Wei, Fangyin and Li, Chen and Huang, Tianxin and Wang, Yunsong and Lee, Gim Hee},
title = {VCR-GauS: View Consistent Depth-Normal Regularizer for Gaussian Surface Reconstruction},
journal = {arXiv preprint arXiv:2406.05774},
year = {2024},
}
```
If you find the flattened 3D Gaussians useful, please cite
```bibtex
@article{chen2023neusg,
title={Neusg: Neural implicit surface reconstruction with 3d gaussian splatting guidance},
author={Chen, Hanlin and Li, Chen and Lee, Gim Hee},
journal={arXiv preprint arXiv:2312.00846},
year={2023}
}
```
================================================
FILE: arguments/__init__.py
================================================
#
# Copyright (C) 2023, Inria
# GRAPHDECO research group, https://team.inria.fr/graphdeco
# All rights reserved.
#
# This software is free for non-commercial, research and evaluation use
# under the terms of the LICENSE.md file.
#
# For inquiries contact george.drettakis@inria.fr
#
from argparse import ArgumentParser, Namespace
import sys
import os
class GroupParams:
    pass
class ParamGroup:
    def __init__(self, parser: ArgumentParser, name : str, fill_none = False):
        group = parser.add_argument_group(name)
        for key, value in vars(self).items():
            shorthand = False
            if key.startswith("_"):
                shorthand = True
                key = key[1:]
            t = type(value)
            value = value if not fill_none else None
            if shorthand:
                if t == bool:
                    group.add_argument("--" + key, ("-" + key[0:1]), default=value, action="store_true")
                else:
                    group.add_argument("--" + key, ("-" + key[0:1]), default=value, type=t)
            else:
                if t == bool:
                    group.add_argument("--" + key, default=value, action="store_true")
                else:
                    group.add_argument("--" + key, default=value, type=t)
    def extract(self, args):
        group = GroupParams()
        for arg in vars(args).items():
            if arg[0] in vars(self) or ("_" + arg[0]) in vars(self):
                setattr(group, arg[0], arg[1])
        return group
class ModelParams(ParamGroup):
    def __init__(self, parser, sentinel=False):
        self.sh_degree = 3
        self._source_path = ""
        self._model_path = ""
        self._images = "images"
        self._resolution = -1
        self._white_background = False
        self.data_device = "cuda"
        self.eval = False
        super().__init__(parser, "Loading Parameters", sentinel)
    def extract(self, args):
        g = super().extract(args)
        g.source_path = os.path.abspath(g.source_path)
        return g
class PipelineParams(ParamGroup):
    def __init__(self, parser):
        self.convert_SHs_python = False
        self.compute_cov3D_python = False
        self.debug = False
        super().__init__(parser, "Pipeline Parameters")
class OptimizationParams(ParamGroup):
    def __init__(self, parser):
        self.iterations = 30_000
        self.position_lr_init = 0.00016
        self.position_lr_final = 0.0000016
        self.position_lr_delay_mult = 0.01
        self.position_lr_max_steps = 30_000
        self.feature_lr = 0.0025
        self.opacity_lr = 0.05
        self.scaling_lr = 0.005
        self.rotation_lr = 0.001
        self.percent_dense = 0.01
        self.lambda_dssim = 0.2
        self.densification_interval = 100
        self.opacity_reset_interval = 3000
        self.densify_from_iter = 500
        self.densify_until_iter = 15_000
        self.densify_grad_threshold = 0.0002
        self.random_background = False
        super().__init__(parser, "Optimization Parameters")
def get_combined_args(parser : ArgumentParser):
    cmdlne_string = sys.argv[1:]
    cfgfile_string = "Namespace()"
    args_cmdline = parser.parse_args(cmdlne_string)
    try:
        cfgfilepath = os.path.join(args_cmdline.model_path, "cfg_args")
        print("Looking for config file in", cfgfilepath)
        with open(cfgfilepath) as cfg_file:
            print("Config file found: {}".format(cfgfilepath))
            cfgfile_string = cfg_file.read()
    except TypeError:
        print("Config file not found at")
        pass
    args_cfgfile = eval(cfgfile_string)
    merged_dict = vars(args_cfgfile).copy()
    for k,v in vars(args_cmdline).items():
        if v != None:
            merged_dict[k] = v
    return Namespace(**merged_dict)
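`ParamGroup` follows a naming convention: an attribute with a leading underscore (e.g. `_source_path`) gets both a long flag and a one-letter short flag (`--source_path` / `-s`), and boolean defaults become `store_true` switches. `get_combined_args` then lets command-line values that are not `None` override the ones saved in `cfg_args`. The sketch below reproduces both behaviours with plain `argparse`. `DemoParams` and `merge_namespaces` are hypothetical stand-ins, and the file reading and `eval` of the original are left out.

```python
from argparse import ArgumentParser, Namespace

class DemoParams:
    """Hypothetical stand-in that follows ParamGroup's naming convention."""
    def __init__(self, parser: ArgumentParser):
        self._source_path = ""   # leading underscore: also gets a short flag (-s)
        self._resolution = -1    # -> --resolution / -r, parsed as int
        self.data_device = "cuda"
        self.eval = False        # bool defaults become store_true switches
        group = parser.add_argument_group("Demo Parameters")
        for key, value in vars(self).items():
            name = key[1:] if key.startswith("_") else key
            flags = ["--" + name]
            if key.startswith("_"):
                flags.append("-" + name[0])
            if isinstance(value, bool):
                group.add_argument(*flags, default=value, action="store_true")
            else:
                group.add_argument(*flags, default=value, type=type(value))

def merge_namespaces(saved: Namespace, cmdline: Namespace) -> Namespace:
    """Command-line values that are not None override the saved ones."""
    merged = vars(saved).copy()
    merged.update({k: v for k, v in vars(cmdline).items() if v is not None})
    return Namespace(**merged)
```

For example, `parser.parse_args(["-s", "/data/Barn", "-r", "2"])` yields `source_path="/data/Barn"` and `resolution=2`.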
================================================
FILE: bash_scripts/0_train.sh
================================================
GPU=0
export CUDA_VISIBLE_DEVICES=${GPU}
ls
DATASET=tnt
SCENE=Barn
NAME=${SCENE}
PROJECT=vcr_gaus
TRIAL_NAME=vcr_gaus
CFG=configs/${DATASET}/${SCENE}.yaml
DIR=/your/log/path/${PROJECT}/${DATASET}/${NAME}/${TRIAL_NAME}
python train.py \
--config=${CFG} \
--port=-1 \
--logdir=${DIR} \
--model.source_path=/your/data/path/${DATASET}/${SCENE}/ \
--model.resolution=1 \
--model.data_device=cpu \
--wandb \
--wandb_name ${PROJECT}
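This script trains a single scene. To cover all six TNT scenes used by the preprocessing scripts, the same arguments can be generated per scene. `train_command` is a hypothetical helper that reproduces this script's flags and log-directory layout. The repository's own `python_scripts/run_tnt.py` is the supported route and is not shown here.

```python
import shlex

def train_command(dataset: str, scene: str, data_root: str, log_root: str,
                  project: str = "vcr_gaus", trial: str = "vcr_gaus",
                  resolution: int = 1) -> list[str]:
    """Build the same train.py call as 0_train.sh for one scene."""
    logdir = f"{log_root}/{project}/{dataset}/{scene}/{trial}"
    return [
        "python", "train.py",
        f"--config=configs/{dataset}/{scene}.yaml",
        "--port=-1",
        f"--logdir={logdir}",
        f"--model.source_path={data_root}/{dataset}/{scene}/",
        f"--model.resolution={resolution}",
        "--model.data_device=cpu",
        "--wandb",
        "--wandb_name", project,
    ]

TNT_SCENES = ["Barn", "Caterpillar", "Courthouse", "Ignatius", "Meetingroom", "Truck"]
for scene in TNT_SCENES:
    # Print shell-ready commands; pass the list to subprocess.run to execute instead.
    print(shlex.join(train_command("tnt", scene, "/your/data/path", "/your/log/path")))
```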
================================================
FILE: bash_scripts/1_preprocess_tnt.sh
================================================
echo "Compute intrinsics, undistort images and generate json files. This may take a while"
python process_data/convert_tnt_to_json.py \
--tnt_path /your/data/path \
--run_colmap \
--export_json
================================================
FILE: bash_scripts/2_extract_normal_dsine.sh
================================================
export CUDA_VISIBLE_DEVICES=0
DOMAIN_TYPE=indoor
DATADIR=/your/data/path
CODE_PATH=/your/dsine/code/path
CKPT=/your/dsine/code/path/checkpoints/dsine.pt
for SCENE in Barn Caterpillar Courthouse Ignatius Meetingroom Truck;
do
SCENE_PATH=${DATADIR}/${SCENE}
# dsine
python -W ignore process_data/extract_normal.py \
--dsine_path ${CODE_PATH} \
--ckpt ${CKPT} \
--img_path ${SCENE_PATH}/images \
--intrins_path ${SCENE_PATH}/ \
--output_path ${SCENE_PATH}/normals
done
================================================
FILE: bash_scripts/3_extract_mask.sh
================================================
export CUDA_VISIBLE_DEVICES=0
DATADIR=/your/data/path
GSAM_PATH=~/code/gsam
CKPT_PATH=${GSAM_PATH}
for SCENE in Barn Caterpillar Courthouse Ignatius Meetingroom Truck;
do
SCENE_PATH=${DATADIR}/${SCENE}
# Meetingroom is an indoor scene; all other scenes are outdoor.
if [ ${SCENE} = "Meetingroom" ]; then
SCENE_TYPE="indoor"
else
SCENE_TYPE="outdoor"
fi
python -W ignore process_data/extract_mask.py \
--config ${GSAM_PATH}/GroundingDINO/groundingdino/config/GroundingDINO_SwinT_OGC.py \
--grounded_checkpoint ${CKPT_PATH}/groundingdino_swint_ogc.pth \
--sam_hq_checkpoint ${CKPT_PATH}/sam_hq_vit_h.pth \
--gsam_path ${GSAM_PATH} \
--use_sam_hq \
--input_image ${SCENE_PATH}/images/ \
--output_dir ${SCENE_PATH}/masks \
--box_threshold 0.5 \
--text_threshold 0.2 \
--scene ${SCENE} \
--scene_type ${SCENE_TYPE} \
--device "cuda"
done
================================================
FILE: bash_scripts/4_extract_normal_geow.sh
================================================
export CUDA_VISIBLE_DEVICES=0
# DOMAIN_TYPE=outdoor
# DOMAIN_TYPE=indoor
DOMAIN_TYPE=object
DATADIR=/your/data/path/DTU_mask
CODE_PATH=/your/geowizard/path
for SCENE in scan106 scan114 scan122 scan37 scan55 scan65 scan83 scan105 scan110 scan118 scan24 scan40 scan63 scan69 scan97;
do
SCENE_PATH=${DATADIR}/${SCENE}
python process_data/extract_normal_geo.py \
--code_path ${CODE_PATH} \
--input_dir ${SCENE_PATH}/images/ \
--output_dir ${SCENE_PATH}/ \
--ensemble_size 3 \
--denoise_steps 10 \
--seed 0 \
--domain ${DOMAIN_TYPE}
done
================================================
FILE: bash_scripts/convert.sh
================================================
SCENE=Truck
DATA_ROOT=/your/data/path/${SCENE}
python process_data/convert.py -s $DATA_ROOT # [--resize] # If not resizing, ImageMagick is not needed
================================================
FILE: bash_scripts/install.sh
================================================
env=vcr
conda create -n $env -y python=3.10
conda activate $env
pip install -e ".[train]"
export CUDA_HOME=/usr/local/cuda-11.2
pip install -r requirements.txt
================================================
FILE: configs/360_v2/base.yaml
================================================
_parent_: configs/reconstruct.yaml
model:
    eval: True
    llffhold: 8
    split: False
optim:
    mask_depth_thr: 1
    densify_large:
        percent_dense: 5e-2
    sample_cams:
        random: False
        num: 100
    loss_weight:
        semantic: 0
        l1_scale: 1
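This config sets `eval: True` and `llffhold: 8`. The original 3DGS loader sorts cameras by image name and holds out every image whose index is divisible by `llffhold` for testing. Whether `scene/dataset_readers.py` in this repository keeps exactly that rule is not shown here, so the sketch below assumes it.

```python
def llff_split(image_names: list[str], llffhold: int, eval_mode: bool):
    """Split images into (train, test), holding out every llffhold-th image.

    With eval disabled, every image is used for training.
    """
    names = sorted(image_names)
    if not eval_mode:
        return names, []
    train = [n for i, n in enumerate(names) if i % llffhold != 0]
    test = [n for i, n in enumerate(names) if i % llffhold == 0]
    return train, test
```

With 16 images and `llffhold: 8`, images 0 and 8 (after sorting) go to the test set and the remaining 14 are used for training.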
================================================
FILE: configs/config.py
================================================
'''
-----------------------------------------------------------------------------
Copyright (c) 2023, NVIDIA CORPORATION. All rights reserved.
NVIDIA CORPORATION and its licensors retain all intellectual property
and proprietary rights in and to this software, related documentation
and any modifications thereto. Any use, reproduction, disclosure or
distribution of this software and related documentation without an express
license agreement from NVIDIA CORPORATION is strictly prohibited.
-----------------------------------------------------------------------------
'''
import collections.abc
import functools
import os
import re
import yaml
from tools.distributed import master_only_print as print
from tools.termcolor import cyan, green, yellow
DEBUG = False
USE_JIT = False
class AttrDict(dict):
    """Dict as attribute trick."""
    def __init__(self, *args, **kwargs):
        super(AttrDict, self).__init__(*args, **kwargs)
        self.__dict__ = self
        for key, value in self.__dict__.items():
            if isinstance(value, dict):
                self.__dict__[key] = AttrDict(value)
            elif isinstance(value, (list, tuple)):
                if value and isinstance(value[0], dict):
                    self.__dict__[key] = [AttrDict(item) for item in value]
                else:
                    self.__dict__[key] = value
    def yaml(self):
        """Convert object to yaml dict and return."""
        yaml_dict = {}
        for key, value in self.__dict__.items():
            if isinstance(value, AttrDict):
                yaml_dict[key] = value.yaml()
            elif isinstance(value, list):
                if value and isinstance(value[0], AttrDict):
                    new_l = []
                    for item in value:
                        new_l.append(item.yaml())
                    yaml_dict[key] = new_l
                else:
                    yaml_dict[key] = value
            else:
                yaml_dict[key] = value
        return yaml_dict
    def __repr__(self):
        """Print all variables."""
        ret_str = []
        for key, value in self.__dict__.items():
            if isinstance(value, AttrDict):
                ret_str.append('{}:'.format(key))
                child_ret_str = value.__repr__().split('\n')
                for item in child_ret_str:
                    ret_str.append(' ' + item)
            elif isinstance(value, list):
                if value and isinstance(value[0], AttrDict):
                    ret_str.append('{}:'.format(key))
                    for item in value:
                        # Treat as AttrDict above.
                        child_ret_str = item.__repr__().split('\n')
                        for line in child_ret_str:
                            ret_str.append(' ' + line)
                else:
                    ret_str.append('{}: {}'.format(key, value))
            else:
                ret_str.append('{}: {}'.format(key, value))
        return '\n'.join(ret_str)
class Config(AttrDict):
    r"""Configuration class. This should include every human specifiable
    hyperparameter values for your training."""
    def __init__(self, filename=None, verbose=False):
        super(Config, self).__init__()
        self.source_filename = filename
        # Load the base configuration file.
        base_filename = os.path.join(
            os.path.dirname(__file__), './config_base.yaml'
        )
        cfg_base = self.load_config(base_filename)
        recursive_update(self, cfg_base)
        # Update with given configurations.
        cfg_dict = self.load_config(filename)
        recursive_update(self, cfg_dict)
        if verbose:
            print(' imaginaire config '.center(80, '-'))
            print(self.__repr__())
            print(''.center(80, '-'))
    def load_config(self, filename):
        # Update with given configurations.
        assert os.path.exists(filename), f'File {filename} does not exist.'
        yaml_loader = yaml.SafeLoader
        # Also resolve exponent notation without a dot (e.g. 3e-3) as float.
        yaml_loader.add_implicit_resolver(
            u'tag:yaml.org,2002:float',
            re.compile(u'''^(?:
             [-+]?(?:[0-9][0-9_]*)\\.[0-9_]*(?:[eE][-+]?[0-9]+)?
            |[-+]?(?:[0-9][0-9_]*)(?:[eE][-+]?[0-9]+)
            |\\.[0-9_]+(?:[eE][-+][0-9]+)?
            |[-+]?[0-9][0-9_]*(?::[0-5]?[0-9])+\\.[0-9_]*
            |[-+]?\\.(?:inf|Inf|INF)
            |\\.(?:nan|NaN|NAN))$''', re.X),
            list(u'-+0123456789.'))
        try:
            with open(filename) as file:
                cfg_dict = yaml.load(file, Loader=yaml_loader)
                cfg_dict = AttrDict(cfg_dict)
        except EnvironmentError:
            print(f'Please check the file with name of "{filename}"')
            raise  # cfg_dict would be undefined below
        # Inherit configurations from parent
        parent_key = "_parent_"
        if parent_key in cfg_dict:
            parent_filename = cfg_dict.pop(parent_key)
            cfg_parent = self.load_config(parent_filename)
            recursive_update(cfg_parent, cfg_dict)
            cfg_dict = cfg_parent
        return cfg_dict
    def print_config(self, level=0):
        """Recursively print the configuration (with termcolor)."""
        for key, value in sorted(self.items()):
            if isinstance(value, (dict, Config)):
                print(" " * level + cyan("* ") + green(key) + ":")
                Config.print_config(value, level + 1)
            else:
                print(" " * level + cyan("* ") + green(key) + ":", yellow(value))
    def save_config(self, logdir):
        """Save the final configuration to a yaml file."""
        cfg_fname = f"{logdir}/config.yaml"
        with open(cfg_fname, "w") as file:
            yaml.safe_dump(self.yaml(), file, default_flow_style=False, indent=4)
def rsetattr(obj, attr, val):
    """Recursively find object and set value"""
    pre, _, post = attr.rpartition('.')
    return setattr(rgetattr(obj, pre) if pre else obj, post, val)
def rgetattr(obj, attr, *args):
    """Recursively find object and return value"""
    def _getattr(obj, attr):
        r"""Get attribute."""
        return getattr(obj, attr, *args)
    return functools.reduce(_getattr, [obj] + attr.split('.'))
def recursive_update(d, u):
    """Recursively update AttrDict d with AttrDict u"""
    for key, value in u.items():
        if isinstance(value, collections.abc.Mapping):
            d.__dict__[key] = recursive_update(d.get(key, AttrDict({})), value)
        elif isinstance(value, (list, tuple)):
            if value and isinstance(value[0], dict):
                d.__dict__[key] = [AttrDict(item) for item in value]
            else:
                d.__dict__[key] = value
        else:
            d.__dict__[key] = value
    return d
def recursive_update_strict(d, u, stack=[]):
    """Recursively update AttrDict d with AttrDict u with strict matching"""
    for key, value in u.items():
        if key not in d:
            key_full = ".".join(stack + [key])
            raise KeyError(f"The input key '{key_full}' does not exist in the config files.")
        if isinstance(value, collections.abc.Mapping):
            d.__dict__[key] = recursive_update_strict(d.get(key, AttrDict({})), value, stack + [key])
        elif isinstance(value, (list, tuple)):
            if value and isinstance(value[0], dict):
                d.__dict__[key] = [AttrDict(item) for item in value]
            else:
                d.__dict__[key] = value
        else:
            d.__dict__[key] = value
    return d
def parse_cmdline_arguments(args):
    """
    Parse arguments from command line.
    Syntax: --key1.key2.key3=value --> value
            --key1.key2.key3= --> None
            --key1.key2.key3 --> True
            --key1.key2.key3! --> False
    """
    cfg_cmd = {}
    for arg in args:
        assert arg.startswith("--")
        if "=" not in arg[2:]:
            key_str, value = (arg[2:-1], "false") if arg[-1] == "!" else (arg[2:], "true")
        else:
            # Split only on the first "=" so values may themselves contain "=".
            key_str, value = arg[2:].split("=", 1)
        keys_sub = key_str.split(".")
        cfg_sub = cfg_cmd
        for k in keys_sub[:-1]:
            cfg_sub.setdefault(k, {})
            cfg_sub = cfg_sub[k]
        assert keys_sub[-1] not in cfg_sub, keys_sub[-1]
        cfg_sub[keys_sub[-1]] = yaml.safe_load(value)
    return cfg_cmd
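`parse_cmdline_arguments` turns dotted flags such as `--model.resolution=2` into a nested dict. That dict is then merged over the loaded config; how `train.py` performs the merge is not shown here, so a plain recursive merge is assumed. The sketch below is a dependency-free version of the same idea. `parse_value` is a hypothetical substitute for `yaml.safe_load` built on `ast.literal_eval`. It therefore accepts Python literal syntax rather than YAML: `true`/`false` are handled case-insensitively, but YAML-only forms such as `yes`/`no` are not.

```python
import ast

def parse_value(text: str):
    """Rough stand-in for yaml.safe_load on simple scalars (Python literals, not YAML)."""
    if text == "":
        return None  # "--key=" maps to None, as in the docstring above
    if text.lower() in ("true", "false"):
        return text.lower() == "true"
    try:
        return ast.literal_eval(text)  # ints, floats such as 1e-2, lists, ...
    except (ValueError, SyntaxError, TypeError):
        return text  # anything else (paths, names) stays a string

def parse_overrides(args: list[str]) -> dict:
    """Turn --a.b.c=value, --a.b.c (True) and --a.b.c! (False) into a nested dict."""
    cfg: dict = {}
    for arg in args:
        assert arg.startswith("--"), arg
        body = arg[2:]
        if "=" in body:
            key_str, raw = body.split("=", 1)
            value = parse_value(raw)
        elif body.endswith("!"):
            key_str, value = body[:-1], False
        else:
            key_str, value = body, True
        *parents, leaf = key_str.split(".")
        node = cfg
        for key in parents:
            node = node.setdefault(key, {})
        node[leaf] = value
    return cfg

def deep_update(base: dict, override: dict) -> dict:
    """Recursively merge override into base in place and return it."""
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(base.get(key), dict):
            deep_update(base[key], value)
        else:
            base[key] = value
    return base
```

For example, `--model.resolution=2` sets only `model.resolution` and leaves sibling keys such as `model.data_device` untouched.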
================================================
FILE: configs/config_base.yaml
================================================
logdir: "/your/log/path/debug/"
ip: 127.0.0.1
port: -1
detect_anomaly: False
silent: 0
seed: 0
model:
sh_degree: 3
source_path: "/your/data/path/tnt/Barn/"
model_path: "/your/log/path/"
images: "images"
resolution: -1
white_background: False
data_device: "cuda"
eval: False
llffhold: 1
init_ply: "sparse/points3D.ply"
max_init_points:
split: False
sphere: False
load_depth: False
load_normal: False
load_mask: False
normal_folder: 'normals'
depth_folder: 'depths'
use_decoupled_appearance: False
ch_sem_feat: 0
num_cls: 0
max_mem: 22
load_mask: False
use_decoupled_appearance: False
use_decoupled_dnormal: False
ratio: 0
mesh:
voxel_size: 3e-3
depth_type: 'traditional'
optim:
iterations: 30000
position_lr_init: 0.00016
position_lr_final: 0.0000016
position_lr_delay_mult: 0.01
position_lr_max_steps: 30000
feature_lr: 0.0025
sdf_lr: 0.001
weight_decay: 1e-2
opacity_lr: 0.05
scaling_lr: 0.005
rotation_lr: 0.001
appearance_embeddings_lr: 0.001
appearance_network_lr: 0.001
cls_lr: 5e-4
percent_dense: 0.01
densification_interval: 100
opacity_reset_interval: 3000
densify_from_iter: 500
densify_until_iter: 15000
densify_grad_threshold: 0.0005
random_background: False
rand_pts: 20000
edge_thr: 0
mask_depth_thr: 0
loss_weight:
l1: 0.8
ssim: 0.2
distortion: 0.
semantic: 0
mono_depth: 0
mono_normal: 0
depth_normal: 0
prune:
iterations: []
percent: 0.5
decay: 0.6
v_pow: 0.1
pipline:
convert_SHs_python: False
compute_cov3D_python: False
debug: False
data:
name: dummy
train:
test_iterations: [7000, 30000]
save_iterations: [7000, 30000]
checkpoint_iterations: [30000]
save_splat: False
start_checkpoint:
debug_from: -1
================================================
FILE: configs/dtu/base.yaml
================================================
_parent_: configs/reconstruct.yaml
model:
use_decoupled_appearance: False
use_decoupled_dnormal: False
normal_folder: 'normal_npz_indoor'
eval: False
optim:
exp_t: 0.01
mask_depth_thr: 0
loss_weight:
l1_scale: 0.5
consistent_normal_from_iter: 15000
close_depth_from_iter: 15000
densify_large:
percent_dense: 1e-2
sample_cams:
random: False
num: 30
loss_weight:
semantic: 0
depth_normal: 0
mono_normal: 0.01
consistent_normal: 0.05
distortion: 1000
depth_var: 0
random_background: False
================================================
FILE: configs/dtu/dtu_scan24.yaml
================================================
_parent_: configs/dtu/base.yaml
================================================
FILE: configs/reconstruct.yaml
================================================
_parent_: configs/config_base.yaml
model:
load_mask: False
use_decoupled_appearance: False
use_decoupled_dnormal: False
ch_sem_feat: 2
num_cls: 2
depth_type: 'intersection'
optim:
mask_depth_thr: 0.8
edge_thr: 0
exp_t: 0.01
cos_thr: -1
close_depth_from_iter: 0
normal_from_iter: 0
dnormal_from_iter: 0
consistent_normal_from_iter: 0
curv_from_iter: 0
loss_weight:
l1: 0.8
ssim: 0.2
l1_scale: 1
entropy: 0
depth_var: 0.
mono_depth: 0
mono_normal: 0.01
depth_normal: 0.01
consistent_normal: 0
prune:
iterations: [15000, 25000]
percent: 0.5
decay: 0.6
v_pow: 0.1
densify_large:
percent_dense: 2e-3
interval: 1
sample_cams:
random: True
num: 200
up: True
around: True
look_mode: 'target'
random_background: True
train:
checkpoint_iterations: []
save_mesh: False
save_iterations: [30000]
================================================
FILE: configs/scannetpp/base.yaml
================================================
_parent_: configs/reconstruct.yaml
model:
split: True
eval: True
use_decoupled_appearance: False
use_decoupled_dnormal: False
mesh:
voxel_size: 1.5e-2
optim:
mask_depth_thr: 0
curv_from_iter: 15000
densify_large:
percent_dense: 1e-2
sample_cams:
random: False
loss_weight:
semantic: 0
curv: 0.05
================================================
FILE: configs/tnt/Barn.yaml
================================================
_parent_: configs/tnt/base.yaml
================================================
FILE: configs/tnt/Caterpillar.yaml
================================================
_parent_: configs/tnt/base.yaml
================================================
FILE: configs/tnt/Courthouse.yaml
================================================
_parent_: configs/tnt/base.yaml
================================================
FILE: configs/tnt/Ignatius.yaml
================================================
_parent_: configs/tnt/base.yaml
================================================
FILE: configs/tnt/Meetingroom.yaml
================================================
_parent_: configs/tnt/base.yaml
optim:
exp_t: 1e-3
mask_depth_thr: 0
densify_large:
percent_dense: 5e-3
sample_cams:
random: False
loss_weight:
semantic: 0
model:
num_cls: 3
use_decoupled_appearance: False
================================================
FILE: configs/tnt/Truck.yaml
================================================
_parent_: configs/tnt/base.yaml
================================================
FILE: configs/tnt/base.yaml
================================================
_parent_: configs/reconstruct.yaml
model:
use_decoupled_appearance: True
use_decoupled_dnormal: False
eval: False
llffhold: 5
optim:
exp_t: 5e-3
loss_weight:
depth_normal: 0.015
semantic: 0.005
l1_scale: 1
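Each YAML file names its base through `_parent_`: for example `configs/tnt/Barn.yaml` builds on `configs/tnt/base.yaml`, which builds on `configs/reconstruct.yaml` and finally `configs/config_base.yaml`. The loader that resolves this chain is not part of this excerpt, so the following is only a sketch under two assumptions: keys in a child file override the parent, and nested sections are merged key by key rather than replaced wholesale. The dictionaries are trimmed stand-ins for the files above, and the nesting of `loss_weight` under `optim` is assumed. `deep_merge` and `resolve` are hypothetical helpers.

```python
import copy


def deep_merge(parent, child):
    """Return a new dict in which child values override parent values, recursively."""
    merged = copy.deepcopy(parent)
    for key, value in child.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)  # merge nested sections key by key
        else:
            merged[key] = copy.deepcopy(value)  # scalars and lists replace the parent value
    return merged


def resolve(name, configs):
    """Follow `_parent_` links through an in-memory {path: dict} table."""
    cfg = dict(configs[name])  # shallow copy so pop() does not touch the table
    parent = cfg.pop("_parent_", None)
    if parent is None:
        return cfg
    return deep_merge(resolve(parent, configs), cfg)


# Trimmed-down stand-ins for the three files in the TNT chain
configs = {
    "configs/config_base.yaml": {"model": {"eval": False, "llffhold": 1},
                                 "optim": {"loss_weight": {"l1": 0.8, "ssim": 0.2, "depth_normal": 0}}},
    "configs/reconstruct.yaml": {"_parent_": "configs/config_base.yaml",
                                 "optim": {"loss_weight": {"depth_normal": 0.01}}},
    "configs/tnt/base.yaml": {"_parent_": "configs/reconstruct.yaml",
                              "model": {"llffhold": 5},
                              "optim": {"loss_weight": {"depth_normal": 0.015}}},
}
cfg = resolve("configs/tnt/base.yaml", configs)
print(cfg)
```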
================================================
FILE: environment.yml
================================================
name: fast_render
channels:
  - pytorch
  - nvidia
  - conda-forge
  - defaults
dependencies:
  - python=3.10
  - pytorch==2.0.1
  - torchvision==0.15.2
  - torchaudio==2.0.2
  - pytorch-cuda=11.8
  - pip:
    - open3d
    - plyfile
    - ninja
    - GPUtil
    - opencv-python
    - lpips
    - trimesh
    - pymeshlab
    - termcolor
    - wandb
    - imageio
    - scikit-image
    - torchmetrics
    - mediapy
    - "git+https://github.com/facebookresearch/pytorch3d.git"
    - submodules/diff-gaussian-rasterization
    - submodules/simple-knn
================================================
FILE: evaluation/crop_mesh.py
================================================
import os
import json
import plyfile
import argparse
# import open3d as o3d
import numpy as np
# from tqdm import tqdm
import trimesh
from sklearn.cluster import DBSCAN
def align_gt_with_cam(pts, trans):
    trans_inv = np.linalg.inv(trans)
    pts_aligned = pts @ trans_inv[:3, :3].transpose(-1, -2) + trans_inv[:3, -1]
    return pts_aligned
def main(args):
    assert os.path.exists(args.ply_path), f"PLY file {args.ply_path} does not exist."
    gt_trans = np.loadtxt(args.align_path)
    mesh_rec = trimesh.load(args.ply_path, process=False)
    mesh_gt = trimesh.load(args.gt_path, process=False)
    # bring the GT into the reconstruction frame with the inverse alignment
    mesh_gt.vertices = align_gt_with_cam(mesh_gt.vertices, gt_trans)
    # move both into the GT's oriented-bounding-box frame so the crop box is axis-aligned
    to_align, _ = trimesh.bounds.oriented_bounds(mesh_gt)
    mesh_gt.vertices = (to_align[:3, :3] @ mesh_gt.vertices.T + to_align[:3, 3:]).T
    mesh_rec.vertices = (to_align[:3, :3] @ mesh_rec.vertices.T + to_align[:3, 3:]).T
    # keep reconstructed vertices strictly inside the GT bounding box, and faces whose vertices all survive
    min_points = mesh_gt.vertices.min(axis=0)
    max_points = mesh_gt.vertices.max(axis=0)
    mask_min = (mesh_rec.vertices - min_points[None]) > 0
    mask_max = (mesh_rec.vertices - max_points[None]) < 0
    mask = np.concatenate((mask_min, mask_max), axis=1).all(axis=1)
    face_mask = mask[mesh_rec.faces].all(axis=1)
    mesh_rec.update_vertices(mask)
    mesh_rec.update_faces(face_mask)
    # undo the oriented-bounding-box transform: x = R^T (x' - t)
    mesh_rec.vertices = (to_align[:3, :3].T @ mesh_rec.vertices.T - to_align[:3, :3].T @ to_align[:3, 3:]).T
    mesh_gt.vertices = (to_align[:3, :3].T @ mesh_gt.vertices.T - to_align[:3, :3].T @ to_align[:3, 3:]).T
    # save the cropped reconstruction to args.out_path
    mesh_rec.export(args.out_path)
    # randomly downsample the GT to at most 5M points, without duplicates
    num_keep = min(5000000, len(mesh_gt.vertices))
    idx = np.random.choice(len(mesh_gt.vertices), num_keep, replace=False)
    mesh_gt.vertices = mesh_gt.vertices[idx]
    mesh_gt.colors = mesh_gt.colors[idx]
    mesh_gt.export(args.gt_path.replace('.ply', '_trans.ply'))
    return
if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--gt_path",
        type=str,
        default='/your/path/Barn_GT.ply',
        help="path to the ground-truth point cloud (.ply)",
    )
    parser.add_argument(
        "--align_path",
        type=str,
        default='/your/path/Barn_trans.txt',
        help="path to the 4x4 alignment matrix (.txt); its inverse is applied to the GT to bring it into the reconstruction frame",
    )
    parser.add_argument(
        "--ply_path",
        type=str,
        default='/your/path/Barn_lowres.ply',
        help="path to the reconstructed mesh (.ply) to crop",
    )
    parser.add_argument(
        "--scene",
        type=str,
        default='Barn',
        help="scene name (not used by this script)",
    )
    parser.add_argument(
        "--out_path",
        type=str,
        default='/your/path/Barn_lowres_crop.ply',
        help="output path of the cropped reconstruction (.ply)",
    )
    args = parser.parse_args()
    main(args)
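`main()` above moves the GT and the reconstruction into the frame returned by `trimesh.bounds.oriented_bounds`, keeps reconstructed vertices strictly inside the GT's axis-aligned box in that frame, drops faces that lost a vertex, and maps the survivors back with x = R^T (x' - t). A NumPy-only sketch of those steps on toy data, with a hand-made rigid transform standing in for the oriented-bounds matrix; `apply_rigid`, `undo_rigid` and `inside_box` are hypothetical helpers.

```python
import numpy as np


def apply_rigid(T, pts):
    """Apply a 4x4 rigid transform to an (N, 3) array of row-vector points."""
    return pts @ T[:3, :3].T + T[:3, 3]


def undo_rigid(T, pts):
    """Inverse of apply_rigid for a rigid T: x = R^T (x' - t), written for row vectors."""
    return (pts - T[:3, 3]) @ T[:3, :3]


def inside_box(pts, box_min, box_max):
    """True for points strictly inside the axis-aligned box, as in main() above."""
    return np.all((pts > box_min) & (pts < box_max), axis=1)


# toy rigid transform: 90-degree rotation about z plus a translation
theta = np.pi / 2
T = np.eye(4)
T[:3, :3] = [[np.cos(theta), -np.sin(theta), 0.0],
             [np.sin(theta), np.cos(theta), 0.0],
             [0.0, 0.0, 1.0]]
T[:3, 3] = [1.0, 2.0, 3.0]

pts = np.array([[0.5, 0.5, 0.5], [1.5, 0.5, 0.5], [0.0, 0.2, 0.2]])
faces = np.array([[0, 1, 2], [0, 0, 0]])
vertex_mask = inside_box(pts, np.zeros(3), np.ones(3))
face_mask = vertex_mask[faces].all(axis=1)  # a face survives only if all its vertices do
print(vertex_mask, face_mask)
```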
================================================
FILE: evaluation/eval_dtu/eval.py
================================================
# adapted from https://github.com/jzhangbs/DTUeval-python
import numpy as np
import open3d as o3d
import sklearn.neighbors as skln
from tqdm import tqdm
from scipy.io import loadmat
import multiprocessing as mp
import argparse
def sample_single_tri(input_):
    n1, n2, v1, v2, tri_vert = input_
    c = np.mgrid[:n1+1, :n2+1]
    c += 0.5
    c[0] /= max(n1, 1e-7)
    c[1] /= max(n2, 1e-7)
    c = np.transpose(c, (1,2,0))
    k = c[c.sum(axis=-1) < 1] # m2
    q = v1 * k[:,:1] + v2 * k[:,1:] + tri_vert
    return q
def write_vis_pcd(file, points, colors):
    pcd = o3d.geometry.PointCloud()
    pcd.points = o3d.utility.Vector3dVector(points)
    pcd.colors = o3d.utility.Vector3dVector(colors)
    o3d.io.write_point_cloud(file, pcd)
if __name__ == '__main__':
    mp.freeze_support()
    parser = argparse.ArgumentParser()
    parser.add_argument('--data', type=str, default='data_in.ply')
    parser.add_argument('--scan', type=int, default=1)
    parser.add_argument('--mode', type=str, default='mesh', choices=['mesh', 'pcd'])
    parser.add_argument('--dataset_dir', type=str, default='.')
    parser.add_argument('--vis_out_dir', type=str, default='.')
    parser.add_argument('--downsample_density', type=float, default=0.2)
    parser.add_argument('--patch_size', type=float, default=60)
    parser.add_argument('--max_dist', type=float, default=20)
    parser.add_argument('--visualize_threshold', type=float, default=10)
    args = parser.parse_args()
    thresh = args.downsample_density
    if args.mode == 'mesh':
        pbar = tqdm(total=9)
        pbar.set_description('read data mesh')
        data_mesh = o3d.io.read_triangle_mesh(args.data)
        vertices = np.asarray(data_mesh.vertices)
        triangles = np.asarray(data_mesh.triangles)
        tri_vert = vertices[triangles]
        pbar.update(1)
        pbar.set_description('sample pcd from mesh')
        v1 = tri_vert[:,1] - tri_vert[:,0]
        v2 = tri_vert[:,2] - tri_vert[:,0]
        l1 = np.linalg.norm(v1, axis=-1, keepdims=True)
        l2 = np.linalg.norm(v2, axis=-1, keepdims=True)
        area2 = np.linalg.norm(np.cross(v1, v2), axis=-1, keepdims=True)
        non_zero_area = (area2 > 0)[:,0]
        l1, l2, area2, v1, v2, tri_vert = [
            arr[non_zero_area] for arr in [l1, l2, area2, v1, v2, tri_vert]
        ]
        thr = thresh * np.sqrt(l1 * l2 / area2)
        n1 = np.floor(l1 / thr)
        n2 = np.floor(l2 / thr)
        with mp.Pool() as mp_pool:
            new_pts = mp_pool.map(sample_single_tri, ((n1[i,0], n2[i,0], v1[i:i+1], v2[i:i+1], tri_vert[i:i+1,0]) for i in range(len(n1))), chunksize=1024)
        new_pts = np.concatenate(new_pts, axis=0)
        data_pcd = np.concatenate([vertices, new_pts], axis=0)
    elif args.mode == 'pcd':
        pbar = tqdm(total=8)
        pbar.set_description('read data pcd')
        data_pcd_o3d = o3d.io.read_point_cloud(args.data)
        data_pcd = np.asarray(data_pcd_o3d.points)
    pbar.update(1)
    pbar.set_description('random shuffle pcd index')
    shuffle_rng = np.random.default_rng()
    shuffle_rng.shuffle(data_pcd, axis=0)
    pbar.update(1)
    pbar.set_description('downsample pcd')
    nn_engine = skln.NearestNeighbors(n_neighbors=1, radius=thresh, algorithm='kd_tree', n_jobs=-1)
    nn_engine.fit(data_pcd)
    rnn_idxs = nn_engine.radius_neighbors(data_pcd, radius=thresh, return_distance=False)
    mask = np.ones(data_pcd.shape[0], dtype=np.bool_)
    for curr, idxs in enumerate(rnn_idxs):
        if mask[curr]:
            mask[idxs] = 0
            mask[curr] = 1
    data_down = data_pcd[mask]
    pbar.update(1)
    pbar.set_description('masking data pcd')
    obs_mask_file = loadmat(f'{args.dataset_dir}/ObsMask/ObsMask{args.scan}_10.mat')
    ObsMask, BB, Res = [obs_mask_file[attr] for attr in ['ObsMask', 'BB', 'Res']]
    BB = BB.astype(np.float32)
    patch = args.patch_size
    inbound = ((data_down >= BB[:1]-patch) & (data_down < BB[1:]+patch*2)).sum(axis=-1) ==3
    data_in = data_down[inbound]
    data_grid = np.around((data_in - BB[:1]) / Res).astype(np.int32)
    grid_inbound = ((data_grid >= 0) & (data_grid < np.expand_dims(ObsMask.shape, 0))).sum(axis=-1) ==3
    data_grid_in = data_grid[grid_inbound]
    in_obs = ObsMask[data_grid_in[:,0], data_grid_in[:,1], data_grid_in[:,2]].astype(np.bool_)
    data_in_obs = data_in[grid_inbound][in_obs]
    pbar.update(1)
    pbar.set_description('read STL pcd')
    stl_pcd = o3d.io.read_point_cloud(f'{args.dataset_dir}/Points/stl/stl{args.scan:03}_total.ply')
    stl = np.asarray(stl_pcd.points)
    pbar.update(1)
    pbar.set_description('compute data2stl')
    nn_engine.fit(stl)
    dist_d2s, idx_d2s = nn_engine.kneighbors(data_in_obs, n_neighbors=1, return_distance=True)
    max_dist = args.max_dist
    mean_d2s = dist_d2s[dist_d2s < max_dist].mean()
    pbar.update(1)
    pbar.set_description('compute stl2data')
    ground_plane = loadmat(f'{args.dataset_dir}/ObsMask/Plane{args.scan}.mat')['P']
    stl_hom = np.concatenate([stl, np.ones_like(stl[:,:1])], -1)
    above = (ground_plane.reshape((1,4)) * stl_hom).sum(-1) > 0
    stl_above = stl[above]
    nn_engine.fit(data_in)
    dist_s2d, idx_s2d = nn_engine.kneighbors(stl_above, n_neighbors=1, return_distance=True)
    mean_s2d = dist_s2d[dist_s2d < max_dist].mean()
    pbar.update(1)
    pbar.set_description('visualize error')
    vis_dist = args.visualize_threshold
    R = np.array([[1,0,0]], dtype=np.float64)
    G = np.array([[0,1,0]], dtype=np.float64)
    B = np.array([[0,0,1]], dtype=np.float64)
    W = np.array([[1,1,1]], dtype=np.float64)
    data_color = np.tile(B, (data_down.shape[0], 1))
    data_alpha = dist_d2s.clip(max=vis_dist) / vis_dist
    data_color[ np.where(inbound)[0][grid_inbound][in_obs] ] = R * data_alpha + W * (1-data_alpha)
    data_color[ np.where(inbound)[0][grid_inbound][in_obs][dist_d2s[:,0] >= max_dist] ] = G
    write_vis_pcd(f'{args.vis_out_dir}/vis_{args.scan:03}_d2s.ply', data_down, data_color)
    stl_color = np.tile(B, (stl.shape[0], 1))
    stl_alpha = dist_s2d.clip(max=vis_dist) / vis_dist
    stl_color[ np.where(above)[0] ] = R * stl_alpha + W * (1-stl_alpha)
    stl_color[ np.where(above)[0][dist_s2d[:,0] >= max_dist] ] = G
    write_vis_pcd(f'{args.vis_out_dir}/vis_{args.scan:03}_s2d.ply', stl, stl_color)
    pbar.update(1)
    pbar.set_description('done')
    pbar.close()
    over_all = (mean_d2s + mean_s2d) / 2
    print(mean_d2s, mean_s2d, over_all)
    import json
    with open(f'{args.vis_out_dir}/results.json', 'w') as fp:
        json.dump({
            'mean_d2s': mean_d2s,
            'mean_s2d': mean_s2d,
            'overall': over_all,
        }, fp, indent=True)
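The DTU score written to `results.json` is the average of two one-sided mean distances: accuracy (`mean_d2s`, sampled reconstruction to STL ground truth) and completeness (`mean_s2d`, STL to reconstruction), each ignoring distances of `max_dist` or more. The sketch below keeps only that arithmetic. It uses brute-force nearest neighbours instead of scikit-learn and omits the `ObsMask` crop, the ground-plane test and the radius downsampling, so it is illustrative rather than a drop-in replacement; `nearest_dist` and `chamfer_like` are hypothetical names.

```python
import numpy as np


def nearest_dist(src, dst):
    """Distance from each src point to its nearest dst point (brute force, O(N*M) memory)."""
    diff = src[:, None, :] - dst[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1)).min(axis=1)


def chamfer_like(data, stl, max_dist=20.0):
    """Mean accuracy (data->stl), completeness (stl->data) and their average,
    dropping distances >= max_dist as eval.py does."""
    d2s = nearest_dist(data, stl)
    s2d = nearest_dist(stl, data)
    mean_d2s = d2s[d2s < max_dist].mean()
    mean_s2d = s2d[s2d < max_dist].mean()
    return mean_d2s, mean_s2d, (mean_d2s + mean_s2d) / 2


data = np.array([[0.0, 0.0, 0.0], [3.0, 0.0, 0.0]])
stl = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [100.0, 0.0, 0.0]])
print(chamfer_like(data, stl))
```

Here the STL point at x = 100 is 97 units from the nearest reconstructed point, so it falls outside `max_dist` and is left out of the completeness mean, just as outliers are in `eval.py`.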
================================================
FILE: evaluation/eval_dtu/evaluate_single_scene.py
================================================
import torch
import torch.nn as nn
import torch.nn.functional as F
import cv2
import numpy as np
import os
import glob
from skimage.morphology import binary_dilation, disk
import argparse
import trimesh
from pathlib import Path
from tqdm import tqdm
import sys
sys.path.append(os.getcwd())
import evaluation.eval_dtu.render_utils as rend_util
def cull_scan(scan, mesh_path, result_mesh_file, instance_dir):
    # load poses
    image_dir = '{0}/images'.format(instance_dir)
    image_paths = sorted(glob.glob(os.path.join(image_dir, "*.png")))
    n_images = len(image_paths)
    cam_file = '{0}/cameras.npz'.format(instance_dir)
    camera_dict = np.load(cam_file)
    scale_mats = [camera_dict['scale_mat_%d' % idx].astype(np.float32) for idx in range(n_images)]
    world_mats = [camera_dict['world_mat_%d' % idx].astype(np.float32) for idx in range(n_images)]
    intrinsics_all = []
    pose_all = []
    for scale_mat, world_mat in zip(scale_mats, world_mats):
        P = world_mat @ scale_mat
        P = P[:3, :4]
        intrinsics, pose = rend_util.load_K_Rt_from_P(None, P)
        intrinsics_all.append(torch.from_numpy(intrinsics).float())
        pose_all.append(torch.from_numpy(pose).float())
    # load mask
    mask_dir = '{0}/mask'.format(instance_dir)
    mask_paths = sorted(glob.glob(os.path.join(mask_dir, "*.png")))
    masks = []
    for p in mask_paths:
        mask = cv2.imread(p)
        masks.append(mask)
    # hard-coded image shape
    W, H = 1600, 1200
    # load mesh
    mesh = trimesh.load(mesh_path)
    # mesh vertices are expected in the normalized frame defined by scale_mat
    vertices = mesh.vertices
    # project and filter
    vertices = torch.from_numpy(vertices).cuda()
    vertices = torch.cat((vertices, torch.ones_like(vertices[:, :1])), dim=-1)
    vertices = vertices.permute(1, 0)
    vertices = vertices.float()
    sampled_masks = []
    for i in tqdm(range(n_images), desc="Culling mesh given masks"):
        pose = pose_all[i]
        w2c = torch.inverse(pose).cuda()
        intrinsic = intrinsics_all[i].cuda()
        with torch.no_grad():
            # transform and project
            cam_points = intrinsic @ w2c @ vertices
            pix_coords = cam_points[:2, :] / (cam_points[2, :].unsqueeze(0) + 1e-6)
            pix_coords = pix_coords.permute(1, 0)
            pix_coords[..., 0] /= W - 1
            pix_coords[..., 1] /= H - 1
            pix_coords = (pix_coords - 0.5) * 2
            valid = ((pix_coords > -1. ) & (pix_coords < 1.)).all(dim=-1).float()
            # dilate the mask, similar to UNISURF
            maski = masks[i][:, :, 0].astype(np.float32) / 256.
            maski = torch.from_numpy(binary_dilation(maski, disk(24))).float()[None, None].cuda()
            sampled_mask = F.grid_sample(maski, pix_coords[None, None], mode='nearest', padding_mode='zeros', align_corners=True)[0, -1, 0]
            # vertices that project outside this view are not culled by it
            sampled_mask = sampled_mask + (1. - valid)
            sampled_masks.append(sampled_mask)
    sampled_masks = torch.stack(sampled_masks, -1)
    # filter
    mask = (sampled_masks > 0.).all(dim=-1).cpu().numpy()
    face_mask = mask[mesh.faces].all(axis=1)
    mesh.update_vertices(mask)
    mesh.update_faces(face_mask)
    # transform vertices to world
    scale_mat = scale_mats[0]
    mesh.vertices = mesh.vertices * scale_mat[0, 0] + scale_mat[:3, 3][None]
    # Taking the biggest connected component
    print("Taking the biggest connected component")
    components = mesh.split(only_watertight=False)
    areas = np.array([c.area for c in components], dtype=np.float32)
    mesh = components[areas.argmax()]
    mesh.export(result_mesh_file)
    del mesh
if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description='Arguments to evaluate the mesh.'
    )
    parser.add_argument('--input_mesh', type=str, help='path to the mesh to be evaluated')
    parser.add_argument('--scan_id', type=str, help='scan id of the input mesh')
    parser.add_argument('--output_dir', type=str, default='evaluation_results_single', help='path to the output folder')
    parser.add_argument('--mask_dir', type=str, default='mask', help='root folder containing scan<scan_id>/ with images/, uncropped mask/ and cameras.npz')
    parser.add_argument('--DTU', type=str, default='Offical_DTU_Dataset', help='path to the GT DTU point clouds')
    args = parser.parse_args()
    Offical_DTU_Dataset = args.DTU
    out_dir = args.output_dir
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    scan = args.scan_id
    ply_file = args.input_mesh
    print("cull mesh ....")
    result_mesh_file = os.path.join(out_dir, "culled_mesh.ply")
    cull_scan(scan, ply_file, result_mesh_file, instance_dir=os.path.join(args.mask_dir, f'scan{args.scan_id}'))
    script_dir = os.path.dirname(os.path.abspath(__file__))
    cmd = f"python {script_dir}/eval.py --data {result_mesh_file} --scan {scan} --mode mesh --dataset_dir {Offical_DTU_Dataset} --vis_out_dir {out_dir}"
    os.system(cmd)
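`cull_scan` projects every vertex into each view, rescales pixel coordinates to [-1, 1] for `F.grid_sample(..., align_corners=True)` (so -1 and 1 land on the centres of the first and last pixels), and keeps a vertex only if, in every view, it either falls outside the image or hits the dilated mask. A NumPy sketch of the projection and the keep rule, using a made-up pinhole camera in the same 4x4 intrinsics layout that `load_K_Rt_from_P` returns; `project_to_grid` and `keep_vertices` are hypothetical names.

```python
import numpy as np


def project_to_grid(K4, w2c, pts, W, H, eps=1e-6):
    """Project (N, 3) world points to the [-1, 1] coordinates used by
    grid_sample with align_corners=True; also return an in-image flag."""
    pts_h = np.concatenate([pts, np.ones((len(pts), 1))], axis=1).T  # 4 x N homogeneous points
    cam = K4 @ w2c @ pts_h                                              # 4 x N
    uv = cam[:2] / (cam[2:3] + eps)                                     # pixel coordinates
    grid = np.stack([uv[0] / (W - 1), uv[1] / (H - 1)], axis=1)         # [0, 1] across the image
    grid = (grid - 0.5) * 2                                             # [-1, 1]
    valid = np.all((grid > -1.0) & (grid < 1.0), axis=1)
    return grid, valid


def keep_vertices(mask_hits, valid):
    """mask_hits, valid: (N, V) arrays. Keep a vertex if every view either
    does not see it or sees it inside the (dilated) mask."""
    return np.all((mask_hits > 0) | ~valid, axis=1)


W, H = 1600, 1200
K4 = np.eye(4)
K4[0, 0] = K4[1, 1] = 1000.0
K4[0, 2], K4[1, 2] = (W - 1) / 2, (H - 1) / 2
w2c = np.eye(4)
pts = np.array([[0.0, 0.0, 2.0], [5.0, 0.0, 2.0]])
grid, valid = project_to_grid(K4, w2c, pts, W, H)
print(grid, valid)
```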
================================================
FILE: evaluation/eval_dtu/render_utils.py
================================================
import numpy as np
import imageio
import skimage
import cv2
import torch
from torch.nn import functional as F
def get_psnr(img1, img2, normalize_rgb=False):
    if normalize_rgb: # [-1,1] --> [0,1]
        img1 = (img1 + 1.) / 2.
        img2 = (img2 + 1. ) / 2.
    mse = torch.mean((img1 - img2) ** 2)
    psnr = -10. * torch.log(mse) / torch.log(torch.Tensor([10.]).cuda())
    return psnr
def load_rgb(path, normalize_rgb = False):
    img = imageio.imread(path)
    img = skimage.img_as_float32(img)
    if normalize_rgb: # [0,1] --> [-1,1]
        img -= 0.5
        img *= 2.
    img = img.transpose(2, 0, 1)
    return img
def load_K_Rt_from_P(filename, P=None):
    if P is None:
        lines = open(filename).read().splitlines()
        if len(lines) == 4:
            lines = lines[1:]
        lines = [[x[0], x[1], x[2], x[3]] for x in (x.split(" ") for x in lines)]
        P = np.asarray(lines).astype(np.float32).squeeze()
    out = cv2.decomposeProjectionMatrix(P)
    K = out[0]
    R = out[1]
    t = out[2]
    K = K/K[2,2]
    intrinsics = np.eye(4)
    intrinsics[:3, :3] = K
    pose = np.eye(4, dtype=np.float32)
    pose[:3, :3] = R.transpose()
    pose[:3,3] = (t[:3] / t[3])[:,0]
    return intrinsics, pose
def get_camera_params(uv, pose, intrinsics):
    if pose.shape[1] == 7: #In case of quaternion vector representation
        cam_loc = pose[:, 4:]
        R = quat_to_rot(pose[:,:4])
        p = torch.eye(4).repeat(pose.shape[0],1,1).cuda().float()
        p[:, :3, :3] = R
        p[:, :3, 3] = cam_loc
    else: # In case of pose matrix representation
        cam_loc = pose[:, :3, 3]
        p = pose
    batch_size, num_samples, _ = uv.shape
    depth = torch.ones((batch_size, num_samples)).cuda()
    x_cam = uv[:, :, 0].view(batch_size, -1)
    y_cam = uv[:, :, 1].view(batch_size, -1)
    z_cam = depth.view(batch_size, -1)
    pixel_points_cam = lift(x_cam, y_cam, z_cam, intrinsics=intrinsics)
    # permute for batch matrix product
    pixel_points_cam = pixel_points_cam.permute(0, 2, 1)
    world_coords = torch.bmm(p, pixel_points_cam).permute(0, 2, 1)[:, :, :3]
    ray_dirs = world_coords - cam_loc[:, None, :]
    ray_dirs = F.normalize(ray_dirs, dim=2)
    return ray_dirs, cam_loc
def get_camera_for_plot(pose):
    if pose.shape[1] == 7: #In case of quaternion vector representation
        cam_loc = pose[:, 4:].detach()
        R = quat_to_rot(pose[:,:4].detach())
    else: # In case of pose matrix representation
        cam_loc = pose[:, :3, 3]
        R = pose[:, :3, :3]
    cam_dir = R[:, :3, 2]
    return cam_loc, cam_dir
def lift(x, y, z, intrinsics):
    # parse intrinsics
    intrinsics = intrinsics.cuda()
    fx = intrinsics[:, 0, 0]
    fy = intrinsics[:, 1, 1]
    cx = intrinsics[:, 0, 2]
    cy = intrinsics[:, 1, 2]
    sk = intrinsics[:, 0, 1]
    x_lift = (x - cx.unsqueeze(-1) + cy.unsqueeze(-1)*sk.unsqueeze(-1)/fy.unsqueeze(-1) - sk.unsqueeze(-1)*y/fy.unsqueeze(-1)) / fx.unsqueeze(-1) * z
    y_lift = (y - cy.unsqueeze(-1)) / fy.unsqueeze(-1) * z
    # homogeneous
    return torch.stack((x_lift, y_lift, z, torch.ones_like(z).cuda()), dim=-1)
def quat_to_rot(q):
    batch_size, _ = q.shape
    q = F.normalize(q, dim=1)
    R = torch.ones((batch_size, 3,3)).cuda()
    qr=q[:,0]
    qi = q[:, 1]
    qj = q[:, 2]
    qk = q[:, 3]
    R[:, 0, 0]=1-2 * (qj**2 + qk**2)
    R[:, 0, 1] = 2 * (qj *qi -qk*qr)
    R[:, 0, 2] = 2 * (qi * qk + qr * qj)
    R[:, 1, 0] = 2 * (qj * qi + qk * qr)
    R[:, 1, 1] = 1-2 * (qi**2 + qk**2)
    R[:, 1, 2] = 2*(qj*qk - qi*qr)
    R[:, 2, 0] = 2 * (qk * qi-qj * qr)
    R[:, 2, 1] = 2 * (qj*qk + qi*qr)
    R[:, 2, 2] = 1-2 * (qi**2 + qj**2)
    return R
def rot_to_quat(R):
    batch_size, _,_ = R.shape
    q = torch.ones((batch_size, 4)).cuda()
    R00 = R[:, 0,0]
    R01 = R[:, 0, 1]
    R02 = R[:, 0, 2]
    R10 = R[:, 1, 0]
    R11 = R[:, 1, 1]
    R12 = R[:, 1, 2]
    R20 = R[:, 2, 0]
    R21 = R[:, 2, 1]
    R22 = R[:, 2, 2]
    q[:,0]=torch.sqrt(1.0+R00+R11+R22)/2
    q[:, 1]=(R21-R12)/(4*q[:,0])
    q[:, 2] = (R02 - R20) / (4 * q[:, 0])
    q[:, 3] = (R10 - R01) / (4 * q[:, 0])
    return q
def get_sphere_intersections(cam_loc, ray_directions, r = 1.0):
    # Input: n_rays x 3 ; n_rays x 3
    # Output: n_rays x 1, n_rays x 1 (close and far)
    ray_cam_dot = torch.bmm(ray_directions.view(-1, 1, 3),
                            cam_loc.view(-1, 3, 1)).squeeze(-1)
    under_sqrt = ray_cam_dot ** 2 - (cam_loc.norm(2, 1, keepdim=True) ** 2 - r ** 2)
    # sanity check
    if (under_sqrt <= 0).sum() > 0:
        print('BOUNDING SPHERE PROBLEM!')
        exit()
    sphere_intersections = torch.sqrt(under_sqrt) * torch.Tensor([-1, 1]).cuda().float() - ray_cam_dot
    sphere_intersections = sphere_intersections.clamp_min(0.0)
    return sphere_intersections
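`quat_to_rot` and `rot_to_quat` use the (w, x, y, z) = (qr, qi, qj, qk) convention. `rot_to_quat` divides by 4w, so it is only reliable when 1 + trace(R) is comfortably positive, i.e. away from 180-degree rotations. A NumPy sketch of the same formulas with a round-trip check; `quat_to_rot_np` and `rot_to_quat_np` are hypothetical names that do not exist in the repository.

```python
import numpy as np


def quat_to_rot_np(q):
    """(w, x, y, z) quaternion -> 3x3 rotation; same formulas as quat_to_rot above."""
    w, x, y, z = q / np.linalg.norm(q)
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ])


def rot_to_quat_np(R):
    """3x3 rotation -> (w, x, y, z); like rot_to_quat above, assumes 1 + trace(R) > 0."""
    w = np.sqrt(1.0 + R[0, 0] + R[1, 1] + R[2, 2]) / 2
    return np.array([w,
                     (R[2, 1] - R[1, 2]) / (4 * w),
                     (R[0, 2] - R[2, 0]) / (4 * w),
                     (R[1, 0] - R[0, 1]) / (4 * w)])


q = np.array([0.9, 0.1, 0.3, -0.2])
q = q / np.linalg.norm(q)
R = quat_to_rot_np(q)
print(np.round(R, 4))
```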
================================================
FILE: evaluation/eval_tnt.py
================================================
import os
import trimesh
import argparse
import numpy as np
import open3d as o3d
from sklearn.neighbors import KDTree
def nn_correspondance(verts1, verts2):
    """For every point in verts2, return the distance to its nearest neighbour in verts1."""
    if len(verts1) == 0 or len(verts2) == 0:
        return np.array([])
    kdtree = KDTree(verts1)
    distances, indices = kdtree.query(verts2)
    distances = distances.reshape(-1)
    return distances
def evaluate(mesh_pred, mesh_trgt, threshold=.05, down_sample=.02):
    pcd_trgt = o3d.geometry.PointCloud()
    pcd_pred = o3d.geometry.PointCloud()
    pcd_trgt.points = o3d.utility.Vector3dVector(mesh_trgt.vertices[:, :3])
    pcd_pred.points = o3d.utility.Vector3dVector(mesh_pred.vertices[:, :3])
    if down_sample:
        pcd_pred = pcd_pred.voxel_down_sample(down_sample)
        pcd_trgt = pcd_trgt.voxel_down_sample(down_sample)
    verts_pred = np.asarray(pcd_pred.points)
    verts_trgt = np.asarray(pcd_trgt.points)
    dist1 = nn_correspondance(verts_pred, verts_trgt)  # GT -> prediction (completeness)
    dist2 = nn_correspondance(verts_trgt, verts_pred)  # prediction -> GT (accuracy)
    precision = np.mean((dist2 < threshold).astype('float'))
    recal = np.mean((dist1 < threshold).astype('float'))
    fscore = 2 * precision * recal / (precision + recal) if precision + recal > 0 else 0.0
    metrics = {
        'Acc': np.mean(dist2),
        'Comp': np.mean(dist1),
        'Prec': precision,
        'Recal': recal,
        'F-score': fscore,
    }
    return metrics
def main(args):
    assert os.path.exists(args.ply_path), f"PLY file {args.ply_path} does not exist."
    mesh_rec = trimesh.load(args.ply_path, process=False)
    mesh_gt = trimesh.load(args.gt_path, process=False)
    to_align, _ = trimesh.bounds.oriented_bounds(mesh_gt)
    mesh_gt.vertices = (to_align[:3, :3] @ mesh_gt.vertices.T + to_align[:3, 3:]).T
    mesh_rec.vertices = (to_align[:3, :3] @ mesh_rec.vertices.T + to_align[:3, 3:]).T
    min_points = mesh_gt.vertices.min(axis=0)
    max_points = mesh_gt.vertices.max(axis=0)
    mask_min = (mesh_rec.vertices - min_points[None]) > 0
    mask_max = (mesh_rec.vertices - max_points[None]) < 0
    mask = np.concatenate((mask_min, mask_max), axis=1).all(axis=1)
    face_mask = mask[mesh_rec.faces].all(axis=1)
    mesh_rec.update_vertices(mask)
    mesh_rec.update_faces(face_mask)
    metrics = evaluate(mesh_rec, mesh_gt)
    metrics_path = os.path.join(os.path.dirname(args.ply_path), 'metrics.txt')
    with open(metrics_path, 'w') as f:
        for k, v in metrics.items():
            f.write(f'{k}: {v}\n')
    print('Scene: {} F-score: {}'.format(args.scene, metrics['F-score']))
    mesh_rec.vertices = (to_align[:3, :3].T @ mesh_rec.vertices.T - to_align[:3, :3].T @ to_align[:3, 3:]).T
    mesh_rec.export(args.ply_path.replace('.ply', '_crop.ply'))
    return
if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--gt_path",
        type=str,
        default='/your/path/Barn_GT.ply',
        help="path to the ground-truth point cloud (.ply)",
    )
    parser.add_argument(
        "--ply_path",
        type=str,
        default='/your/path/Barn_lowres.ply',
        help="path to reconstruction ply file",
    )
    parser.add_argument(
        "--scene",
        type=str,
        default='Barn',
        help="scene name (used only in the printed summary)",
    )
    args = parser.parse_args()
    main(args)
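`evaluate()` reports precision (share of predicted points within `threshold` of the GT), recall (share of GT points within `threshold` of the prediction) and their harmonic mean, after voxel-downsampling both clouds. The sketch below computes the same three numbers for small toy point sets. It uses brute-force nearest neighbours instead of a `KDTree`, skips the downsampling, and returns zeros instead of dividing by zero when nothing matches; `precision_recall_fscore` is a hypothetical helper.

```python
import numpy as np


def nearest_dist(src, dst):
    """Distance from each src point to its nearest dst point (brute force)."""
    diff = src[:, None, :] - dst[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1)).min(axis=1)


def precision_recall_fscore(pred, gt, threshold=0.05):
    """Precision: fraction of predicted points within `threshold` of the GT.
    Recall: fraction of GT points within `threshold` of the prediction."""
    if len(pred) == 0 or len(gt) == 0:
        return 0.0, 0.0, 0.0
    precision = float(np.mean(nearest_dist(pred, gt) < threshold))
    recall = float(np.mean(nearest_dist(gt, pred) < threshold))
    denom = precision + recall
    fscore = 2 * precision * recall / denom if denom > 0 else 0.0
    return precision, recall, fscore


gt = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
pred = np.array([[0.01, 0.0, 0.0], [0.5, 0.0, 0.0]])
print(precision_recall_fscore(pred, gt))
```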
================================================
FILE: evaluation/full_eval.py
================================================
#
# Copyright (C) 2023, Inria
# GRAPHDECO research group, https://team.inria.fr/graphdeco
# All rights reserved.
#
# This software is free for non-commercial, research and evaluation use
# under the terms of the LICENSE.md file.
#
# For inquiries contact george.drettakis@inria.fr
#
import os
from argparse import ArgumentParser
mipnerf360_outdoor_scenes = ["bicycle", "flowers", "garden", "stump", "treehill"]
mipnerf360_indoor_scenes = ["room", "counter", "kitchen", "bonsai"]
tanks_and_temples_scenes = ["truck", "train"]
deep_blending_scenes = ["drjohnson", "playroom"]
parser = ArgumentParser(description="Full evaluation script parameters")
parser.add_argument("--skip_training", action="store_true")
parser.add_argument("--skip_rendering", action="store_true")
parser.add_argument("--skip_metrics", action="store_true")
parser.add_argument("--output_path", default="./eval")
args, _ = parser.parse_known_args()
all_scenes = []
all_scenes.extend(mipnerf360_outdoor_scenes)
all_scenes.extend(mipnerf360_indoor_scenes)
all_scenes.extend(tanks_and_temples_scenes)
all_scenes.extend(deep_blending_scenes)
if not args.skip_training or not args.skip_rendering:
    parser.add_argument('--mipnerf360', "-m360", required=True, type=str)
    parser.add_argument("--tanksandtemples", "-tat", required=True, type=str)
    parser.add_argument("--deepblending", "-db", required=True, type=str)
    args = parser.parse_args()
if not args.skip_training:
    common_args = " --quiet --eval --test_iterations -1 "
    for scene in mipnerf360_outdoor_scenes:
        source = args.mipnerf360 + "/" + scene
        os.system("python train.py -s " + source + " -i images_4 -m " + args.output_path + "/" + scene + common_args)
    for scene in mipnerf360_indoor_scenes:
        source = args.mipnerf360 + "/" + scene
        os.system("python train.py -s " + source + " -i images_2 -m " + args.output_path + "/" + scene + common_args)
    for scene in tanks_and_temples_scenes:
        source = args.tanksandtemples + "/" + scene
        os.system("python train.py -s " + source + " -m " + args.output_path + "/" + scene + common_args)
    for scene in deep_blending_scenes:
        source = args.deepblending + "/" + scene
        os.system("python train.py -s " + source + " -m " + args.output_path + "/" + scene + common_args)
if not args.skip_rendering:
    all_sources = []
    for scene in mipnerf360_outdoor_scenes:
        all_sources.append(args.mipnerf360 + "/" + scene)
    for scene in mipnerf360_indoor_scenes:
        all_sources.append(args.mipnerf360 + "/" + scene)
    for scene in tanks_and_temples_scenes:
        all_sources.append(args.tanksandtemples + "/" + scene)
    for scene in deep_blending_scenes:
        all_sources.append(args.deepblending + "/" + scene)
    common_args = " --quiet --eval --skip_train"
    for scene, source in zip(all_scenes, all_sources):
        os.system("python render.py --iteration 7000 -s " + source + " -m " + args.output_path + "/" + scene + common_args)
        os.system("python render.py --iteration 30000 -s " + source + " -m " + args.output_path + "/" + scene + common_args)
if not args.skip_metrics:
    scenes_string = ""
    for scene in all_scenes:
        scenes_string += "\"" + args.output_path + "/" + scene + "\" "
    os.system("python metrics.py -m " + scenes_string)
================================================
FILE: evaluation/lpipsPyTorch/__init__.py
================================================
import torch
from .modules.lpips import LPIPS
def lpips(x: torch.Tensor,
          y: torch.Tensor,
          net_type: str = 'alex',
          version: str = '0.1'):
    r"""Function that measures
    Learned Perceptual Image Patch Similarity (LPIPS).
    Arguments:
        x, y (torch.Tensor): the input tensors to compare.
        net_type (str): the network type to compare the features:
                        'alex' | 'squeeze' | 'vgg'. Default: 'alex'.
        version (str): the version of LPIPS. Default: 0.1.
    """
    device = x.device
    criterion = LPIPS(net_type, version).to(device)
    return criterion(x, y)
================================================
FILE: evaluation/lpipsPyTorch/modules/lpips.py
================================================
import torch
import torch.nn as nn
from .networks import get_network, LinLayers
from .utils import get_state_dict
class LPIPS(nn.Module):
    r"""Creates a criterion that measures
    Learned Perceptual Image Patch Similarity (LPIPS).
    Arguments:
        net_type (str): the network type to compare the features:
                        'alex' | 'squeeze' | 'vgg'. Default: 'alex'.
        version (str): the version of LPIPS. Default: 0.1.
    """
    def __init__(self, net_type: str = 'alex', version: str = '0.1'):
        assert version in ['0.1'], 'v0.1 is only supported now'
        super(LPIPS, self).__init__()
        # pretrained network
        self.net = get_network(net_type)
        # linear layers
        self.lin = LinLayers(self.net.n_channels_list)
        self.lin.load_state_dict(get_state_dict(net_type, version))
    def forward(self, x: torch.Tensor, y: torch.Tensor):
        feat_x, feat_y = self.net(x), self.net(y)
        diff = [(fx - fy) ** 2 for fx, fy in zip(feat_x, feat_y)]
        # per layer: 1x1 linear weighting, then spatial mean -> N x 1 x 1 x 1
        res = [l(d).mean((2, 3), True) for d, l in zip(diff, self.lin)]
        # sum over layers only, keeping one score per image in the batch
        return torch.stack(res, 0).sum(0)
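`LPIPS.forward` compares unit-normalized backbone activations (the normalization happens in `BaseNet.forward` via `normalize_activation`), squares the difference, weights the channels with a bias-free 1x1 convolution, averages over space, and sums over layers. A NumPy sketch of that aggregation for a single image; the features and weights are made-up stand-ins for backbone activations and the downloaded linear-layer weights, and `lpips_from_features` is a hypothetical name.

```python
import numpy as np


def unit_normalize(feat, eps=1e-10):
    """Normalize each spatial position's channel vector (feat: C x H x W), like normalize_activation."""
    norm = np.sqrt((feat ** 2).sum(axis=0, keepdims=True))
    return feat / (norm + eps)


def lpips_from_features(feats_x, feats_y, weights):
    """Sum over layers of the spatially averaged, channel-weighted squared difference."""
    total = 0.0
    for fx, fy, w in zip(feats_x, feats_y, weights):
        d = (unit_normalize(fx) - unit_normalize(fy)) ** 2  # C x H x W
        per_pixel = np.tensordot(w, d, axes=(0, 0))        # H x W, a 1x1 conv without bias
        total += per_pixel.mean()                           # spatial average
    return total


fx = [np.array([1.0, 0.0]).reshape(2, 1, 1)]
fy = [np.array([0.0, 1.0]).reshape(2, 1, 1)]
w = [np.array([1.0, 1.0])]
print(lpips_from_features(fx, fy, w))
```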
================================================
FILE: evaluation/lpipsPyTorch/modules/networks.py
================================================
from typing import Sequence
from itertools import chain
import torch
import torch.nn as nn
from torchvision import models
from .utils import normalize_activation
def get_network(net_type: str):
    if net_type == 'alex':
        return AlexNet()
    elif net_type == 'squeeze':
        return SqueezeNet()
    elif net_type == 'vgg':
        return VGG16()
    else:
        raise NotImplementedError('choose net_type from [alex, squeeze, vgg].')
class LinLayers(nn.ModuleList):
    def __init__(self, n_channels_list: Sequence[int]):
        super(LinLayers, self).__init__([
            nn.Sequential(
                nn.Identity(),
                nn.Conv2d(nc, 1, 1, 1, 0, bias=False)
            ) for nc in n_channels_list
        ])
        for param in self.parameters():
            param.requires_grad = False
class BaseNet(nn.Module):
    def __init__(self):
        super(BaseNet, self).__init__()
        # register buffer
        self.register_buffer(
            'mean', torch.Tensor([-.030, -.088, -.188])[None, :, None, None])
        self.register_buffer(
            'std', torch.Tensor([.458, .448, .450])[None, :, None, None])
    def set_requires_grad(self, state: bool):
        for param in chain(self.parameters(), self.buffers()):
            param.requires_grad = state
    def z_score(self, x: torch.Tensor):
        return (x - self.mean) / self.std
    def forward(self, x: torch.Tensor):
        x = self.z_score(x)
        output = []
        for i, (_, layer) in enumerate(self.layers._modules.items(), 1):
            x = layer(x)
            if i in self.target_layers:
                output.append(normalize_activation(x))
            if len(output) == len(self.target_layers):
                break
        return output
class SqueezeNet(BaseNet):
    def __init__(self):
        super(SqueezeNet, self).__init__()
        # weights= API (torchvision >= 0.13) instead of the deprecated positional `pretrained` flag
        self.layers = models.squeezenet1_1(weights=models.SqueezeNet1_1_Weights.IMAGENET1K_V1).features
        self.target_layers = [2, 5, 8, 10, 11, 12, 13]
        self.n_channels_list = [64, 128, 256, 384, 384, 512, 512]
        self.set_requires_grad(False)
class AlexNet(BaseNet):
    def __init__(self):
        super(AlexNet, self).__init__()
        self.layers = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1).features
        self.target_layers = [2, 5, 8, 10, 12]
        self.n_channels_list = [64, 192, 384, 256, 256]
        self.set_requires_grad(False)
class VGG16(BaseNet):
    def __init__(self):
        super(VGG16, self).__init__()
        self.layers = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).features
        self.target_layers = [4, 9, 16, 23, 30]
        self.n_channels_list = [64, 128, 256, 512, 512]
        self.set_requires_grad(False)
================================================
FILE: evaluation/lpipsPyTorch/modules/utils.py
================================================
from collections import OrderedDict

import torch


def normalize_activation(x, eps=1e-10):
    norm_factor = torch.sqrt(torch.sum(x ** 2, dim=1, keepdim=True))
    return x / (norm_factor + eps)


def get_state_dict(net_type: str = 'alex', version: str = '0.1'):
    # build url
    url = 'https://raw.githubusercontent.com/richzhang/PerceptualSimilarity/' \
        + f'master/lpips/weights/v{version}/{net_type}.pth'

    # download
    old_state_dict = torch.hub.load_state_dict_from_url(
        url, progress=True,
        map_location=None if torch.cuda.is_available() else torch.device('cpu')
    )

    # rename keys
    new_state_dict = OrderedDict()
    for key, val in old_state_dict.items():
        new_key = key
        new_key = new_key.replace('lin', '')
        new_key = new_key.replace('model.', '')
        new_state_dict[new_key] = val

    return new_state_dict
================================================
FILE: evaluation/metrics.py
================================================
#
# Copyright (C) 2023, Inria
# GRAPHDECO research group, https://team.inria.fr/graphdeco
# All rights reserved.
#
# This software is free for non-commercial, research and evaluation use
# under the terms of the LICENSE.md file.
#
# For inquiries contact george.drettakis@inria.fr
#
import os
import sys
import json
import torch
from PIL import Image
from tqdm import tqdm
from pathlib import Path
import torchvision.transforms.functional as tf

sys.path.append(os.getcwd())
from tools.loss_utils import ssim
from lpipsPyTorch import lpips
from tools.image_utils import psnr
from argparse import ArgumentParser
from configs.config import Config
from tools.general_utils import set_random_seed


def readImages(renders_dir, gt_dir):
    renders = []
    gts = []
    image_names = []
    for fname in os.listdir(renders_dir):
        render = Image.open(renders_dir / fname)
        gt = Image.open(gt_dir / fname)
        renders.append(tf.to_tensor(render).unsqueeze(0)[:, :3, :, :].cuda())
        gts.append(tf.to_tensor(gt).unsqueeze(0)[:, :3, :, :].cuda())
        image_names.append(fname)
    return renders, gts, image_names


def evaluate(model_paths):

    full_dict = {}
    per_view_dict = {}
    full_dict_polytopeonly = {}
    per_view_dict_polytopeonly = {}
    print("")

    for scene_dir in model_paths:
        try:
            print("Scene:", scene_dir)
            full_dict[scene_dir] = {}
            per_view_dict[scene_dir] = {}
            full_dict_polytopeonly[scene_dir] = {}
            per_view_dict_polytopeonly[scene_dir] = {}

            test_dir = Path(scene_dir) / "test"

            for method in os.listdir(test_dir):
                print("Method:", method)

                full_dict[scene_dir][method] = {}
                per_view_dict[scene_dir][method] = {}
                full_dict_polytopeonly[scene_dir][method] = {}
                per_view_dict_polytopeonly[scene_dir][method] = {}

                method_dir = test_dir / method
                gt_dir = method_dir / "gt"
                renders_dir = method_dir / "renders"
                renders, gts, image_names = readImages(renders_dir, gt_dir)

                ssims = []
                psnrs = []
                lpipss = []

                for idx in tqdm(range(len(renders)), desc="Metric evaluation progress"):
                    ssims.append(ssim(renders[idx], gts[idx]))
                    psnrs.append(psnr(renders[idx], gts[idx]))
                    lpipss.append(lpips(renders[idx], gts[idx], net_type='vgg'))

                full_dict[scene_dir][method].update({"SSIM": torch.tensor(ssims).mean().item(),
                                                     "PSNR": torch.tensor(psnrs).mean().item(),
                                                     "LPIPS": torch.tensor(lpipss).mean().item()})
                per_view_dict[scene_dir][method].update({"SSIM": {name: ssim for ssim, name in zip(torch.tensor(ssims).tolist(), image_names)},
                                                         "PSNR": {name: psnr for psnr, name in zip(torch.tensor(psnrs).tolist(), image_names)},
                                                         "LPIPS": {name: lp for lp, name in zip(torch.tensor(lpipss).tolist(), image_names)}})

            with open(scene_dir + "/results.json", 'w') as fp:
                json.dump(full_dict[scene_dir], fp, indent=True)
            with open(scene_dir + "/per_view.json", 'w') as fp:
                json.dump(per_view_dict[scene_dir], fp, indent=True)
        except Exception as e:
            # Report the cause instead of silently swallowing it (a bare except also hid Ctrl+C).
            print("Unable to compute metrics for model", scene_dir, ":", e)


if __name__ == "__main__":
    device = torch.device("cuda:0")
    torch.cuda.set_device(device)

    # Set up command line argument parser
    parser = ArgumentParser(description="Metric evaluation script parameters")
    parser.add_argument('--cfg_path', type=str, default='configs/config_base.yaml')
    args = parser.parse_args()

    cfg = Config(args.cfg_path)
    cfg.model.data_device = 'cpu'
    cfg.model.load_normal = False
    set_random_seed(cfg.seed)

    evaluate([cfg.model.model_path])
================================================
FILE: evaluation/render.py
================================================
#
# Copyright (C) 2023, Inria
# GRAPHDECO research group, https://team.inria.fr/graphdeco
# All rights reserved.
#
# This software is free for non-commercial, research and evaluation use
# under the terms of the LICENSE.md file.
#
# For inquiries contact george.drettakis@inria.fr
#
import os
import sys
import torch
import torchvision
from tqdm import tqdm
from argparse import ArgumentParser

sys.path.append(os.getcwd())
from scene import Scene
from gaussian_renderer import render, render_fast
from gaussian_renderer import GaussianModel
from configs.config import Config
from tools.general_utils import set_random_seed
from tools.loss_utils import cos_weight


def render_set(model_path, name, iteration, views, gaussians, cfg, background):
    render_path = os.path.join(model_path, name, "ours_{}".format(iteration), "renders")
    gts_path = os.path.join(model_path, name, "ours_{}".format(iteration), "gt")

    os.makedirs(render_path, exist_ok=True)
    os.makedirs(gts_path, exist_ok=True)

    alphas = []
    for idx, view in enumerate(tqdm(views, desc="Rendering progress")):
        outs = render(view, gaussians, cfg, background)
        # outs = render_fast(view, gaussians, cfg, background)
        rendering = outs["render"]
        gt = view.original_image[0:3, :, :]
        torchvision.utils.save_image(rendering, os.path.join(render_path, '{0:05d}'.format(idx) + ".png"))
        torchvision.utils.save_image(gt, os.path.join(gts_path, '{0:05d}'.format(idx) + ".png"))

        alphas.append(outs["alpha"].detach().clone().view(-1).cpu())

        # Disabled debug output: saves the cosine weight between the reference
        # normals and the rendered normals (requires view.normal to be loaded).
        if False:
            normal_map = outs["normal"].detach().clone()
            normal_gt = view.normal.cuda()
            cos = cos_weight(normal_gt, normal_map, cfg.optim.exp_t, cfg.optim.cos_thr)
            torchvision.utils.save_image(cos, os.path.join(render_path, '{0:05d}_cosine'.format(idx) + ".png"))

    # alphas = torch.cat(alphas, dim=0)
    # print("Alpha min: {}, max: {}".format(alphas.min(), alphas.max()))
    # print("Alpha mean: {}, std: {}".format(alphas.mean(), alphas.std()))
    # print("Alpha median: {}".format(alphas.median()))


def render_sets(cfg, iteration: int, skip_train: bool, skip_test: bool):
    with torch.no_grad():
        gaussians = GaussianModel(cfg.model)
        scene = Scene(cfg.model, gaussians, load_iteration=iteration, shuffle=False)
        # gaussians.extent = scene.cameras_extent

        bg_color = [1, 1, 1] if cfg.model.white_background else [0, 0, 0]
        background = torch.tensor(bg_color, dtype=torch.float32, device="cuda")

        if not skip_train:
            render_set(cfg.model.model_path, "train", scene.loaded_iter, scene.getTrainCameras(), gaussians, cfg, background)

        if not skip_test:
            render_set(cfg.model.model_path, "test", scene.loaded_iter, scene.getTestCameras(), gaussians, cfg, background)


if __name__ == "__main__":
    # Set up command line argument parser
    parser = ArgumentParser()
    parser.add_argument('--cfg_path', type=str, default='configs/config_base.yaml')
    parser.add_argument("--iteration", default=-1, type=int)
    parser.add_argument("--skip_train", action="store_true")
    parser.add_argument("--skip_test", action="store_true")
    args = parser.parse_args()

    cfg = Config(args.cfg_path)
    cfg.model.data_device = 'cuda'
    cfg.model.load_normal = False
    cfg.model.load_mask = False
    set_random_seed(cfg.seed)

    # Initialize system state (RNG)
    # safe_state(args.quiet)

    render_sets(cfg, args.iteration, args.skip_train, args.skip_test)
================================================
FILE: evaluation/tnt_eval/README.md
================================================
# Python Toolbox for Evaluation
This Python script evaluates the **training** dataset of the TanksAndTemples benchmark.
The script requires ``Open3D`` and a few Python packages such as ``matplotlib``, ``json``, and ``numpy``.
## How to use:
**Step 0**. Reconstruct 3D models and recover camera poses from the training dataset.
The raw videos of the training dataset can be found at:
https://tanksandtemples.org/download/
**Step 1**. Download evaluation data (ground truth geometry + reference reconstruction) using
[this link](https://drive.google.com/open?id=1UoKPiUUsKa0AVHFOrnMRhc5hFngjkE-t). In this example, ``TanksAndTemples/evaluation/data/`` is used as the dataset folder.
**Step 2**. Install Open3D. Follow the instructions at http://open3d.org/docs/getting_started.html
**Step 3**. Run the evaluation script and grab some coffee.
```
python run.py --dataset-dir path/to/TanksAndTemples/evaluation/data/Ignatius --traj-path path/to/TanksAndTemples/evaluation/data/Ignatius/Ignatius_COLMAP_SfM.log --ply-path path/to/TanksAndTemples/evaluation/data/Ignatius/Ignatius_COLMAP.ply
```
Output (evaluation of Ignatius):
```
===========================
Evaluating Ignatius
===========================
path/to/TanksAndTemples/evaluation/data/Ignatius/Ignatius_COLMAP.ply
Reading PLY: [========================================] 100%
Read PointCloud: 6929586 vertices.
path/to/TanksAndTemples/evaluation/data/Ignatius/Ignatius.ply
Reading PLY: [========================================] 100%
:
ICP Iteration #0: Fitness 0.9980, RMSE 0.0044
ICP Iteration #1: Fitness 0.9980, RMSE 0.0043
ICP Iteration #2: Fitness 0.9980, RMSE 0.0043
ICP Iteration #3: Fitness 0.9980, RMSE 0.0043
ICP Iteration #4: Fitness 0.9980, RMSE 0.0042
ICP Iteration #5: Fitness 0.9980, RMSE 0.0042
ICP Iteration #6: Fitness 0.9979, RMSE 0.0042
ICP Iteration #7: Fitness 0.9979, RMSE 0.0042
ICP Iteration #8: Fitness 0.9979, RMSE 0.0042
ICP Iteration #9: Fitness 0.9979, RMSE 0.0042
ICP Iteration #10: Fitness 0.9979, RMSE 0.0042
[EvaluateHisto]
Cropping geometry: [========================================] 100%
Pointcloud down sampled from 6929586 points to 1449840 points.
Pointcloud down sampled from 1449840 points to 1365628 points.
path/to/TanksAndTemples/evaluation/data/Ignatius/evaluation//Ignatius.precision.ply
Cropping geometry: [========================================] 100%
Pointcloud down sampled from 5016769 points to 4957123 points.
Pointcloud down sampled from 4957123 points to 4181506 points.
[compute_point_cloud_to_point_cloud_distance]
[compute_point_cloud_to_point_cloud_distance]
:
[ViewDistances] Add color coding to visualize error
[ViewDistances] Add color coding to visualize error
:
[get_f1_score_histo2]
==============================
evaluation result : Ignatius
==============================
distance tau : 0.003
precision : 0.7679
recall : 0.7937
f-score : 0.7806
==============================
```
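The precision, recall and F-score printed above come from point-to-point distances computed after alignment, cropping and voxel downsampling. The snippet below is a minimal pure-Python sketch of the scoring step in ``get_f1_score_histo2`` (``evaluation.py``), not the toolbox code itself: precision is the fraction of reconstructed points closer than tau to the ground truth, recall is the fraction of ground-truth points closer than tau to the reconstruction, and the F-score is their harmonic mean. The function name ``prf_at_tau`` and the distances are hypothetical; the toolbox obtains the distances with Open3D's ``compute_point_cloud_distance`` and takes tau per scene from ``scenes_tau_dict`` in ``config.py``.

```python
from typing import Sequence, Tuple


def prf_at_tau(dist_rec_to_gt: Sequence[float],
               dist_gt_to_rec: Sequence[float],
               tau: float) -> Tuple[float, float, float]:
    """Hypothetical helper: precision, recall and F-score at distance threshold tau.

    dist_rec_to_gt: distance from each reconstructed point to the closest GT point.
    dist_gt_to_rec: distance from each GT point to the closest reconstructed point.
    """
    if len(dist_rec_to_gt) == 0 or len(dist_gt_to_rec) == 0:
        # Empty inputs: report zeros, like the toolbox's fallback branch.
        return 0.0, 0.0, 0.0
    # Fraction of reconstructed points that lie within tau of the GT.
    precision = sum(d < tau for d in dist_rec_to_gt) / len(dist_rec_to_gt)
    # Fraction of GT points that lie within tau of the reconstruction.
    recall = sum(d < tau for d in dist_gt_to_rec) / len(dist_gt_to_rec)
    if precision + recall == 0:
        # Nothing within tau: avoid dividing by zero.
        return precision, recall, 0.0
    # Harmonic mean of precision and recall.
    fscore = 2 * precision * recall / (precision + recall)
    return precision, recall, fscore


# Made-up toy distances, with tau = 0.003 as used for Ignatius.
p, r, f = prf_at_tau([0.001, 0.002, 0.010, 0.0005],
                     [0.001, 0.004, 0.002, 0.0025, 0.02],
                     tau=0.003)
print(f"precision : {p:.4f}")  # 3 of 4 points within tau -> 0.7500
print(f"recall : {r:.4f}")     # 3 of 5 points within tau -> 0.6000
print(f"f-score : {f:.4f}")    # 2 * 0.75 * 0.6 / 1.35 -> 0.6667
```

Because precision and recall are measured in opposite directions, a reconstruction that is accurate but incomplete scores high precision and low recall, and the harmonic mean penalizes that imbalance.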
**Step 4**. Go to the evaluation folder. ``TanksAndTemples/evaluation/data/Ignatius/evaluation/`` will contain the following outputs.

``PR_Ignatius_@d_th_0_0030.pdf`` (precision and recall curves with the F-score)

``Ignatius.precision.ply`` and ``Ignatius.recall.ply`` (3D visualization of precision and recall; each point cloud is color coded using the reversed hot colormap, ``hot_r``)
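Both kinds of output are derived from the same two distance lists. As a rough, hypothetical pure-Python sketch (the helper names are made up, and ``plot_stretch = 5`` is only an example value): each curve in the PR plot is the cumulative fraction of points whose distance falls below each bin edge, with bins of width tau/100 covering ``[0, tau * plot_stretch)`` (the toolbox does this with ``np.histogram`` and ``np.cumsum``, so bin boundaries may differ slightly), and each point's color value in the ``.ply`` files is its distance clipped at ``3 * tau`` and scaled to [0, 1] before the colormap is applied.

```python
from typing import List, Sequence, Tuple


def cumulative_curve(distances: Sequence[float], tau: float,
                     plot_stretch: float) -> Tuple[List[float], List[float]]:
    """Hypothetical helper: fraction of points below each upper bin edge.

    Edges are 0, tau/100, 2*tau/100, ... up to (but excluding) tau * plot_stretch.
    """
    step = tau / 100.0
    n_edges = int(round(plot_stretch * 100))
    upper_edges = [i * step for i in range(1, n_edges)]
    if len(distances) == 0:
        return upper_edges, [0.0] * len(upper_edges)
    n = len(distances)
    # Cumulative fraction: how many points are closer than each edge.
    fractions = [sum(d < e for d in distances) / n for e in upper_edges]
    return upper_edges, fractions


def color_values(distances: Sequence[float], tau: float) -> List[float]:
    """Hypothetical helper: clip distances at 3 * tau and scale them to [0, 1]."""
    max_distance = 3 * tau
    return [min(d, max_distance) / max_distance for d in distances]


tau = 0.003  # Ignatius threshold from scenes_tau_dict
edges, precision_curve = cumulative_curve([0.001, 0.002, 0.010, 0.0005], tau, plot_stretch=5)
print(len(edges))           # 499 upper edges
print(precision_curve[99])  # value at the tau edge: 0.75, i.e. the precision at tau
print([round(c, 3) for c in color_values([0.0, 0.0045, 0.02], tau)])  # [0.0, 0.5, 1.0]
```

Reading the curve at the dashed tau line therefore recovers the precision (or recall) reported in the text output, while the clipping at ``3 * tau`` keeps large outliers from washing out the color scale.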
## Requirements
- Python 3
- open3d v0.9.0
- matplotlib
================================================
FILE: evaluation/tnt_eval/config.py
================================================
# ----------------------------------------------------------------------------
# - TanksAndTemples Website Toolbox -
# - http://www.tanksandtemples.org -
# ----------------------------------------------------------------------------
# The MIT License (MIT)
#
# Copyright (c) 2017
# Arno Knapitsch
# Jaesik Park
# Qian-Yi Zhou
# Vladlen Koltun
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
# THE SOFTWARE.
# ----------------------------------------------------------------------------
# some global parameters - do not modify
scenes_tau_dict = {
"Barn": 0.01,
"Caterpillar": 0.005,
"Church": 0.025,
"Courthouse": 0.025,
"Ignatius": 0.003,
"Meetingroom": 0.01,
"Truck": 0.005,
}
================================================
FILE: evaluation/tnt_eval/evaluation.py
================================================
# ----------------------------------------------------------------------------
# - TanksAndTemples Website Toolbox -
# - http://www.tanksandtemples.org -
# ----------------------------------------------------------------------------
# The MIT License (MIT)
#
# Copyright (c) 2017
# Arno Knapitsch
# Jaesik Park
# Qian-Yi Zhou
# Vladlen Koltun
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
# THE SOFTWARE.
# ----------------------------------------------------------------------------
#
# This python script is for downloading dataset from www.tanksandtemples.org
# The dataset has a different license, please refer to
# https://tanksandtemples.org/license/
import json
import copy
import os
import numpy as np
import open3d as o3d
import matplotlib.pyplot as plt


def read_alignment_transformation(filename):
    with open(filename) as data_file:
        data = json.load(data_file)
    return np.asarray(data["transformation"]).reshape((4, 4)).transpose()


def write_color_distances(path, pcd, distances, max_distance):
    o3d.utility.set_verbosity_level(o3d.utility.VerbosityLevel.Debug)
    # cmap = plt.get_cmap("afmhot")
    cmap = plt.get_cmap("hot_r")
    distances = np.array(distances)
    colors = cmap(np.minimum(distances, max_distance) / max_distance)[:, :3]
    pcd.colors = o3d.utility.Vector3dVector(colors)
    o3d.io.write_point_cloud(path, pcd)


def EvaluateHisto(
    source,
    target,
    trans,
    crop_volume,
    voxel_size,
    threshold,
    filename_mvs,
    plot_stretch,
    scene_name,
    verbose=True,
):
    print("[EvaluateHisto]")
    o3d.utility.set_verbosity_level(o3d.utility.VerbosityLevel.Debug)
    s = copy.deepcopy(source)
    s.transform(trans)
    s = crop_volume.crop_point_cloud(s)
    s = s.voxel_down_sample(voxel_size)
    s.estimate_normals(search_param=o3d.geometry.KDTreeSearchParamKNN(knn=20))
    print(filename_mvs + "/" + scene_name + ".precision.ply")

    t = copy.deepcopy(target)
    t = crop_volume.crop_point_cloud(t)
    t = t.voxel_down_sample(voxel_size)
    t.estimate_normals(search_param=o3d.geometry.KDTreeSearchParamKNN(knn=20))
    print("[compute_point_cloud_to_point_cloud_distance]")
    distance1 = s.compute_point_cloud_distance(t)
    print("[compute_point_cloud_to_point_cloud_distance]")
    distance2 = t.compute_point_cloud_distance(s)

    # write the distances to bin files
    # np.array(distance1).astype("float64").tofile(
    #     filename_mvs + "/" + scene_name + ".precision.bin"
    # )
    # np.array(distance2).astype("float64").tofile(
    #     filename_mvs + "/" + scene_name + ".recall.bin"
    # )

    # Colorize the point cloud files with the precision and recall values
    # o3d.io.write_point_cloud(
    #     filename_mvs + "/" + scene_name + ".precision.ply", s
    # )
    # o3d.io.write_point_cloud(
    #     filename_mvs + "/" + scene_name + ".precision.ncb.ply", s
    # )
    # o3d.io.write_point_cloud(filename_mvs + "/" + scene_name + ".recall.ply", t)

    source_n_fn = filename_mvs + "/" + scene_name + ".precision.ply"
    target_n_fn = filename_mvs + "/" + scene_name + ".recall.ply"

    print("[ViewDistances] Add color coding to visualize error")
    # eval_str_viewDT = (
    #     OPEN3D_EXPERIMENTAL_BIN_PATH
    #     + "ViewDistances "
    #     + source_n_fn
    #     + " --max_distance "
    #     + str(threshold * 3)
    #     + " --write_color_back --without_gui"
    # )
    # os.system(eval_str_viewDT)
    write_color_distances(source_n_fn, s, distance1, 3 * threshold)

    print("[ViewDistances] Add color coding to visualize error")
    # eval_str_viewDT = (
    #     OPEN3D_EXPERIMENTAL_BIN_PATH
    #     + "ViewDistances "
    #     + target_n_fn
    #     + " --max_distance "
    #     + str(threshold * 3)
    #     + " --write_color_back --without_gui"
    # )
    # os.system(eval_str_viewDT)
    write_color_distances(target_n_fn, t, distance2, 3 * threshold)

    # get histogram and f-score
    [
        precision,
        recall,
        fscore,
        edges_source,
        cum_source,
        edges_target,
        cum_target,
    ] = get_f1_score_histo2(threshold, filename_mvs, plot_stretch, distance1,
                            distance2)
    np.savetxt(filename_mvs + "/" + scene_name + ".recall.txt", cum_target)
    np.savetxt(filename_mvs + "/" + scene_name + ".precision.txt", cum_source)
    np.savetxt(
        filename_mvs + "/" + scene_name + ".prf_tau_plotstr.txt",
        np.array([precision, recall, fscore, threshold, plot_stretch]),
    )

    return [
        precision,
        recall,
        fscore,
        edges_source,
        cum_source,
        edges_target,
        cum_target,
    ]


def get_f1_score_histo2(threshold,
                        filename_mvs,
                        plot_stretch,
                        distance1,
                        distance2,
                        verbose=True):
    print("[get_f1_score_histo2]")
    dist_threshold = threshold
    if len(distance1) and len(distance2):

        recall = float(sum(d < threshold for d in distance2)) / float(
            len(distance2))
        precision = float(sum(d < threshold for d in distance1)) / float(
            len(distance1))
        # Guard against division by zero when no point lies within the threshold.
        if recall + precision > 0:
            fscore = 2 * recall * precision / (recall + precision)
        else:
            fscore = 0.0
        num = len(distance1)
        bins = np.arange(0, dist_threshold * plot_stretch, dist_threshold / 100)
        hist, edges_source = np.histogram(distance1, bins)
        cum_source = np.cumsum(hist).astype(float) / num

        num = len(distance2)
        bins = np.arange(0, dist_threshold * plot_stretch, dist_threshold / 100)
        hist, edges_target = np.histogram(distance2, bins)
        cum_target = np.cumsum(hist).astype(float) / num

    else:
        precision = 0
        recall = 0
        fscore = 0
        edges_source = np.array([0])
        cum_source = np.array([0])
        edges_target = np.array([0])
        cum_target = np.array([0])

    return [
        precision,
        recall,
        fscore,
        edges_source,
        cum_source,
        edges_target,
        cum_target,
    ]
================================================
FILE: evaluation/tnt_eval/plot.py
================================================
# ----------------------------------------------------------------------------
# - TanksAndTemples Website Toolbox -
# - http://www.tanksandtemples.org -
# ----------------------------------------------------------------------------
# The MIT License (MIT)
#
# Copyright (c) 2017
# Arno Knapitsch
# Jaesik Park
# Qian-Yi Zhou
# Vladlen Koltun
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
# THE SOFTWARE.
# ----------------------------------------------------------------------------
#
# This python script is for downloading dataset from www.tanksandtemples.org
# The dataset has a different license, please refer to
# https://tanksandtemples.org/license/
import matplotlib.pyplot as plt
from cycler import cycler


def plot_graph(
    scene,
    fscore,
    dist_threshold,
    edges_source,
    cum_source,
    edges_target,
    cum_target,
    plot_stretch,
    mvs_outpath,
    show_figure=False,
):
    f = plt.figure()
    plt_size = [14, 7]
    pfontsize = "medium"

    ax = plt.subplot(111)
    label_str = "precision"
    ax.plot(
        edges_source[1::],
        cum_source * 100,
        c="red",
        label=label_str,
        linewidth=2.0,
    )

    label_str = "recall"
    ax.plot(
        edges_target[1::],
        cum_target * 100,
        c="blue",
        label=label_str,
        linewidth=2.0,
    )

    ax.grid(True)
    plt.rcParams["figure.figsize"] = plt_size
    plt.rc("axes", prop_cycle=cycler("color", ["r", "g", "b", "y"]))
    plt.title("Precision and Recall: " + scene + ", " + "%02.2f f-score" %
              (fscore * 100))
    plt.axvline(x=dist_threshold, c="black", ls="dashed", linewidth=2.0)

    plt.ylabel("# of points (%)", fontsize=15)
    plt.xlabel("Meters", fontsize=15)
    plt.axis([0, dist_threshold * plot_stretch, 0, 100])
    ax.legend(shadow=True, fancybox=True, fontsize=pfontsize)
    # plt.axis([0, dist_threshold*plot_stretch, 0, 100])

    plt.setp(ax.get_legend().get_texts(), fontsize=pfontsize)

    plt.legend(loc=2, borderaxespad=0.0, fontsize=pfontsize)
    plt.legend(loc=4)
    leg = plt.legend(loc="lower right")

    box = ax.get_position()
    ax.set_position([box.x0, box.y0, box.width * 0.8, box.height])

    # Put a legend to the right of the current axis
    ax.legend(loc="center left", bbox_to_anchor=(1, 0.5))
    plt.setp(ax.get_legend().get_texts(), fontsize=pfontsize)
    png_name = mvs_outpath + "/PR_{0}_@d_th_0_{1}.png".format(
        scene, "%04d" % (dist_threshold * 10000))
    pdf_name = mvs_outpath + "/PR_{0}_@d_th_0_{1}.pdf".format(
        scene, "%04d" % (dist_threshold * 10000))

    # save figure and display
    f.savefig(png_name, format="png", bbox_inches="tight")
    f.savefig(pdf_name, format="pdf", bbox_inches="tight")
    if show_figure:
        plt.show()
================================================
FILE: evaluation/tnt_eval/registration.py
================================================
# ----------------------------------------------------------------------------
# - TanksAndTemples Website Toolbox -
# - http://www.tanksandtemples.org -
# ----------------------------------------------------------------------------
# The MIT License (MIT)
#
# Copyright (c) 2017
# Arno Knapitsch
# Jaesik Park
# Qian-Yi Zhou
# Vladlen Koltun
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
# THE SOFTWARE.
# ----------------------------------------------------------------------------
#
# This python script is for downloading dataset from www.tanksandtemples.org
# The dataset has a different license, please refer to
# https://tanksandtemples.org/license/
from trajectory_io import read_trajectory, convert_trajectory_to_pointcloud
import copy
import numpy as np
import open3d as o3d

MAX_POINT_NUMBER = 4e6


def read_mapping(filename):
    mapping = []
    with open(filename, "r") as f:
        n_sampled_frames = int(f.readline())
        n_total_frames = int(f.readline())
        mapping = np.zeros(shape=(n_sampled_frames, 2))
        metastr = f.readline()
        for iter in range(n_sampled_frames):
            metadata = list(map(int, metastr.split()))
            mapping[iter, :] = metadata
            metastr = f.readline()
    return [n_sampled_frames, n_total_frames, mapping]


def gen_sparse_trajectory(mapping, f_trajectory):
    sparse_traj = []
    for m in mapping:
        sparse_traj.append(f_trajectory[int(m[1] - 1)])
    return sparse_traj


def trajectory_alignment(map_file, traj_to_register, gt_traj_col, gt_trans,
                         scene):
    traj_pcd_col = convert_trajectory_to_pointcloud(gt_traj_col)
    traj_pcd_col.transform(gt_trans)
    corres = o3d.utility.Vector2iVector(
        np.asarray(list(map(lambda x: [x, x], range(len(gt_traj_col))))))
    rr = o3d.registration.RANSACConvergenceCriteria()
    rr.max_iteration = 100000
    rr.max_validation = 100000

    # in this case a log file was used which contains
    # every movie frame (see tutorial for details)
    if len(traj_to_register) > 1600:
        n_sampled_frames, n_total_frames, mapping = read_mapping(map_file)
        traj_col2 = gen_sparse_trajectory(mapping, traj_to_register)
        traj_to_register_pcd = convert_trajectory_to_pointcloud(traj_col2)
    else:
        traj_to_register_pcd = convert_trajectory_to_pointcloud(
            traj_to_register)
    randomvar = 0.0
    nr_of_cam_pos = len(traj_to_register_pcd.points)
    rand_number_added = np.asanyarray(traj_to_register_pcd.points) * (
        np.random.rand(nr_of_cam_pos, 3) * randomvar - randomvar / 2.0 + 1)
    list_rand = list(rand_number_added)
    traj_to_register_pcd_rand = o3d.geometry.PointCloud()
    for elem in list_rand:
        traj_to_register_pcd_rand.points.append(elem)

    # Rough registration based on aligned colmap SfM data
    reg = o3d.registration.registration_ransac_based_on_correspondence(
        traj_to_register_pcd_rand,
        traj_pcd_col,
        corres,
        0.2,
        o3d.registration.TransformationEstimationPointToPoint(True),
        6,
        rr,
    )
    return reg.transformation


def crop_and_downsample(
    pcd,
    crop_volume,
    down_sample_method="voxel",
    voxel_size=0.01,
    trans=np.identity(4),
):
    pcd_copy = copy.deepcopy(pcd)
    pcd_copy.transform(trans)
    pcd_crop = crop_volume.crop_point_cloud(pcd_copy)
    if down_sample_method == "voxel":
        # return voxel_down_sample(pcd_crop, voxel_size)
        return pcd_crop.voxel_down_sample(voxel_size)
    elif down_sample_method == "uniform":
        n_points = len(pcd_crop.points)
        if n_points > MAX_POINT_NUMBER:
            ds_rate = int(round(n_points / float(MAX_POINT_NUMBER)))
            return pcd_crop.uniform_down_sample(ds_rate)
    return pcd_crop


def registration_unif(
    source,
    gt_target,
    init_trans,
    crop_volume,
    threshold,
    max_itr,
    max_size=4 * MAX_POINT_NUMBER,
    verbose=True,
):
    if verbose:
        print("[Registration] threshold: %f" % threshold)
        o3d.utility.set_verbosity_level(o3d.utility.VerbosityLevel.Debug)
    s = crop_and_downsample(source,
                            crop_volume,
                            down_sample_method="uniform",
                            trans=init_trans)
    t = crop_and_downsample(gt_target,
                            crop_volume,
                            down_sample_method="uniform")
    reg = o3d.registration.registration_icp(
        s,
        t,
        threshold,
        np.identity(4),
        o3d.registration.TransformationEstimationPointToPoint(True),
        o3d.registration.ICPConvergenceCriteria(1e-6, max_itr),
    )
    reg.transformation = np.matmul(reg.transformation, init_trans)
    return reg


def registration_vol_ds(
    source,
    gt_target,
    init_trans,
    crop_volume,
    voxel_size,
    threshold,
    max_itr,
    verbose=True,
):
    if verbose:
        print("[Registration] voxel_size: %f, threshold: %f" %
              (voxel_size, threshold))
        o3d.utility.set_verbosity_level(o3d.utility.VerbosityLevel.Debug)
    s = crop_and_downsample(
        source,
        crop_volume,
        down_sample_method="voxel",
        voxel_size=voxel_size,
        trans=init_trans,
    )
    t = crop_and_downsample(
        gt_target,
        crop_volume,
        down_sample_method="voxel",
        voxel_size=voxel_size,
    )

    s = crop_based_target(s, t)

    reg = o3d.registration.registration_icp(
        s,
        t,
        threshold,
        np.identity(4),
        o3d.registration.TransformationEstimationPointToPoint(True),
        o3d.registration.ICPConvergenceCriteria(1e-6, max_itr),
    )
    reg.transformation = np.matmul(reg.transformation, init_trans)
    return reg


def crop_based_target(s, t):
    # Keep only the source points that fall inside the target's axis-aligned bounding box.
    bbox_t = t.get_axis_aligned_bounding_box()
    min_bound = bbox_t.get_min_bound()
    max_bound = bbox_t.get_max_bound()
    s_filtered = o3d.geometry.PointCloud()
    # Convert the Open3D vector to an (N, 3) array before comparing against the bounds.
    s_points = np.asarray(s.points)
    valid = np.logical_and(np.all(s_points >= min_bound, axis=1),
                           np.all(s_points <= max_bound, axis=1))
    s_filtered.points = o3d.utility.Vector3dVector(s_points[valid])
    return s_filtered
================================================
FILE: evaluation/tnt_eval/requirements.txt
================================================
matplotlib>=1.3
open3d==0.9
================================================
FILE: evaluation/tnt_eval/run.py
================================================
# ----------------------------------------------------------------------------
# - TanksAndTemples Website Toolbox -
# - http://www.tanksandtemples.org -
# ----------------------------------------------------------------------------
# The MIT License (MIT)
#
# Copyright (c) 2017
# Arno Knapitsch
# Jaesik Park
# Qian-Yi Zhou
# Vladlen Koltun
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
# THE SOFTWARE.
# ----------------------------------------------------------------------------
#
# This python script is for downloading dataset from www.tanksandtemples.org
# The dataset has a different license, please refer to
# https://tanksandtemples.org/license/
# this script requires Open3D python binding
# please follow the intructions in setup.py before running this script.
import numpy as np
import open3d as o3d
import os
import argparse
import sys
sys.path.append(os.getcwd())
from config import scenes_tau_dict
from registration import (
trajectory_alignment,
registration_vol_ds,
registration_unif,
read_trajectory,
)
from evaluation import EvaluateHisto
from util import make_dir
from plot import plot_graph
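# Illustrative sketch (hypothetical helper, not part of the TanksAndTemples
# toolbox and not called by run_evaluation). The numbers that EvaluateHisto
# reports follow the usual definitions:
#   precision = fraction of reconstructed points whose nearest GT point is closer than tau,
#   recall    = fraction of GT points whose nearest reconstructed point is closer than tau,
#   F-score   = harmonic mean of the two.
# Brute-force distances are used for clarity, so this only suits small,
# already cropped and aligned point sets. The strict "<" comparison is an assumption.
import numpy as np


def precision_recall_fscore(source, target, tau):
    """source: (N, 3) reconstruction points, target: (M, 3) GT points."""
    # Pairwise Euclidean distances, shape (N, M).
    dists = np.linalg.norm(source[:, None, :] - target[None, :, :], axis=-1)
    precision = float(np.mean(dists.min(axis=1) < tau))  # source -> target
    recall = float(np.mean(dists.min(axis=0) < tau))     # target -> source
    if precision + recall == 0:
        return precision, recall, 0.0
    fscore = 2 * precision * recall / (precision + recall)
    return precision, recall, fscore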
def run_evaluation(dataset_dir, traj_path, ply_path, out_dir):
scene = os.path.basename(os.path.normpath(dataset_dir))
if scene not in scenes_tau_dict:
print(dataset_dir, scene)
raise Exception("invalid dataset-dir, not in scenes_tau_dict")
print("")
print("===========================")
print("Evaluating %s" % scene)
print("===========================")
dTau = scenes_tau_dict[scene]
    # Put the crop file, the GT file, the COLMAP SfM log file and the
    # alignment of the corresponding scene in a folder with the same
    # name as the scene inside dataset_dir.
colmap_ref_logfile = os.path.join(dataset_dir, scene + "_COLMAP_SfM.log")
alignment = os.path.join(dataset_dir, scene + "_trans.txt")
gt_filen = os.path.join(dataset_dir, scene + ".ply")
# gt_filen = os.path.join(dataset_dir, scene + "_GT.ply")
cropfile = os.path.join(dataset_dir, scene + ".json")
map_file = os.path.join(dataset_dir, scene + "_mapping_reference.txt")
make_dir(out_dir)
assert os.path.exists(ply_path), f"ply_path {ply_path} does not exist"
    # Load the reconstruction and the corresponding GT
print(gt_filen)
gt_pcd = o3d.io.read_point_cloud(gt_filen)
print(ply_path)
# pcd = o3d.io.read_point_cloud(ply_path)
mesh = o3d.io.read_triangle_mesh(ply_path)
pcd = mesh.sample_points_uniformly(len(gt_pcd.points))
gt_trans = np.loadtxt(alignment)
traj_to_register = read_trajectory(traj_path)
gt_traj_col = read_trajectory(colmap_ref_logfile)
trajectory_transform = trajectory_alignment(map_file, traj_to_register,
gt_traj_col, gt_trans, scene)
# Refine alignment by using the actual GT and MVS pointclouds
vol = o3d.visualization.read_selection_polygon_volume(cropfile)
    # distance threshold (tau) passed to plot_graph alongside the F-score
dist_threshold = dTau
    # Refine the registration in three passes with decreasing distance thresholds
r2 = registration_vol_ds(pcd, gt_pcd, trajectory_transform, vol, dTau,
dTau * 80, 20)
r3 = registration_vol_ds(pcd, gt_pcd, r2.transformation, vol, dTau / 2.0,
dTau * 20, 20)
r = registration_unif(pcd, gt_pcd, r3.transformation, vol, 2 * dTau, 20)
    # Histograms and precision / recall / F-score
plot_stretch = 5
[
precision,
recall,
fscore,
edges_source,
cum_source,
edges_target,
cum_target,
] = EvaluateHisto(
pcd,
gt_pcd,
r.transformation,
vol,
dTau / 2.0,
dTau,
out_dir,
plot_stretch,
scene,
)
eva = [precision, recall, fscore]
# eva = [i*100 for i in eva]
print("==============================")
print("evaluation result : %s" % scene)
print("==============================")
print("distance tau : %.3f" % dTau)
print("precision : %.4f" % eva[0])
print("recall : %.4f" % eva[1])
print("f-score : %.4f" % eva[2])
print("==============================")
with open(os.path.join(out_dir, "evaluation.txt"), "w") as f:
f.write("evaluation result : %s\n" % scene)
f.write("distance tau : %.3f\n" % dTau)
f.write("precision : %.4f\n" % eva[0])
f.write("recall : %.4f\n" % eva[1])
f.write("f-score : %.4f\n" % eva[2])
# Plotting
plot_graph(
scene,
fscore,
dist_threshold,
edges_source,
cum_source,
edges_target,
cum_target,
plot_stretch,
out_dir,
)
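# Illustrative sketch (hypothetical helper, not part of the toolbox). The curves
# passed to plot_graph (edges_source/cum_source and edges_target/cum_target) are
# cumulative histograms of nearest-neighbour distances. Assuming they are
# normalised by the total point count, the curve at a bin edge d is the fraction
# of points whose distance falls at or below about d. Reading the source curve
# at tau then gives precision, and reading the target curve gives recall. The
# exact binning used by EvaluateHisto may differ.
import numpy as np


def cumulative_distance_curve(distances, max_dist, n_bins=100):
    distances = np.asarray(distances, dtype=float)
    # Points farther than max_dist fall outside the histogram range and are not
    # counted, so the curve ends below 1.0 when outliers exist.
    counts, edges = np.histogram(distances, bins=n_bins, range=(0.0, max_dist))
    cum = np.cumsum(counts) / max(len(distances), 1)
    return edges[1:], cum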
if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument(
"--dataset-dir",
type=str,
required=True,
help="path to a dataset/scene directory containing X.json, X.ply, ...",
)
parser.add_argument(
"--traj-path",
type=str,
required=True,
help=
"path to trajectory file. See `convert_to_logfile.py` to create this file.",
)
parser.add_argument(
"--ply-path",
type=str,
required=True,
help="path to reconstruction ply file",
)
parser.add_argument(
"--out-dir",
type=str,
default="",
help=
"output directory, default: an evaluation directory is created in the directory of the ply file",
)
args = parser.parse_args()
if args.out_dir.strip() == "":
args.out_dir = os.path.join(os.path.dirname(args.ply_path),
"evaluation")
run_evaluation(
dataset_dir=args.dataset_dir,
traj_path=args.traj_path,
ply_path=args.ply_path,
out_dir=args.out_dir,
)
================================================
FILE: evaluation/tnt_eval/trajectory_io.py
================================================
import numpy as np
import open3d as o3d
class CameraPose:
def __init__(self, meta, mat):
self.metadata = meta
self.pose = mat
def __str__(self):
return ("Metadata : " + " ".join(map(str, self.metadata)) + "\n" +
"Pose : " + "\n" + np.array_str(self.pose))
def convert_trajectory_to_pointcloud(traj):
pcd = o3d.geometry.PointCloud()
for t in traj:
pcd.points.append(t.pose[:3, 3])
return pcd
def read_trajectory(filename):
traj = []
with open(filename, "r") as f:
metastr = f.readline()
while metastr:
            metadata = list(map(int, metastr.split()))  # a list, so __str__ and write_trajectory can both iterate it
mat = np.zeros(shape=(4, 4))
for i in range(4):
matstr = f.readline()
mat[i, :] = np.fromstring(matstr, dtype=float, sep=" \t")
traj.append(CameraPose(metadata, mat))
metastr = f.readline()
return traj
def write_trajectory(traj, filename):
with open(filename, "w") as f:
for x in traj:
p = x.pose.tolist()
f.write(" ".join(map(str, x.metadata)) + "\n")
f.write("\n".join(
" ".join(map("{0:.12f}".format, p[i])) for i in range(4)))
f.write("\n")
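# Illustrative sketch (hypothetical helpers, not used by the toolbox). The .log
# trajectory format handled by read_trajectory and write_trajectory stores, for
# each camera, one metadata line of integers followed by the four rows of a 4x4
# pose matrix. These helpers parse and emit that layout to and from a string,
# without Open3D or files on disk. The camera centre is pose[:3, 3], which is
# what convert_trajectory_to_pointcloud collects.
import numpy as np


def parse_log_text(text):
    lines = [ln for ln in text.splitlines() if ln.strip()]
    poses = []
    for k in range(0, len(lines), 5):
        metadata = [int(v) for v in lines[k].split()]
        mat = np.array([[float(v) for v in lines[k + 1 + i].split()] for i in range(4)])
        poses.append((metadata, mat))
    return poses


def format_log_text(poses):
    out = []
    for metadata, mat in poses:
        out.append(" ".join(map(str, metadata)))
        # Same 12-decimal formatting as write_trajectory.
        out.extend(" ".join("{0:.12f}".format(v) for v in row) for row in mat)
    return "\n".join(out) + "\n"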
================================================
FILE: evaluation/tnt_eval/util.py
================================================
import os
def make_dir(path):
    # exist_ok avoids a race between the existence check and directory creation
    os.makedirs(path, exist_ok=True)
================================================
FILE: gaussian_renderer/__init__.py
================================================
#
# Copyright (C) 2023, Inria
# GRAPHDECO research group, https://team.inria.fr/graphdeco
# All rights reserved.
#
# This software is free for non-commercial, research and evaluation use
# under the terms of the LICENSE.md file.
#
# For inquiries contact george.drettakis@inria.fr
#
import math
import torch
import torch.nn.functional as F
from diff_gaussian_rasterization import GaussianRasterizationSettings, GaussianRasterizer
from scene.gaussian_model import GaussianModel
from tools.sh_utils import eval_sh
from tools.normal_utils import compute_normals
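# Illustrative sketch (hypothetical NumPy helper mirroring the torch code in
# `render`, not called by it). Each Gaussian normal is flipped so that its dot
# product with the viewing ray (camera centre -> Gaussian mean) is non-negative,
# which means it points away from the camera. It is then expressed in camera
# coordinates. As in `render`, `R` is the camera's stored rotation, and
# `normal @ R` equals the original `normal @ (R.T).T`. Following the original
# 3DGS convention, it is assumed that camera.R stores the transpose of the
# world-to-camera rotation.
import numpy as np


def orient_normals_to_camera(normals, means, cam_center, R):
    view_dir = means - cam_center                        # (N, 3) rays from the camera
    sign = np.where((view_dir * normals).sum(-1) > 0, 1.0, -1.0)
    oriented = normals * sign[:, None]                   # now dot(view_dir, n) >= 0
    return oriented @ R                                  # rotate into the camera frame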
def render(viewpoint_camera, pc : GaussianModel, cfg, bg_color : torch.Tensor, scaling_modifier = 1.0, override_color = None,
return_normal = True, is_all = True, dirs=None, mask_depth_thr=0.8):
"""
Render the scene.
Background tensor (bg_color) must be on GPU!
"""
# Create zero tensor. We will use it to make pytorch return gradients of the 2D (screen-space) means
screenspace_points = torch.zeros_like(pc.get_xyz, dtype=pc.get_xyz.dtype, requires_grad=True, device="cuda") + 0
screenspace_points_densify = torch.zeros_like(pc.get_xyz, dtype=pc.get_xyz.dtype, requires_grad=True, device="cuda") + 0
try:
screenspace_points.retain_grad()
screenspace_points_densify.retain_grad()
except:
pass
# Set up rasterization configuration
tanfovx = math.tan(viewpoint_camera.FoVx * 0.5)
tanfovy = math.tan(viewpoint_camera.FoVy * 0.5)
raster_settings = GaussianRasterizationSettings(
image_height=int(viewpoint_camera.image_height),
image_width=int(viewpoint_camera.image_width),
tanfovx=tanfovx,
tanfovy=tanfovy,
bg=bg_color,
scale_modifier=scaling_modifier,
viewmatrix=viewpoint_camera.world_view_transform,
projmatrix=viewpoint_camera.full_proj_transform,
sh_degree=pc.active_sh_degree,
campos=viewpoint_camera.camera_center,
prefiltered=False,
debug=cfg.pipline.debug,
f_count=0,
)
rasterizer = GaussianRasterizer(raster_settings=raster_settings)
means3D = pc.get_xyz
means2D = screenspace_points
means2D_densify = screenspace_points_densify
opacity = pc.get_opacity
# If precomputed 3d covariance is provided, use it. If not, then it will be computed from
# scaling / rotation by the rasterizer.
scales = None
rotations = None
cov3D_precomp = None
if cfg.pipline.compute_cov3D_python:
cov3D_precomp = pc.get_covariance(scaling_modifier)
else:
scales = pc.get_scaling
rotations = pc.get_rotation
# If precomputed colors are provided, use them. Otherwise, if it is desired to precompute colors
# from SHs in Python, do it. If not, then SH -> RGB conversion will be done by rasterizer.
shs = None
colors_precomp = None
if override_color is None:
if cfg.pipline.convert_SHs_python:
shs_view = pc.get_features.transpose(1, 2).view(-1, 3, (pc.max_sh_degree+1)**2)
dir_pp = (pc.get_xyz - viewpoint_camera.camera_center.repeat(pc.get_features.shape[0], 1))
dir_pp_normalized = dir_pp/dir_pp.norm(dim=1, keepdim=True)
sh2rgb = eval_sh(pc.active_sh_degree, shs_view, dir_pp_normalized)
colors_precomp = torch.clamp_min(sh2rgb + 0.5, 0.0)
else:
shs = pc.get_features
else:
colors_precomp = override_color
normals_precomp = None
# inside, _ = pc.get_inside_gaus_normalized()
if return_normal:
normal = pc.get_normal(is_all=is_all)
        # orient each normal along the viewing ray (away from the camera), then express it in camera coordinates
view_dir = means3D - viewpoint_camera.camera_center
normal = normal * ((((view_dir * normal).sum(dim=-1) > 0) * 1 - 0.5) * 2)[..., None]
R_w2c = torch.tensor(viewpoint_camera.R.T).cuda().to(torch.float32)
normals_precomp = normal @ R_w2c.transpose(0, 1) # camera coordinate
sem_feats = pc.get_objects.squeeze(1) if cfg.optim.loss_weight.semantic > 0 else None
inside = None
# Rasterize visible Gaussians to image, obtain their radii (on screen).
rendered_out, radii = rasterizer(
means3D = means3D,
means2D = means2D,
means2D_densify = means2D_densify,
shs = shs,
colors_precomp = colors_precomp,
normals_precomp = normals_precomp,
semantics_precomp = sem_feats,
opacities = opacity,
scales = scales,
rotations = rotations,
cov3D_precomp = cov3D_precomp,
dirs = dirs,
inside = inside)
chs = [3, 1, 3, 1]
rendered_image, rendered_depth, rendered_normal, rendered_alpha = rendered_out[:sum(chs)].split(chs, dim=0)
with torch.no_grad():
mask = viewpoint_camera.mask.bool() if hasattr(viewpoint_camera, 'mask') else \
torch.ones_like(rendered_depth, dtype=torch.bool).squeeze(0)
if cfg.optim.mask_depth_thr > 0:
mask1 = rendered_depth < (pc.extent * cfg.optim.mask_depth_thr)
mask1 = mask1.squeeze(0)
mask = mask & mask1
rendered_normal = rendered_normal.permute(1, 2, 0)
rendered_normal = F.normalize(rendered_normal, dim = -1)
est_normal = compute_normals(rendered_depth, viewpoint_camera.intr)
out = {"render": rendered_image,
"depth": rendered_depth,
"normal": rendered_normal,
"est_normal": est_normal,
"alpha": rendered_alpha,
"viewspace_points": screenspace_points,
"viewspace_points_densify": screenspace_points_densify,
"visibility_filter" : radii > 0,
"mask": mask,
"radii": radii,}
if cfg.optim.loss_weight.semantic > 0:
rendered_sem = rendered_out[sum(chs):sum(chs)+cfg.model.ch_sem_feat]
rendered_sem = pc.classifier(rendered_sem[None])[0].permute(1, 2, 0) # [H, W, cls]
out.update({"render_sem": rendered_sem})
if hasattr(cfg.optim.loss_weight, 'depth_var') and cfg.optim.loss_weight.depth_var > 0:
d1 = rendered_out[-2:-1]
d2 = rendered_out[-1:]
depth_var = d2 / rendered_alpha - (d1 / rendered_alpha) ** 2
out.update({"depth_var": depth_var})
if hasattr(cfg.optim.loss_weight, 'distortion') and cfg.optim.loss_weight.distortion > 0:
rendered_dist = rendered_out[-1:]
out.update({"distortion": rendered_dist})
return out
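# Illustrative sketch (hypothetical helper, not called by `render`). `render`
# estimates per-pixel depth variance as d2 / alpha - (d1 / alpha) ** 2. Assume
# the rasterizer's trailing channels hold the alpha-blended sums
#   d1 = sum_i w_i * z_i   and   d2 = sum_i w_i * z_i**2,   with alpha = sum_i w_i.
# Under that assumption, the formula is Var[z] = E[z^2] - E[z]^2 under the
# normalised blending weights. This reading is inferred from the formula and has
# not been checked against the CUDA source.
import numpy as np


def depth_variance(weights, depths):
    weights = np.asarray(weights, dtype=float)
    depths = np.asarray(depths, dtype=float)
    alpha = weights.sum()
    d1 = (weights * depths).sum()
    d2 = (weights * depths ** 2).sum()
    return d2 / alpha - (d1 / alpha) ** 2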
def render_fast(viewpoint_camera, pc : GaussianModel, cfg, bg_color : torch.Tensor, scaling_modifier = 1.0, override_color = None):
"""
    Render with the original Gaussian Splatting CUDA rasterizer interface (colour image only; no depth, normals or semantics).
"""
# Create zero tensor. We will use it to make pytorch return gradients of the 2D (screen-space) means
screenspace_points = torch.zeros_like(pc.get_xyz, dtype=pc.get_xyz.dtype, requires_grad=True, device="cuda") + 0
try:
screenspace_points.retain_grad()
except:
pass
# Set up rasterization configuration
tanfovx = math.tan(viewpoint_camera.FoVx * 0.5)
tanfovy = math.tan(viewpoint_camera.FoVy * 0.5)
raster_settings = GaussianRasterizationSettings(
image_height=int(viewpoint_camera.image_height),
image_width=int(viewpoint_camera.image_width),
tanfovx=tanfovx,
tanfovy=tanfovy,
bg=bg_color,
scale_modifier=scaling_modifier,
viewmatrix=viewpoint_camera.world_view_transform,
projmatrix=viewpoint_camera.full_proj_transform,
sh_degree=pc.active_sh_degree,
campos=viewpoint_camera.camera_center,
prefiltered=False,
debug=cfg.pipline.debug
)
rasterizer = GaussianRasterizer(raster_settings=raster_settings)
means3D = pc.get_xyz
means2D = screenspace_points
opacity = pc.get_opacity
# If precomputed 3d covariance is provided, use it. If not, then it will be computed from
# scaling / rotation by the rasterizer.
scales = None
rotations = None
cov3D_precomp = None
if cfg.pipline.compute_cov3D_python:
cov3D_precomp = pc.get_covariance(scaling_modifier)
else:
scales = pc.get_scaling
rotations = pc.get_rotation
# If precomputed colors are provided, use them. Otherwise, if it is desired to precompute colors
# from SHs in Python, do it. If not, then SH -> RGB conversion will be done by rasterizer.
shs = None
colors_precomp = None
if override_color is None:
if cfg.pipline.convert_SHs_python:
shs_view = pc.get_features.transpose(1, 2).view(-1, 3, (pc.max_sh_degree+1)**2)
dir_pp = (pc.get_xyz - viewpoint_camera.camera_center.repeat(pc.get_features.shape[0], 1))
dir_pp_normalized = dir_pp/dir_pp.norm(dim=1, keepdim=True)
sh2rgb = eval_sh(pc.active_sh_degree, shs_view, dir_pp_normalized)
colors_precomp = torch.clamp_min(sh2rgb + 0.5, 0.0)
else:
shs = pc.get_features
else:
colors_precomp = override_color
# Rasterize visible Gaussians to image, obtain their radii (on screen).
rendered_image, radii = rasterizer(
means3D = means3D,
means2D = means2D,
shs = shs,
colors_precomp = colors_precomp,
opacities = opacity,
scales = scales,
rotations = rotations,
cov3D_precomp = cov3D_precomp)
# Those Gaussians that were frustum culled or had a radius of 0 were not visible.
# They will be excluded from value updates used in the splitting criteria.
return {"render": rendered_image,
"viewspace_points": screenspace_points,
"visibility_filter" : radii > 0,
"radii": radii}
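# Illustrative sketch (hypothetical helpers, not used by the renderers). The
# rasterizer is configured with tan(FoV / 2) rather than a focal length. For a
# pinhole camera, tan(FoVx / 2) = W / (2 * fx), and the same relation holds for
# the y axis with H and fy. Either quantity can therefore be recovered from the
# other.
import math


def focal_from_fov(fov, pixels):
    return pixels / (2.0 * math.tan(fov * 0.5))


def fov_from_focal(focal, pixels):
    return 2.0 * math.atan(pixels / (2.0 * focal))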
def count_render(
viewpoint_camera,
pc: GaussianModel,
pipe,
bg_color: torch.Tensor,
scaling_modifier=1.0,
override_color=None,
):
"""
Render the scene.
Background tensor (bg_color) must be on GPU!
"""
# Create zero tensor. We will use it to make pytorch return gradients of the 2D (screen-space) means
screenspace_points = (
torch.zeros_like(
pc.get_xyz, dtype=pc.get_xyz.dtype, requires_grad=True, device="cuda"
)
+ 0
)
try:
screenspace_points.retain_grad()
except:
pass
# Set up rasterization configuration
tanfovx = math.tan(viewpoint_camera.FoVx * 0.5)
tanfovy = math.tan(viewpoint_camera.FoVy * 0.5)
raster_settings = GaussianRasterizationSettings(
image_height=int(viewpoint_camera.image_height),
image_width=int(viewpoint_camera.image_width),
tanfovx=tanfovx,
tanfovy=tanfovy,
bg=bg_color,
scale_modifier=scaling_modifier,
viewmatrix=viewpoint_camera.world_view_transform,
projmatrix=viewpoint_camera.full_proj_transform,
sh_degree=pc.active_sh_degree,
campos=viewpoint_camera.camera_center,
prefiltered=False,
debug=pipe.debug,
f_count=1,
)
rasterizer = GaussianRasterizer(raster_settings=raster_settings)
means3D = pc.get_xyz
means2D = screenspace_points
opacity = pc.get_opacity
# If precomputed 3d covariance is provided, use it. If not, then it will be computed from
# scaling / rotation by the rasterizer.
scales = None
rotations = None
cov3D_precomp = None
if pipe.compute_cov3D_python:
cov3D_precomp = pc.get_covariance(scaling_modifier)
else:
scales = pc.get_scaling
rotations = pc.get_rotation
# If precomputed colors are provided, use them. Otherwise, if it is desired to precompute colors
# from SHs in Python, do it. If not, then SH -> RGB conversion will be done by rasterizer.
shs = None
colors_precomp = None
if override_color is None:
if pipe.convert_SHs_python:
shs_view = pc.get_features.transpose(1, 2).view(
-1, 3, (pc.max_sh_degree + 1) ** 2
)
dir_pp = pc.get_xyz - viewpoint_camera.camera_center.repeat(
pc.get_features.shape[0], 1
)
dir_pp_normalized = dir_pp / dir_pp.norm(dim=1, keepdim=True)
sh2rgb = eval_sh(pc.active_sh_degree, shs_view, dir_pp_normalized)
colors_precomp = torch.clamp_min(sh2rgb + 0.5, 0.0)
else:
shs = pc.get_features
else:
colors_precomp = override_color
# Rasterize visible Gaussians to image, obtain their radii (on screen).
gaussians_count, important_score, rendered_image, radii = rasterizer(
means3D=means3D,
means2D=means2D,
means2D_densify=None,
shs=shs,
colors_precomp=colors_precomp,
normals_precomp = None,
semantics_precomp = None,
opacities=opacity,
scales=scales,
rotations=rotations,
cov3D_precomp=cov3D_precomp,
)
# Those Gaussians that were frustum culled or had a radius of 0 were not visible.
# They will be excluded from value updates used in the splitting criteria.
return {
"render": rendered_image,
"viewspace_points": screenspace_points,
"visibility_filter": radii > 0,
"radii": radii,
"gaussians_count": gaussians_count,
"important_score": important_score,
}
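# Illustrative sketch (hypothetical helpers, not used by the renderers). When
# `convert_SHs_python` is set, colours are computed as
# clamp_min(eval_sh(...) + 0.5, 0). At SH degree 0 only the DC coefficient
# contributes. The degree-0 real spherical harmonic is the constant
# C0 = 1 / (2 * sqrt(pi)) ~= 0.28209479, so the colour reduces to
# max(C0 * f_dc + 0.5, 0), independent of viewing direction. This matches the
# C0 term in the original 3DGS sh_utils.
import math
import numpy as np

SH_C0 = 0.5 / math.sqrt(math.pi)


def dc_sh_to_rgb(f_dc):
    """f_dc: (N, 3) degree-0 SH coefficients -> (N, 3) RGB."""
    return np.maximum(SH_C0 * np.asarray(f_dc, dtype=float) + 0.5, 0.0)


def rgb_to_dc_sh(rgb):
    """Inverse of dc_sh_to_rgb for colours that were not clamped."""
    return (np.asarray(rgb, dtype=float) - 0.5) / SH_C0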
def visi_render(
viewpoint_camera,
pc: GaussianModel,
pipe,
bg_color: torch.Tensor,
scaling_modifier=1.0,
override_color=None,
):
"""
Render the scene.
Background tensor (bg_color) must be on GPU!
"""
# Create zero tensor. We will use it to make pytorch return gradients of the 2D (screen-space) means
screenspace_points = (
torch.zeros_like(
pc.get_xyz, dtype=pc.get_xyz.dtype, requires_grad=True, device="cuda"
)
+ 0
)
try:
screenspace_points.retain_grad()
except:
pass
# Set up rasterization configuration
tanfovx = math.tan(viewpoint_camera.FoVx * 0.5)
tanfovy = math.tan(viewpoint_camera.FoVy * 0.5)
raster_settings = GaussianRasterizationSettings(
image_height=int(viewpoint_camera.image_height),
image_width=int(viewpoint_camera.image_width),
tanfovx=tanfovx,
tanfovy=tanfovy,
bg=bg_color,
scale_modifier=scaling_modifier,
viewmatrix=viewpoint_camera.world_view_transform,
projmatrix=viewpoint_camera.full_proj_transform,
sh_degree=pc.active_sh_degree,
campos=viewpoint_camera.camera_center,
prefiltered=False,
debug=pipe.debug,
f_count=2,
)
rasterizer = GaussianRasterizer(raster_settings=raster_settings)
means3D = pc.get_xyz
means2D = screenspace_points
opacity = pc.get_opacity
# If precomputed 3d covariance is provided, use it. If not, then it will be computed from
# scaling / rotation by the rasterizer.
scales = None
rotations = None
cov3D_precomp = None
if pipe.compute_cov3D_python:
cov3D_precomp = pc.get_covariance(scaling_modifier)
else:
scales = pc.get_scaling
rotations = pc.get_rotation
# If precomputed colors are provided, use them. Otherwise, if it is desired to precompute colors
# from SHs in Python, do it. If not, then SH -> RGB conversion will be done by rasterizer.
shs = None
colors_precomp = None
if override_color is None:
if pipe.convert_SHs_python:
shs_view = pc.get_features.transpose(1, 2).view(
-1, 3, (pc.max_sh_degree + 1) ** 2
)
dir_pp = pc.get_xyz - viewpoint_camera.camera_center.repeat(
pc.get_features.shape[0], 1
)
dir_pp_normalized = dir_pp / dir_pp.norm(dim=1, keepdim=True)
sh2rgb = eval_sh(pc.active_sh_degree, shs_view, dir_pp_normalized)
colors_precomp = torch.clamp_min(sh2rgb + 0.5, 0.0)
else:
shs = pc.get_features
else:
colors_precomp = override_color
# Rasterize visible Gaussians to image, obtain their radii (on screen).
countlist, important_score, rendered_image, radii = rasterizer(
means3D=means3D,
means2D=means2D,
means2D_densify=None,
shs=shs,
colors_precomp=colors_precomp,
normals_precomp = None,
semantics_precomp = None,
opacities=opacity,
scales=scales,
rotations=rotations,
cov3D_precomp=cov3D_precomp,
)
# Those Gaussians that were frustum culled or had a radius of 0 were not visible.
# They will be excluded from value updates used in the splitting criteria.
return {
"render": rendered_image,
"viewspace_points": screenspace_points,
"visibility_filter": radii > 0,
"radii": radii,
"countlist": countlist,
"important_score": important_score,
}
def visi_acc_render(
viewpoint_camera,
pc: GaussianModel,
pipe,
bg_color: torch.Tensor,
scaling_modifier=1.0,
override_color=None,
):
"""
Render the scene.
Background tensor (bg_color) must be on GPU!
"""
# Create zero tensor. We will use it to make pytorch return gradients of the 2D (screen-space) means
screenspace_points = (
torch.zeros_like(
pc.get_xyz, dtype=pc.get_xyz.dtype, requires_grad=True, device="cuda"
)
+ 0
)
try:
screenspace_points.retain_grad()
except:
pass
# Set up rasterization configuration
tanfovx = math.tan(viewpoint_camera.FoVx * 0.5)
tanfovy = math.tan(viewpoint_camera.FoVy * 0.5)
raster_settings = GaussianRasterizationSettings(
image_height=int(viewpoint_camera.image_height),
image_width=int(viewpoint_camera.image_width),
tanfovx=tanfovx,
tanfovy=tanfovy,
bg=bg_color,
scale_modifier=scaling_modifier,
viewmatrix=viewpoint_camera.world_view_transform,
projmatrix=viewpoint_camera.full_proj_transform,
sh_degree=pc.active_sh_degree,
campos=viewpoint_camera.camera_center,
prefiltered=False,
debug=pipe.debug,
f_count=3,
)
rasterizer = GaussianRasterizer(raster_settings=raster_settings)
means3D = pc.get_xyz
means2D = screenspace_points
opacity = pc.get_opacity
# If precomputed 3d covariance is provided, use it. If not, then it will be computed from
# scaling / rotation by the rasterizer.
scales = None
rotations = None
cov3D_precomp = None
if pipe.compute_cov3D_python:
cov3D_precomp = pc.get_covariance(scaling_modifier)
else:
scales = pc.get_scaling
rotations = pc.get_rotation
# If precomputed colors are provided, use them. Otherwise, if it is desired to precompute colors
# from SHs in Python, do it. If not, then SH -> RGB conversion will be done by rasterizer.
shs = None
colors_precomp = None
if override_color is None:
if pipe.convert_SHs_python:
shs_view = pc.get_features.transpose(1, 2).view(
-1, 3, (pc.max_sh_degree + 1) ** 2
)
dir_pp = pc.get_xyz - viewpoint_camera.camera_center.repeat(
pc.get_features.shape[0], 1
)
dir_pp_normalized = dir_pp / dir_pp.norm(dim=1, keepdim=True)
sh2rgb = eval_sh(pc.active_sh_degree, shs_view, dir_pp_normalized)
colors_precomp = torch.clamp_min(sh2rgb + 0.5, 0.0)
else:
shs = pc.get_features
else:
colors_precomp = override_color
# Rasterize visible Gaussians to image, obtain their radii (on screen).
countlist, radii = rasterizer(
means3D=means3D,
means2D=means2D,
means2D_densify=None,
shs=shs,
colors_precomp=colors_precomp,
normals_precomp = None,
semantics_precomp = None,
opacities=opacity,
scales=scales,
rotations=rotations,
cov3D_precomp=cov3D_precomp,
)
# Those Gaussians that were frustum culled or had a radius of 0 were not visible.
# They will be excluded from value updates used in the splitting criteria.
return {
"viewspace_points": screenspace_points,
"visibility_filter": radii > 0,
"radii": radii,
"countlist": countlist,
}
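# Illustrative summary (hypothetical, inferred only from the call sites in this
# file and not from the CUDA source). The `f_count` field of
# GaussianRasterizationSettings selects which tuple the modified rasterizer
# returns. render_fast does not set f_count and uses the original rasterizer
# signature. The helper attaches names to a returned tuple so mismatches
# surface as errors.
RASTERIZER_OUTPUTS = {
    0: ("rendered_out", "radii"),                                          # render
    1: ("gaussians_count", "important_score", "rendered_image", "radii"),  # count_render
    2: ("countlist", "important_score", "rendered_image", "radii"),        # visi_render
    3: ("countlist", "radii"),                                             # visi_acc_render
}


def name_rasterizer_outputs(f_count, outputs):
    names = RASTERIZER_OUTPUTS[f_count]
    if len(outputs) != len(names):
        raise ValueError(f"f_count={f_count} expects {len(names)} outputs, got {len(outputs)}")
    return dict(zip(names, outputs))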
================================================
FILE: gaussian_renderer/network_gui.py
================================================
#
# Copyright (C) 2023, Inria
# GRAPHDECO research group, https://team.inria.fr/graphdeco
# All rights reserved.
#
# This software is free for non-commercial, research and evaluation use
# under the terms of the LICENSE.md file.
#
# For inquiries contact george.drettakis@inria.fr
#
import torch
import traceback
import socket
import json
from scene.cameras import MiniCam
host = "127.0.0.1"
port = 6009
conn = None
addr = None
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
def init(wish_host, wish_port):
global host, port, listener
host = wish_host
port = wish_port
listener.bind((host, port))
listener.listen()
listener.settimeout(0)
def try_connect():
global conn, addr, listener
try:
conn, addr = listener.accept()
print(f"\nConnected by {addr}")
conn.settimeout(None)
except Exception as inst:
pass
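def _recv_exact(n):
    data = b""
    while len(data) < n:
        chunk = conn.recv(n - len(data))  # recv may return fewer than n bytes
        if not chunk:
            raise ConnectionError("viewer closed the connection mid-message")
        data += chunk
    return data
def read():
    messageLength = int.from_bytes(_recv_exact(4), 'little')
    message = _recv_exact(messageLength)
    return json.loads(message.decode("utf-8"))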
def send(message_bytes, verify):
global conn
    if message_bytes is not None:
conn.sendall(message_bytes)
conn.sendall(len(verify).to_bytes(4, 'little'))
conn.sendall(bytes(verify, 'ascii'))
def receive():
message = read()
width = message["resolution_x"]
height = message["resolution_y"]
if width != 0 and height != 0:
try:
do_training = bool(message["train"])
fovy = message["fov_y"]
fovx = message["fov_x"]
znear = message["z_near"]
zfar = message["z_far"]
do_shs_python = bool(message["shs_python"])
do_rot_scale_python = bool(message["rot_scale_python"])
keep_alive = bool(message["keep_alive"])
scaling_modifier = message["scaling_modifier"]
world_view_transform = torch.reshape(torch.tensor(message["view_matrix"]), (4, 4)).cuda()
world_view_transform[:,1] = -world_view_transform[:,1]
world_view_transform[:,2] = -world_view_transform[:,2]
full_proj_transform = torch.reshape(torch.tensor(message["view_projection_matrix"]), (4, 4)).cuda()
full_proj_transform[:,1] = -full_proj_transform[:,1]
custom_cam = MiniCam(width, height, fovy, fovx, znear, zfar, world_view_transform, full_proj_transform)
except Exception as e:
print("")
traceback.print_exc()
raise e
return custom_cam, do_training, do_shs_python, do_rot_scale_python, keep_alive, scaling_modifier
else:
return None, None, None, None, None, None
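# Illustrative sketch (hypothetical helpers, not used by the viewer code). read()
# expects each message as a 4-byte little-endian length followed by a UTF-8 JSON
# payload, and send() writes its verify string with the same length prefix.
# These pure functions build and split such frames so the layout can be checked
# without a socket. A message whose resolution_x or resolution_y is 0 makes
# receive() return a tuple of Nones.
import json


def encode_frame(obj):
    payload = json.dumps(obj).encode("utf-8")
    return len(payload).to_bytes(4, "little") + payload


def decode_frame(data):
    """Return (decoded message, remaining bytes) for the first frame in data."""
    length = int.from_bytes(data[:4], "little")
    payload = data[4:4 + length]
    if len(data) < 4 or len(payload) != length:
        raise ValueError("incomplete frame")
    return json.loads(payload.decode("utf-8")), data[4 + length:]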
================================================
FILE: process_data/convert.py
================================================
#
# Copyright (C) 2023, Inria
# GRAPHDECO research group, https://team.inria.fr/graphdeco
# All rights reserved.
#
# This software is free for non-commercial, research and evaluation use
# under the terms of the LICENSE.md file.
#
# For inquiries contact george.drettakis@inria.fr
#
import os
import json
import logging
from argparse import ArgumentParser
import shutil
import sys
import importlib
sys.path.append(os.getcwd())
def create_init_files(pinhole_dict_file, db_file, out_dir):
# Partially adapted from https://github.com/Kai-46/nerfplusplus/blob/master/colmap_runner/run_colmap_posed.py
# COLMAPDatabase = getattr(importlib.import_module(f'{args.colmap_path}.scripts.python.database'), 'COLMAPDatabase')
from submodules.colmap.scripts.python.database import COLMAPDatabase # NOQA
if not os.path.exists(out_dir):
os.mkdir(out_dir)
# create template
with open(pinhole_dict_file) as fp:
pinhole_dict = json.load(fp)
template = {}
cameras_line_template = '{camera_id} RADIAL {width} {height} {f} {cx} {cy} {k1} {k2}\n'
images_line_template = '{image_id} {qw} {qx} {qy} {qz} {tx} {ty} {tz} {camera_id} {image_name}\n\n'
for img_name in pinhole_dict:
# w, h, fx, fy, cx, cy, qvec, t
params = pinhole_dict[img_name]
w = params[0]
h = params[1]
fx = params[2]
# fy = params[3]
cx = params[4]
cy = params[5]
qvec = params[6:10]
tvec = params[10:13]
cam_line = cameras_line_template.format(
camera_id="{camera_id}", width=w, height=h, f=fx, cx=cx, cy=cy, k1=0, k2=0)
img_line = images_line_template.format(image_id="{image_id}", qw=qvec[0], qx=qvec[1], qy=qvec[2], qz=qvec[3],
tx=tvec[0], ty=tvec[1], tz=tvec[2], camera_id="{camera_id}",
image_name=img_name)
template[img_name] = (cam_line, img_line)
# read database
db = COLMAPDatabase.connect(db_file)
table_images = db.execute("SELECT * FROM images")
img_name2id_dict = {}
for row in table_images:
img_name2id_dict[row[1]] = row[0]
cameras_txt_lines = [template[img_name][0].format(camera_id=1)]
images_txt_lines = []
for img_name, img_id in img_name2id_dict.items():
image_line = template[img_name][1].format(image_id=img_id, camera_id=1)
images_txt_lines.append(image_line)
with open(os.path.join(out_dir, 'cameras.txt'), 'w') as fp:
fp.writelines(cameras_txt_lines)
with open(os.path.join(out_dir, 'images.txt'), 'w') as fp:
fp.writelines(images_txt_lines)
fp.write('\n')
# create an empty points3D.txt
fp = open(os.path.join(out_dir, 'points3D.txt'), 'w')
fp.close()
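# Illustrative sketch (hypothetical helper mirroring create_init_files, not
# called by it). Each line is formatted twice. The first pass fills in the
# geometry but substitutes the literal text "{camera_id}" / "{image_id}", which
# leaves placeholders. The second pass fills those placeholders with the ids
# read from COLMAP's database. The templates follow COLMAP's text model format:
# a RADIAL camera takes f, cx, cy, k1, k2, and each image entry is followed by a
# line of 2D points, which is empty here.
CAMERA_TEMPLATE = '{camera_id} RADIAL {width} {height} {f} {cx} {cy} {k1} {k2}\n'
IMAGE_TEMPLATE = '{image_id} {qw} {qx} {qy} {qz} {tx} {ty} {tz} {camera_id} {image_name}\n\n'


def two_pass_lines(params, img_name, image_id, camera_id=1):
    # params layout as in pinhole_dict: w, h, fx, fy, cx, cy, qvec (4), tvec (3)
    w, h, fx, _fy, cx, cy = params[:6]
    qvec, tvec = params[6:10], params[10:13]
    cam = CAMERA_TEMPLATE.format(camera_id="{camera_id}", width=w, height=h,
                                 f=fx, cx=cx, cy=cy, k1=0, k2=0)
    img = IMAGE_TEMPLATE.format(image_id="{image_id}", qw=qvec[0], qx=qvec[1],
                                qy=qvec[2], qz=qvec[3], tx=tvec[0], ty=tvec[1],
                                tz=tvec[2], camera_id="{camera_id}", image_name=img_name)
    # Second pass: only the id placeholders remain to be filled.
    return cam.format(camera_id=camera_id), img.format(image_id=image_id, camera_id=camera_id)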
def main(args):
colmap_command = '"{}"'.format(args.colmap_executable) if len(args.colmap_executable) > 0 else "colmap"
magick_command = '"{}"'.format(args.magick_executable) if len(args.magick_executable) > 0 else "magick"
use_gpu = 1 if not args.no_gpu else 0
if not args.skip_matching:
os.makedirs(args.source_path + "/distorted/sparse", exist_ok=True)
## Feature extraction
feat_extracton_cmd = colmap_command + " feature_extractor "\
"--database_path " + args.source_path + "/distorted/database.db \
--image_path " + args.source_path + "/input \
--ImageReader.single_camera 1 \
--ImageReader.camera_model " + args.camera + " \
--SiftExtraction.use_gpu " + str(use_gpu)
exit_code = os.system(feat_extracton_cmd)
if exit_code != 0:
logging.error(f"Feature extraction failed with code {exit_code}. Exiting.")
exit(exit_code)
## Feature matching
feat_matching_cmd = colmap_command + " exhaustive_matcher \
--database_path " + args.source_path + "/distorted/database.db \
--SiftMatching.use_gpu " + str(use_gpu)
exit_code = os.system(feat_matching_cmd)
if exit_code != 0:
logging.error(f"Feature matching failed with code {exit_code}. Exiting.")
exit(exit_code)
if args.existing_pose:
db_file = os.path.join(args.source_path, 'distorted/database.db')
sfm_dir = os.path.join(args.source_path, 'distorted/sparse/0')
pinhole_dict_file = os.path.join(args.source_path, 'pinhole_dict.json')
create_init_files(pinhole_dict_file, db_file, sfm_dir)
### Bundle adjustment
# The default Mapper tolerance is unnecessarily large,
# decreasing it speeds up bundle adjustment steps.
mapper_cmd = (colmap_command + " mapper \
--database_path " + args.source_path + "/distorted/database.db \
--image_path " + args.source_path + "/input \
--output_path " + args.source_path + "/distorted/sparse \
--Mapper.ba_global_function_tolerance=0.000001")
exit_code = os.system(mapper_cmd)
if exit_code != 0:
logging.error(f"Mapper failed with code {exit_code}. Exiting.")
exit(exit_code)
if not args.skip_distorting:
### Image undistortion
## We need to undistort our images into ideal pinhole intrinsics.
img_undist_cmd = (colmap_command + " image_undistorter \
--image_path " + args.source_path + "/input \
--input_path " + args.source_path + "/distorted/sparse/0 \
--output_path " + args.source_path + "\
--output_type COLMAP")
exit_code = os.system(img_undist_cmd)
if exit_code != 0:
            logging.error(f"Image undistortion failed with code {exit_code}. Exiting.")
exit(exit_code)
files = os.listdir(args.source_path + "/distorted/sparse/0")
os.makedirs(args.source_path + "/sparse/0", exist_ok=True)
        # Move each file from distorted/sparse/0 to sparse/0
for file in files:
source_file = os.path.join(args.source_path, "distorted/sparse/0", file)
destination_file = os.path.join(args.source_path, "sparse", "0", file)
shutil.move(source_file, destination_file)
if(args.resize):
print("Copying and resizing...")
# Resize images.
os.makedirs(args.source_path + "/images_2", exist_ok=True)
os.makedirs(args.source_path + "/images_4", exist_ok=True)
os.makedirs(args.source_path + "/images_8", exist_ok=True)
# Get the list of files in the source directory
files = os.listdir(args.source_path + "/images")
# Copy each file from the source directory to the destination directory
for file in files:
source_file = os.path.join(args.source_path, "images", file)
destination_file = os.path.join(args.source_path, "images_2", file)
shutil.copy2(source_file, destination_file)
exit_code = os.system(magick_command + " mogrify -resize 50% " + destination_file)
if exit_code != 0:
logging.error(f"50% resize failed with code {exit_code}. Exiting.")
exit(exit_code)
destination_file = os.path.join(args.source_path, "images_4", file)
shutil.copy2(source_file, destination_file)
exit_code = os.system(magick_command + " mogrify -resize 25% " + destination_file)
if exit_code != 0:
logging.error(f"25% resize failed with code {exit_code}. Exiting.")
exit(exit_code)
destination_file = os.path.join(args.source_path, "images_8", file)
shutil.copy2(source_file, destination_file)
exit_code = os.system(magick_command + " mogrify -resize 12.5% " + destination_file)
if exit_code != 0:
logging.error(f"12.5% resize failed with code {exit_code}. Exiting.")
exit(exit_code)
print("Done.")
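# Illustrative sketch (hypothetical helpers, not used by main). main() builds
# COLMAP commands by string concatenation and runs them with os.system, which
# breaks when source_path contains spaces. An alternative is to pass an argument
# list to subprocess.run, which does not go through a shell. The flags below are
# exactly those of the feature-extraction step in main().
import os
import subprocess


def feature_extractor_argv(colmap, source_path, camera, use_gpu):
    return [
        colmap, "feature_extractor",
        "--database_path", os.path.join(source_path, "distorted", "database.db"),
        "--image_path", os.path.join(source_path, "input"),
        "--ImageReader.single_camera", "1",
        "--ImageReader.camera_model", camera,
        "--SiftExtraction.use_gpu", str(int(use_gpu)),
    ]


def run_checked(argv):
    # check=True raises subprocess.CalledProcessError on a non-zero exit code.
    subprocess.run(argv, check=True)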
if __name__ == '__main__':
# This Python script is based on the shell converter script provided in the MipNerF 360 repository.
parser = ArgumentParser("Colmap converter")
parser.add_argument("--no_gpu", action='store_true')
parser.add_argument("--skip_matching", action='store_true')
parser.add_argument("--skip_distorting", action='store_true')
parser.add_argument("--source_path", "-s", required=True, type=str)
parser.add_argument("--camera", default="OPENCV", type=str)
parser.add_argument("--colmap_executable", default="", type=str)
parser.add_argument("--resize", action="store_true")
parser.add_argument("--magick_executable", default="", type=str)
parser.add_argument("--existing_pose", action='store_true')
parser.add_argument("--colmap_path", default="submodules.colmap", type=str)
args = parser.parse_args()
main(args)
================================================
FILE: process_data/convert_360_to_json.py
================================================
import os
import numpy as np
import json
import sys
from pathlib import Path
from argparse import ArgumentParser
import trimesh
dir_path = Path(os.path.dirname(os.path.realpath(__file__))).parents[0]
sys.path.append(dir_path.__str__())
from process_data.convert_data_to_json import export_to_json, get_split_dict, bound_by_pose # NOQA
from submodules.colmap.scripts.python.database import COLMAPDatabase # NOQA
from submodules.colmap.scripts.python.read_write_model import read_model, rotmat2qvec # NOQA
def create_init_files(pinhole_dict_file, db_file, out_dir):
# Partially adapted from https://github.com/Kai-46/nerfplusplus/blob/master/colmap_runner/run_colmap_posed.py
if not os.path.exists(out_dir):
os.mkdir(out_dir)
# create template
with open(pinhole_dict_file) as fp:
pinhole_dict = json.load(fp)
template = {}
cameras_line_template = '{camera_id} RADIAL {width} {height} {f} {cx} {cy} {k1} {k2}\n'
images_line_template = '{image_id} {qw} {qx} {qy} {qz} {tx} {ty} {tz} {camera_id} {image_name}\n\n'
for img_name in pinhole_dict:
# w, h, fx, fy, cx, cy, qvec, t
params = pinhole_dict[img_name]
w = params[0]
h = params[1]
fx = params[2]
# fy = params[3]
cx = params[4]
cy = params[5]
qvec = params[6:10]
tvec = params[10:13]
cam_line = cameras_line_template.format(
camera_id="{camera_id}", width=w, height=h, f=fx, cx=cx, cy=cy, k1=0, k2=0)
img_line = images_line_template.format(image_id="{image_id}", qw=qvec[0], qx=qvec[1], qy=qvec[2], qz=qvec[3],
tx=tvec[0], ty=tvec[1], tz=tvec[2], camera_id="{camera_id}",
image_name=img_name)
template[img_name] = (cam_line, img_line)
# read database
db = COLMAPDatabase.connect(db_file)
table_images = db.execute("SELECT * FROM images")
img_name2id_dict = {}
for row in table_images:
img_name2id_dict[row[1]] = row[0]
cameras_txt_lines = [template[img_name][0].format(camera_id=1)]
images_txt_lines = []
for img_name, img_id in img_name2id_dict.items():
image_line = template[img_name][1].format(image_id=img_id, camera_id=1)
images_txt_lines.append(image_line)
with open(os.path.join(out_dir, 'cameras.txt'), 'w') as fp:
fp.writelines(cameras_txt_lines)
with open(os.path.join(out_dir, 'images.txt'), 'w') as fp:
fp.writelines(images_txt_lines)
fp.write('\n')
# create an empty points3D.txt
fp = open(os.path.join(out_dir, 'points3D.txt'), 'w')
fp.close()
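# Sketch (hypothetical helper, not called anywhere in this script): the two lines that
# create_init_files() emits per image, written with f-strings. In COLMAP's text model an
# image entry spans two lines, the pose line and a POINTS2D line; the second line is left
# empty here, which is why images_line_template ends with "\n\n". The RADIAL camera model
# takes the parameters f, cx, cy, k1, k2, and distortion is initialised to zero as above.
def colmap_text_lines_sketch(image_id, camera_id, name, w, h, f, cx, cy, qvec, tvec):
    cam_line = f"{camera_id} RADIAL {w} {h} {f} {cx} {cy} 0 0\n"
    qw, qx, qy, qz = qvec
    tx, ty, tz = tvec
    img_line = f"{image_id} {qw} {qx} {qy} {qz} {tx} {ty} {tz} {camera_id} {name}\n\n"
    return cam_line, img_line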
def convert_cam_dict_to_pinhole_dict(cam_dict, pinhole_dict_file):
# Partially adapted from https://github.com/Kai-46/nerfplusplus/blob/master/colmap_runner/run_colmap_posed.py
print('Writing pinhole_dict to: ', pinhole_dict_file)
h = 1080
w = 1920
pinhole_dict = {}
for img_name in cam_dict:
W2C = cam_dict[img_name]
# params
fx = 0.6 * w
fy = 0.6 * w
cx = w / 2.0
cy = h / 2.0
qvec = rotmat2qvec(W2C[:3, :3])
tvec = W2C[:3, 3]
params = [w, h, fx, fy, cx, cy,
qvec[0], qvec[1], qvec[2], qvec[3],
tvec[0], tvec[1], tvec[2]]
pinhole_dict[img_name] = params
with open(pinhole_dict_file, 'w') as fp:
json.dump(pinhole_dict, fp, indent=2, sort_keys=True)
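# Sketch (hypothetical helper): convert_cam_dict_to_pinhole_dict() stores each world-to-camera
# rotation as a quaternion via COLMAP's rotmat2qvec. A numpy-only version of the standard
# trace-based conversion is shown here; COLMAP's helper uses an eigen-decomposition instead,
# but both return (qw, qx, qy, qz) with qw >= 0.
import numpy as np

def rotmat_to_qvec_sketch(R):
    R = np.asarray(R, dtype=float)
    tr = np.trace(R)
    if tr > 0:
        s = 2.0 * np.sqrt(tr + 1.0)  # s = 4 * qw
        q = np.array([0.25 * s, (R[2, 1] - R[1, 2]) / s,
                      (R[0, 2] - R[2, 0]) / s, (R[1, 0] - R[0, 1]) / s])
    elif R[0, 0] > R[1, 1] and R[0, 0] > R[2, 2]:
        s = 2.0 * np.sqrt(1.0 + R[0, 0] - R[1, 1] - R[2, 2])  # s = 4 * qx
        q = np.array([(R[2, 1] - R[1, 2]) / s, 0.25 * s,
                      (R[0, 1] + R[1, 0]) / s, (R[0, 2] + R[2, 0]) / s])
    elif R[1, 1] > R[2, 2]:
        s = 2.0 * np.sqrt(1.0 + R[1, 1] - R[0, 0] - R[2, 2])  # s = 4 * qy
        q = np.array([(R[0, 2] - R[2, 0]) / s, (R[0, 1] + R[1, 0]) / s,
                      0.25 * s, (R[1, 2] + R[2, 1]) / s])
    else:
        s = 2.0 * np.sqrt(1.0 + R[2, 2] - R[0, 0] - R[1, 1])  # s = 4 * qz
        q = np.array([(R[1, 0] - R[0, 1]) / s, (R[0, 2] + R[2, 0]) / s,
                      (R[1, 2] + R[2, 1]) / s, 0.25 * s])
    q /= np.linalg.norm(q)
    # q and -q encode the same rotation; keep the one with a non-negative scalar part
    return -q if q[0] < 0 else q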
def load_COLMAP_poses(cam_file, img_dir, tf='w2c'):
# load image names from img_dir (sorted, so pose index k maps to names[k])
names = sorted(os.listdir(img_dir))
with open(cam_file) as f:
lines = f.readlines()
# C2W
poses = {}
for idx, line in enumerate(lines):
if idx % 5 == 0: # header
img_idx, valid, _ = line.split(' ')
if valid != '-1':
poses[int(img_idx)] = np.eye(4)
else:
if int(img_idx) in poses:
num = np.array([float(n) for n in line.split(' ')])
poses[int(img_idx)][idx % 5-1, :] = num
if tf == 'c2w':
return poses
else:
# convert to W2C (follow nerf convention)
poses_w2c = {}
for k, v in poses.items():
poses_w2c[names[k]] = np.linalg.inv(v)
return poses_w2c
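# Sketch (hypothetical helper): load_COLMAP_poses() reads 5-line records, a header whose
# second field is -1 for an image without a pose, followed by the 4x4 camera-to-world
# matrix. This version of the same parsing works on a string and is a little more tolerant
# than the original: it splits on any whitespace and skips blank lines.
import numpy as np

def parse_log_poses_sketch(text, tf='c2w'):
    lines = [ln for ln in text.splitlines() if ln.strip()]
    poses = {}
    for i in range(0, len(lines) - 4, 5):
        img_idx, valid, _ = lines[i].split()
        if valid == '-1':
            continue  # unregistered image: its 4 matrix rows are skipped as well
        rows = [[float(v) for v in ln.split()] for ln in lines[i + 1:i + 5]]
        poses[int(img_idx)] = np.array(rows)
    if tf == 'w2c':
        # invert each camera-to-world matrix, as load_COLMAP_poses() does
        poses = {k: np.linalg.inv(v) for k, v in poses.items()}
    return poses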
def load_transformation(trans_file):
with open(trans_file) as f:
lines = f.readlines()
trans = np.eye(4)
for idx, line in enumerate(lines):
num = np.array([float(n) for n in line.split(' ')])
trans[idx, :] = num
return trans
def align_gt_with_cam(pts, trans):
trans_inv = np.linalg.inv(trans)
pts_aligned = pts @ trans_inv[:3, :3].transpose(-1, -2) + trans_inv[:3, -1]
return pts_aligned
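# Sketch (hypothetical helper): align_gt_with_cam() applies inv(trans) to every point by
# splitting it into rotation/scale and translation. The same mapping written with
# homogeneous coordinates, assuming trans is affine (last row [0, 0, 0, 1]):
import numpy as np

def align_points_homogeneous_sketch(pts, trans):
    pts_h = np.hstack([pts, np.ones((pts.shape[0], 1))])  # (N, 4), w = 1
    return (pts_h @ np.linalg.inv(trans).T)[:, :3]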
def main(args):
assert args.data_path, "Provide path to 360 dataset"
scene_list = os.listdir(args.data_path)
scene_list = sorted(scene_list)
for scene in scene_list:
scene_path = os.path.join(args.data_path, scene)
if not os.path.isdir(scene_path): continue
cameras, images, points3D = read_model(os.path.join(scene_path, "sparse/0"), ext=".bin")
trans, scale, bounding_box = bound_by_pose(images)
trans = trans.tolist()
export_to_json(trans, scale, scene_path, 'meta.json')
print('Writing data to json file: ', os.path.join(scene_path, 'meta.json'))
if __name__ == '__main__':
parser = ArgumentParser()
parser.add_argument('--data_path', type=str, default=None, help='Path to Mip-NeRF 360 dataset')
parser.add_argument('--run_colmap', action='store_true', help='Run colmap')
parser.add_argument('--export_json', action='store_true', help='export json')
args = parser.parse_args()
main(args)
================================================
FILE: process_data/convert_data_to_json.py
================================================
'''
-----------------------------------------------------------------------------
Copyright (c) 2023, NVIDIA CORPORATION. All rights reserved.
NVIDIA CORPORATION and its licensors retain all intellectual property
and proprietary rights in and to this software, related documentation
and any modifications thereto. Any use, reproduction, disclosure or
distribution of this software and related documentation without an express
license agreement from NVIDIA CORPORATION is strictly prohibited.
-----------------------------------------------------------------------------
'''
import numpy as np
from argparse import ArgumentParser
import os
import sys
from pathlib import Path
import json
import trimesh
dir_path = Path(os.path.dirname(os.path.realpath(__file__))).parents[0]
sys.path.append(dir_path.__str__())
from submodules.colmap.scripts.python.read_write_model import read_model, qvec2rotmat # NOQA
def find_closest_point(p1, d1, p2, d2):
# Normalize the direction vectors of the lines
d1_norm = d1 / np.linalg.norm(d1)
d2_norm = d2 / np.linalg.norm(d2)
# Create the coefficient matrix A and the constant vector b
A = np.vstack((d1_norm, -d2_norm)).T
b = p2 - p1
# Solve the linear system to find the parameters t1 and t2
t1, t2 = np.linalg.lstsq(A, b, rcond=None)[0]
# Calculate the closest point on each line
closest_point1 = p1 + d1_norm * t1
closest_point2 = p2 + d2_norm * t2
# Calculate the average of the two closest points
closest_point = 0.5 * (closest_point1 + closest_point2)
return closest_point
def bound_by_pose(images):
poses = []
for img in images.values():
rotation = qvec2rotmat(img.qvec)
translation = img.tvec.reshape(3, 1)
w2c = np.concatenate([rotation, translation], 1)
w2c = np.concatenate([w2c, np.array([0, 0, 0, 1])[None]], 0)
c2w = np.linalg.inv(w2c)
poses.append(c2w)
center = np.array([0.0, 0.0, 0.0])
for f in poses:
src_frame = f[0:3, :]
for g in poses:
tgt_frame = g[0:3, :]
p = find_closest_point(src_frame[:, 3], src_frame[:, 2], tgt_frame[:, 3], tgt_frame[:, 2])
center += p
center /= len(poses) ** 2
radius = 0.0
for f in poses:
radius += np.linalg.norm(f[0:3, 3])
radius /= len(poses)
bounding_box = [
[center[0] - radius, center[0] + radius],
[center[1] - radius, center[1] + radius],
[center[2] - radius, center[2] + radius],
]
return center, radius, bounding_box
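# Sketch of an alternative (hypothetical helper, not what bound_by_pose() does):
# bound_by_pose() estimates the scene centre by averaging the pairwise closest points
# between camera optical axes (origin c2w[:3, 3], direction c2w[:3, 2]), which takes
# O(N^2) least-squares solves. A common swap-in is the single point x closest to all axes
# in the least-squares sense: sum_i (I - d_i d_i^T) x = sum_i (I - d_i d_i^T) o_i.
import numpy as np

def nearest_point_to_axes_sketch(origins, dirs):
    d = dirs / np.linalg.norm(dirs, axis=1, keepdims=True)
    A = np.zeros((3, 3))
    b = np.zeros(3)
    for o, di in zip(origins, d):
        P = np.eye(3) - np.outer(di, di)  # projects onto the plane orthogonal to di
        A += P
        b += P @ o
    return np.linalg.solve(A, b)  # singular if all axes are parallel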
def bound_by_points(points3D):
if not isinstance(points3D, np.ndarray):
xyzs = np.stack([point.xyz for point in points3D.values()])
else:
xyzs = points3D
center = xyzs.mean(axis=0)
std = xyzs.std(axis=0)
# radius = float(std.max() * 2) # use 2*std to define the region, which covers about 95% of normally distributed points
radius = np.abs(xyzs).max(0) * 1.1
bounding_box = [
[center[0] - std[0] * 3, center[0] + std[0] * 3],
[center[1] - std[1] * 3, center[1] + std[1] * 3],
[center[2] - std[2] * 3, center[2] + std[2] * 3],
]
return center, radius, bounding_box
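# Sketch (hypothetical helper) of what bound_by_points() returns for an (N, 3) array: the
# centroid, a +/- k*std box per axis (k = 3 above), and a per-axis "radius" that is the
# largest absolute coordinate times 1.1. Note that this radius is measured from the world
# origin, not from the centroid.
import numpy as np

def bound_by_points_sketch(xyzs, k=3.0, margin=1.1):
    center = xyzs.mean(axis=0)
    std = xyzs.std(axis=0)
    bbox = np.stack([center - k * std, center + k * std], axis=1)  # rows: [min, max] per axis
    radius = np.abs(xyzs).max(axis=0) * margin
    return center, radius, bbox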
def compute_oriented_bound(pts):
to_align, _ = trimesh.bounds.oriented_bounds(pts)
scale = (np.abs((to_align[:3, :3] @ pts.vertices.T + to_align[:3, 3:]).T).max(0) * 1.2).tolist()
return to_align.tolist(), scale
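# Sketch of a swapped-in technique (hypothetical helper): compute_oriented_bound() relies on
# trimesh.bounds.oriented_bounds to fit a tight oriented box. A rough numpy-only stand-in
# uses the PCA axes of the points instead (its box is generally not the tightest one) and
# returns the same (to_align, scale) layout, scale being the per-axis half-extent times 1.2.
# It takes an (N, 3) array rather than a trimesh object.
import numpy as np

def pca_oriented_bound_sketch(xyz, margin=1.2):
    center = xyz.mean(axis=0)
    _, axes = np.linalg.eigh(np.cov((xyz - center).T))  # columns = principal axes
    if np.linalg.det(axes) < 0:
        axes[:, 0] *= -1  # keep a proper rotation (det = +1)
    to_align = np.eye(4)
    to_align[:3, :3] = axes.T             # world -> box-frame rotation
    to_align[:3, 3] = -axes.T @ center    # moves the centroid to the origin
    aligned = xyz @ to_align[:3, :3].T + to_align[:3, 3]
    scale = (np.abs(aligned).max(axis=0) * margin).tolist()
    return to_align.tolist(), scale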
def split_data(names, split=10):
split_dict = {'train': [], 'test': []}
names = sorted(names)
for i, name in enumerate(names):
if i % split == 0:
split_dict['test'].append(name)
else:
split_dict['train'].append(name)
split_dict['train'] = sorted(split_dict['train'])
split_dict['test'] = sorted(split_dict['test'])
return split_dict
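# Sketch (hypothetical helper) of the rule split_data() applies: after sorting, every
# split-th name (indices 0, split, 2*split, ...) goes to the test set, the rest to train.
def split_every_nth_sketch(names, split=10):
    names = sorted(names)
    test = names[::split]
    train = [n for i, n in enumerate(names) if i % split != 0]
    return {'train': train, 'test': test}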
def get_split_dict(scene_path):
split_dict = None
if os.path.exists(os.path.join(scene_path, 'train_test_lists.json')):
image_names = os.listdir(os.path.join(scene_path, "images"))
image_names = sorted(['{:06}'.format(int(i.split(".")[0])) for i in image_names])
with open(os.path.join(scene_path, 'train_test_lists.json'), 'r') as fp:
split_dict = json.load(fp)
test_split = sorted([i.split(".")[0] for i in split_dict['test']])
train_split = [i for i in image_names if i not in test_split]
assert len(train_split) + len(test_split) == len(image_names), "train and test split do not cover all images"
split_dict = {
'train': train_split,
'test': test_split,
}
return split_dict
def check_concentric(images, ang_tol=np.pi / 6.0, radii_tol=0.5, pose_tol=0.5):
look_at = []
cam_loc = []
for img in images.values():
rotation = qvec2rotmat(img.qvec)
translation = img.tvec.reshape(3, 1)
w2c = np.concatenate([rotation, translation], 1)
w2c = np.concatenate([w2c, np.array([0, 0, 0, 1])[None]], 0)
c2w = np.linalg.inv(w2c)
cam_loc.append(c2w[:3, -1])
look_at.append(c2w[:3, 2])
look_at = np.stack(look_at)
look_at = look_at / np.linalg.norm(look_at, axis=1, keepdims=True)
cam_loc = np.stack(cam_loc)
num_images = cam_loc.shape[0]
center = cam_loc.mean(axis=0)
vec = center - cam_loc
radii = np.linalg.norm(vec, axis=1, keepdims=True)
vec_unit = vec / radii
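# clip the cosine so rounding cannot push it outside arccos's domain [-1, 1] (which gives NaN)
ang = np.arccos(np.clip((look_at * vec_unit).sum(axis=-1, keepdims=True), -1.0, 1.0))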
ang_valid = ang < ang_tol
print(f"Fraction of images looking at the center: {ang_valid.sum()/num_images:.2f}.")
radius_mean = radii.mean()
radii_valid = np.isclose(radius_mean, radii, rtol=radii_tol)
print(f"Fraction of images positioned around the center: {radii_valid.sum()/num_images:.2f}.")
valid = ang_valid * radii_valid
print(f"Valid fraction of concentric images: {valid.sum()/num_images:.2f}.")
return valid.sum() / num_images > pose_tol
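# Sketch (hypothetical helper and signature): the two tests in check_concentric(),
# vectorized and taking camera centres and viewing directions (c2w[:3, 2]) directly.
# A camera counts when it looks within ang_tol of the mean camera position and its
# distance to that position is close to the mean distance. The function returns the
# valid fraction, which check_concentric() compares against pose_tol.
import numpy as np

def concentric_fraction_sketch(cam_loc, look_at, ang_tol=np.pi / 6.0, radii_tol=0.5):
    look_at = look_at / np.linalg.norm(look_at, axis=1, keepdims=True)
    vec = cam_loc.mean(axis=0) - cam_loc  # camera -> mean camera position
    radii = np.linalg.norm(vec, axis=1)
    cos = np.clip((look_at * vec).sum(axis=1) / radii, -1.0, 1.0)
    ang_valid = np.arccos(cos) < ang_tol
    radii_valid = np.isclose(radii.mean(), radii, rtol=radii_tol)
    return float((ang_valid & radii_valid).mean())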
def export_to_json(trans, scale, scene_path, file_name, split_dict=None, do_split=False):
out = {
"trans": trans,
"scale": scale,
}
if do_split:
if split_dict is None:
image_names = os.listdir(os.path.join(scene_path, "images"))
image_names = ['{:06}'.format(int(i.split(".")[0])) for i in image_names]
split_dict = split_data(image_names, split=10)
out.update(split_dict)
with open(os.path.join(scene_path, file_name), "w") as outputfile:
json.dump(out, outputfile, indent=4)
return
def data_to_json(args):
cameras, images, points3D = read_model(os.path.join(args.data_dir, "sparse"), ext=".bin")
# define bounding regions based on scene type
if args.scene_type == "outdoor":
if check_concentric(images):
center, scale, bounding_box = bound_by_pose(images)
else:
center, scale, bounding_box = bound_by_points(points3D)
elif args.scene_type == "indoor":
# use sfm points as a proxy to define bounding regions
center, scale, bounding_box = bound_by_points(points3D)
elif args.scene_type == "object":
# use poses as a proxy to define bounding regions
center, scale, bounding_box = bound_by_pose(images)
else:
raise TypeError("Unknown scene type")
# export json file
export_to_json(list(center), scale, args.data_dir, "meta.json")
print("Writing data to json file: ", os.path.join(args.data_dir, "meta.json"))
return
if __name__ == "__main__":
parser = ArgumentParser()
parser.add_argument("--data_dir", type=str, default=None, help="Path to data")
parser.add_argument(
"--scene_type",
type=str,
default="outdoor",
choices=["outdoor", "indoor", "object"],
help="Select scene type. Outdoor for building-scale reconstruction; "
"indoor for room-scale reconstruction; object for object-centric scene reconstruction.",
)
args = parser.parse_args()
data_to_json(args)
================================================
FILE: process_data/convert_dtu_to_json.py
================================================
'''
-----------------------------------------------------------------------------
Copyright (c) 2023, NVIDIA CORPORATION. All rights reserved.
NVIDIA CORPORATION and its licensors retain all intellectual property
and proprietary rights in and to this software, related documentation
and any modifications thereto. Any use, reproduction, disclosure or
distribution of this software and related documentation without an express
license agreement from NVIDIA CORPORATION is strictly prohibited.
-----------------------------------------------------------------------------
'''
import numpy as np
import json
from argparse import ArgumentParser
import os
import cv2
from PIL import Image, ImageFile
from glob import glob
import math
import sys
from pathlib import Path
from tqdm import tqdm
import trimesh
dir_path = Path(os.path.dirname(os.path.realpath(__file__))).parents[0]
sys.path.append(dir_path.__str__())
# from process_data.convert_data_to_json import _cv_to_gl # noqa: E402
from process_data.convert_data_to_json import export_to_json, compute_oriented_bound # NOQA
from submodules.colmap.scripts.python.database import COLMAPDatabase # NOQA
from submodules.colmap.scripts.python.read_write_model import rotmat2qvec # NOQA
ImageFile.LOAD_TRUNCATED_IMAGES = True
def load_K_Rt_from_P(filename, P=None):
# This function is borrowed from IDR: https://github.com/lioryariv/idr
if P is None:
lines = open(filename).read().splitlines()
if len(lines) == 4:
lines = lines[1:]
lines = [[x[0], x[1], x[2], x[3]] for x in (x.split(" ") for x in lines)]
P = np.asarray(lines).astype(np.float32).squeeze()
out = cv2.decomposeProjectionMatrix(P)
K = out[0]
R = out[1]
t = out[2]
K = K / K[2, 2]
intrinsics = np.eye(4)
intrinsics[:3, :3] = K
pose = np.eye(4, dtype=np.float32)
pose[:3, :3] = R.transpose()
pose[:3, 3] = (t[:3] / t[3])[:, 0]
return intrinsics, pose
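# Sketch (hypothetical helper): load_K_Rt_from_P() delegates to cv2.decomposeProjectionMatrix.
# A numpy-only version of the same split factors P = K [R | t] with an RQ decomposition of
# its left 3x3 block, built from np.linalg.qr, and recovers the camera centre C from
# P @ [C; 1] = 0. In load_K_Rt_from_P's terms the pose would be pose[:3, :3] = R.T and
# pose[:3, 3] = C.
import numpy as np

def decompose_projection_sketch(P):
    M = P[:, :3]
    flip = np.flipud(np.eye(3))          # permutation that reverses row order
    Q0, R0 = np.linalg.qr((flip @ M).T)  # RQ of M via QR of the flipped transpose
    K = flip @ R0.T @ flip               # upper triangular
    R = flip @ Q0.T                      # orthonormal
    D = np.diag(np.sign(np.diag(K)))     # force a positive diagonal; D @ D = I keeps K @ R
    K, R = K @ D, D @ R
    C = -np.linalg.solve(M, P[:, 3])     # camera centre: M @ C + P[:, 3] = 0
    return K / K[2, 2], R, C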
def dtu_to_json(args):
assert args.dtu_path, "Provide path to DTU dataset"
scene_list = os.listdir(args.dtu_path)
test_indexes = [8, 13, 16, 21, 26, 31, 34, 56]
for scene in tqdm(scene_list):
scene_path = os.path.join(args.dtu_path, scene)
if not os.path.isdir(scene_path) or 'scan' not in scene:
continue
# trans = [0., 0., 0.]
# scale = 1.
scan_id = int(scene[4:])
pts = trimesh.load(os.path.join(args.dtu_path, f'Points/stl/stl{scan_id:03}_total.ply'))
trans, scale = compute_oriented_bound(pts)
out = {
"trans": trans,
"scale": scale,
}
# split_dict = None
if args.split:
images_names = os.listdir(os.path.join(scene_path, 'images'))
images_names = sorted([i for i in images_names if 'png' in i])
train_images = [i.split('.')[0] for i in images_names if int(i.split('.')[0]) not in test_indexes]
test_images = [i.split('.')[0] for i in images_names if int(i.split('.')[0]) in test_indexes]
train_images = sorted(train_images)
test_images = sorted(test_images)
out.update({
'train': train_images,
'test': test_images,
})
assert len(train_images) + len(test_images) == len(images_names)
file_path = os.path.join(scene_path, 'meta.json')
with open(file_path, "w") as outputfile:
json.dump(out, outputfile, indent=4)
# print('Writing data to json file: ', file_path)
def load_poses(scene_path):
camera_param = dict(np.load(os.path.join(scene_path, 'cameras_sphere.npz')))
images_lis = sorted(glob(os.path.join(scene_path, 'image/*.png')))
c2ws = {}
for idx, image in enumerate(images_lis):
image = os.path.basename(image)
world_mat = camera_param['world_mat_%d' % idx]
scale_mat = camera_param['scale_mat_%d' % idx]
# scale and decompose
P = world_mat @ scale_mat
P = P[:3, :4]
intrinsic_param, c2w = load_K_Rt_from_P(None, P)
c2ws[image] = c2w
w, h = Image.open(os.path.join(scene_path, 'image', image)).size
return c2ws, intrinsic_param, w, h
def convert_cam_dict_to_pinhole_dict(scene_path, pinhole_dict_file):
# Partially adapted from https://github.com/Kai-46/nerfplusplus/blob/master/colmap_runner/run_colmap_posed.py
c2ws, intrinsic_param, w, h = load_poses(scene_path)
fx = intrinsic_param[0][0]
fy = intrinsic_param[1][1]
cx = intrinsic_param[0][2]
cy = intrinsic_param[1][2]
sk_x = intrinsic_param[0][1]
sk_y = intrinsic_param[1][0]
print('Writing pinhole_dict to: ', pinhole_dict_file)
pinhole_dict = {}
for img_name in c2ws:
c2w = c2ws[img_name]
W2C = np.linalg.inv(c2w)
# params
qvec = rotmat2qvec(W2C[:3, :3])
tvec = W2C[:3, 3]
params = [w, h, fx, fy, cx, cy, sk_x, sk_y,
qvec[0], qvec[1], qvec[2], qvec[3],
tvec[0], tvec[1], tvec[2]]
pinhole_dict[img_name] = params
with open(pinhole_dict_file, 'w') as fp:
pinhole_dict = {k: [float(x) for x in v] for k, v in pinhole_dict.items()}
json.dump(pinhole_dict, fp, indent=2, sort_keys=True)
def create_init_files(pinhole_dict_file, db_file, out_dir):
# Partially adapted from https://github.com/Kai-46/nerfplusplus/blob/master/colmap_runner/run_colmap_posed.py
if not os.path.exists(out_dir):
os.mkdir(out_dir)
# create template
with open(pinhole_dict_file) as fp:
pinhole_dict = json.load(fp)
template = {}
cameras_line_template = '{camera_id} RADIAL {width} {height} {fx} {fy} {cx} {cy} {k1} {k2}\n'
images_line_template = '{image_id} {qw} {qx} {qy} {qz} {tx} {ty} {tz} {camera_id} {image_name}\n\n'
for img_name in pinhole_dict:
# w, h, fx, fy, cx, cy, qvec, t
params = pinhole_dict[img_name]
w = params[0]
h = params[1]
fx = params[2]
fy = params[3]
cx = params[4]
cy = params[5]
sk_x = params[6]
sk_y = params[7]
qvec = params[8:12]
tvec = params[12:15]
cam_line = cameras_line_template.format(
camera_id="{camera_id}", width=w, height=h, fx=fx, fy=fy, cx=cx, cy=cy, k1=sk_x, k2=sk_y)
img_line = images_line_template.format(image_id="{image_id}", qw=qvec[0], qx=qvec[1], qy=qvec[2], qz=qvec[3],
tx=tvec[0], ty=tvec[1], tz=tvec[2], camera_id="{camera_id}",
image_name=img_name)
template[img_name] = (cam_line, img_line)
# read database
db = COLMAPDatabase.connect(db_file)
table_images = db.execute("SELECT * FROM images")
img_name2id_dict = {}
for row in table_images:
img_name2id_dict[row[1]] = row[0]
cameras_txt_lines = [template[img_name][0].format(camera_id=1)]
images_txt_lines = []
for img_name, img_id in img_name2id_dict.items():
image_line = template[img_name][1].format(image_id=img_id, camera_id=1)
images_txt_lines.append(image_line)
with open(os.path.join(out_dir, 'cameras.txt'), 'w') as fp:
fp.writelines(cameras_txt_lines)
with open(os.path.join(out_dir, 'images.txt'), 'w') as fp:
fp.writelines(images_txt_lines)
fp.write('\n')
# create an empty points3D.txt
fp = open(os.path.join(out_dir, 'points3D.txt'), 'w')
fp.close()
def init_colmap(args):
assert args.dtu_path, "Provide path to DTU dataset"
scene_list = os.listdir(args.dtu_path)
scene_list = sorted([i for i in scene_list if 'scan' in i])
pbar = tqdm(total=len(scene_list))
for scene in scene_list:
pbar.set_description(desc=f'Scene: {scene}')
pbar.update(1)
scene_path = os.path.join(args.dtu_path, scene)
if not os.path.exists(f"{scene_path}/image"):
raise Exception(f"'image' folder cannot be found in {scene_path}. "
"Please check the expected folder structure in DATA_PREPROCESSING.md")
# extract features
os.system(f"colmap feature_extractor --database_path {scene_path}/database.db \
--image_path {scene_path}/image \
--ImageReader.camera_model=RADIAL \
--SiftExtraction.use_gpu=true \
--SiftExtraction.num_threads=32 \
--ImageReader.single_camera=true"
)
# --ImageReader.camera_model=RADIAL \
# match features
os.system(f"colmap sequential_matcher \
--database_path {scene_path}/database.db \
--SiftMatching.use_gpu=true"
)
pinhole_dict_file = os.path.join(scene_path, 'pinhole_dict.json')
convert_cam_dict_to_pinhole_dict(scene_path, pinhole_dict_file)
db_file = os.path.join(scene_path, 'database.db')
sfm_dir = os.path.join(scene_path, 'sparse')
# sfm_dir = os.path.join(scene_path, 'colmap')
create_init_files(pinhole_dict_file, db_file, sfm_dir)
# triangulate points with the known poses, then bundle-adjust with extrinsics fixed
os.system(f"colmap point_triangulator \
--database_path {scene_path}/database.db \
--image_path {scene_path}/image \
--input_path {scene_path}/sparse \
--output_path {scene_path}/sparse \
--clear_points 1 \
--Mapper.tri_ignore_two_view_tracks=true"
)
os.system(f"colmap bundle_adjuster \
--input_path {scene_path}/sparse \
--output_path {scene_path}/sparse \
--BundleAdjustment.refine_extrinsics=false"
)
# undistortion
os.system(f"colmap image_undistorter \
--image_path {scene_path}/image \
--input_path {scene_path}/sparse \
--output_path {scene_path} \
--output_type COLMAP \
--max_image_size 1600"
)
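# Sketch (hypothetical helpers): init_colmap() builds each COLMAP call as one shell string and
# ignores os.system's return value, so a failed step goes unnoticed and paths containing
# spaces break. The same first two calls, with the same flags, expressed as argument lists
# for subprocess.run(check=True); the remaining steps would follow the same pattern.
import subprocess

def dtu_colmap_commands_sketch(scene_path):
    db = f"{scene_path}/database.db"
    return [
        ["colmap", "feature_extractor", "--database_path", db,
         "--image_path", f"{scene_path}/image",
         "--ImageReader.camera_model=RADIAL", "--SiftExtraction.use_gpu=true",
         "--SiftExtraction.num_threads=32", "--ImageReader.single_camera=true"],
        ["colmap", "sequential_matcher", "--database_path", db,
         "--SiftMatching.use_gpu=true"],
    ]

def run_commands_sketch(commands):
    for cmd in commands:
        # check=True raises subprocess.CalledProcessError on a non-zero exit status
        subprocess.run(cmd, check=True)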
if __name__ == '__main__':
parser = ArgumentParser()
parser.add_argument('--dtu_path', type=str, default=None)
parser.add_argument('--export_json', action='store_true', help='export json')
parser.add_argument('--run_colmap', action='store_true', help='Run colmap')
parser.add_argument('--split', action='store_true', help='Add train/test split lists to meta.json')
args = parser.parse_args()
if args.run_colmap:
init_colmap(args)
if args.export_json:
dtu_to_json(args)
================================================
FILE: process_data/convert_tnt_to_json.py
================================================
import os
import numpy as np
import json
import sys
from pathlib import Path
from argparse import ArgumentParser
import trimesh
dir_path = Path(os.path.dirname(os.path.realpath(__file__))).parents[0]
sys.path.append(dir_path.__str__())
from process_data.convert_data_to_json import export_to_json, get_split_dict, compute_oriented_bound # NOQA
from submodules.colmap.scripts.python.database import COLMAPDatabase # NOQA
from submodules.colmap.scripts.python.read_write_model import rotmat2qvec # NOQA
def create_init_files(pinhole_dict_file, db_file, out_dir):
# Partially adapted from https://github.com/Kai-46/nerfplusplus/blob/master/colmap_runner/run_colmap_posed.py
if not os.path.exists(out_dir):
os.mkdir(out_dir)
# create template
with open(pinhole_dict_file) as fp:
pinhole_dict = json.load(fp)
template = {}
cameras_line_template = '{camera_id} RADIAL {width} {height} {f} {cx} {cy} {k1} {k2}\n'
images_line_template = '{image_id} {qw} {qx} {qy} {qz} {tx} {ty} {tz} {camera_id} {image_name}\n\n'
for img_name in pinhole_dict:
# w, h, fx, fy, cx, cy, qvec, t
params = pinhole_dict[img_name]
w = params[0]
h = params[1]
fx = params[2]
# fy = params[3]
cx = params[4]
cy = params[5]
qvec = params[6:10]
tvec = params[10:13]
cam_line = cameras_line_template.format(
camera_id="{camera_id}", width=w, height=h, f=fx, cx=cx, cy=cy, k1=0, k2=0)
img_line = images_line_template.format(image_id="{image_id}", qw=qvec[0], qx=qvec[1], qy=qvec[2], qz=qvec[3],
tx=tvec[0], ty=tvec[1], tz=tvec[2], camera_id="{camera_id}",
image_name=img_name)
template[img_name] = (cam_line, img_line)
# read database
db = COLMAPDatabase.connect(db_file)
table_images = db.execute("SELECT * FROM images")
img_name2id_dict = {}
for row in table_images:
img_name2id_dict[row[1]] = row[0]
cameras_txt_lines = [template[img_name][0].format(camera_id=1)]
images_txt_lines = []
for img_name, img_id in img_name2id_dict.items():
image_line = template[img_name][1].format(image_id=img_id, camera_id=1)
images_txt_lines.append(image_line)
with open(os.path.join(out_dir, 'cameras.txt'), 'w') as fp:
fp.writelines(cameras_txt_lines)
with open(os.path.join(out_dir, 'images.txt'), 'w') as fp:
fp.writelines(images_txt_lines)
fp.write('\n')
# create an empty points3D.txt
fp = open(os.path.join(out_dir, 'points3D.txt'), 'w')
fp.close()
def convert_cam_dict_to_pinhole_dict(cam_dict, pinhole_dict_file):
# Partially adapted from https://github.com/Kai-46/nerfplusplus/blob/master/colmap_runner/run_colmap_posed.py
print('Writing pinhole_dict to: ', pinhole_dict_file)
h = 1080
w = 1920
pinhole_dict = {}
for img_name in cam_dict:
W2C = cam_dict[img_name]
# params
fx = 0.6 * w
fy = 0.6 * w
cx = w / 2.0
cy = h / 2.0
qvec = rotmat2qvec(W2C[:3, :3])
tvec = W2C[:3, 3]
params = [w, h, fx, fy, cx, cy,
qvec[0], qvec[1], qvec[2], qvec[3],
tvec[0], tvec[1], tvec[2]]
pinhole_dict[img_name] = params
with open(pinhole_dict_file, 'w') as fp:
json.dump(pinhole_dict, fp, indent=2, sort_keys=True)
def load_COLMAP_poses(cam_file, img_dir, tf='w2c'):
# load image names from img_dir (sorted, so pose index k maps to names[k])
names = sorted(os.listdir(img_dir))
with open(cam_file) as f:
lines = f.readlines()
# C2W
poses = {}
for idx, line in enumerate(lines):
if idx % 5 == 0: # header
img_idx, valid, _ = line.split(' ')
if valid != '-1':
poses[int(img_idx)] = np.eye(4)
else:
if int(img_idx) in poses:
num = np.array([float(n) for n in line.split(' ')])
poses[int(img_idx)][idx % 5-1, :] = num
if tf == 'c2w':
return poses
else:
# convert to W2C (follow nerf convention)
poses_w2c = {}
for k, v in poses.items():
poses_w2c[names[k]] = np.linalg.inv(v)
return poses_w2c
def load_transformation(trans_file):
with open(trans_file) as f:
lines = f.readlines()
trans = np.eye(4)
for idx, line in enumerate(lines):
num = np.array([float(n) for n in line.split(' ')])
trans[idx, :] = num
return trans
def align_gt_with_cam(pts, trans):
trans_inv = np.linalg.inv(trans)
pts_aligned = pts @ trans_inv[:3, :3].transpose(-1, -2) + trans_inv[:3, -1]
return pts_aligned
def compute_bound(pts):
bounding_box = np.array([pts.min(axis=0), pts.max(axis=0)])
center = bounding_box.mean(axis=0)
# sphere radius
# scale = np.max(np.linalg.norm(pts - center, axis=-1)) * 1.01
# cube
# scale = (np.abs(pts - center).max(0) * 1.2).tolist() # cuboid for street
scale = (np.abs(pts - center).max(0) * 1.).tolist() # cuboid for street
return center, scale, bounding_box.T.tolist()
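# Sketch (hypothetical helper): compute_bound() fits an axis-aligned box around the points and
# returns its centre and per-axis half-extent as "scale"; the commented-out lines above show a
# bounding-sphere alternative. Both quantities side by side (the sphere radius here omits the
# 1.01 margin of the commented-out line):
import numpy as np

def aabb_bound_sketch(pts, margin=1.0):
    lo, hi = pts.min(axis=0), pts.max(axis=0)
    center = (lo + hi) / 2.0
    half_extent = np.abs(pts - center).max(axis=0) * margin  # equals (hi - lo) / 2 * margin
    sphere_radius = np.linalg.norm(pts - center, axis=1).max()
    return center, half_extent, sphere_radius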
def init_colmap(args):
assert args.tnt_path, "Provide path to Tanks and Temples dataset"
scene_list = os.listdir(args.tnt_path)
if 'Church' in scene_list: scene_list.remove('Church')
scene_list = sorted(scene_list)
for scene in scene_list:
scene_path = os.path.join(args.tnt_path, scene)
if args.run_colmap:
if not os.path.exists(f"{scene_path}/images_raw"):
raise Exception(f"'images_raw' folder cannot be found in {scene_path}. "
"Please check the expected folder structure in DATA_PREPROCESSING.md")
# extract features
os.system(f"colmap feature_extractor --database_path {scene_path}/database.db \
--image_path {scene_path}/images_raw \
--ImageReader.camera_model=RADIAL \
--SiftExtraction.use_gpu=true \
--SiftExtraction.num_threads=32 \
--ImageReader.single_camera=true"
)
# match features
os.system(f"colmap sequential_matcher \
--database_path {scene_path}/database.db \
--SiftMatching.use_gpu=true"
)
# read poses
poses = load_COLMAP_poses(os.path.join(scene_path, f'{scene}_COLMAP_SfM.log'),
os.path.join(scene_path, 'images_raw'))
# convert to colmap files
pinhole_dict_file = os.path.join(scene_path, 'pinhole_dict.json')
convert_cam_dict_to_pinhole_dict(poses, pinhole_dict_file)
db_file = os.path.join(scene_path, 'database.db')
sfm_dir = os.path.join(scene_path, 'sparse')
create_init_files(pinhole_dict_file, db_file, sfm_dir)
# triangulate points with the known poses, then bundle-adjust with extrinsics fixed
os.system(f"colmap point_triangulator \
--database_path {scene_path}/database.db \
--image_path {scene_path}/images_raw \
--input_path {scene_path}/sparse \
--output_path {scene_path}/sparse \
--Mapper.tri_ignore_two_view_tracks=true"
)
os.system(f"colmap bundle_adjuster \
--input_path {scene_path}/sparse \
--output_path {scene_path}/sparse \
--BundleAdjustment.refine_extrinsics=false"
)
# undistortion
os.system(f"colmap image_undistorter \
--image_path {scene_path}/images_raw \
--input_path {scene_path}/sparse \
--output_path {scene_path} \
--output_type COLMAP \
--max_image_size 1500"
)
if args.export_json:
# read for bounding information
trans = load_transformation(os.path.join(scene_path, f'{scene}_trans.txt'))
pts = trimesh.load(os.path.join(scene_path, f'{scene}.ply'))
# pts = pts.vertices
# pts_aligned = align_gt_with_cam(pts, trans)
# center, scale, bounding_box = compute_bound(pts_aligned[::100])
pts.vertices = align_gt_with_cam(pts.vertices, trans)
# pts = pts.sample(20000)
pts.vertices = pts.vertices[::100]
trans, scale = compute_oriented_bound(pts)
split_dict = get_split_dict(scene_path)
export_to_json(trans, scale, scene_path, 'meta.json', split_dict=split_dict)
print('Writing data to json file: ', os.path.join(scene_path, 'meta.json'))
if __name__ == '__main__':
parser = ArgumentParser()
parser.add_argument('--tnt_path', type=str, default=None, help='Path to tanks and temples dataset')
parser.add_argument('--run_colmap', action='store_true', help='Run colmap')
parser.add_argument('--export_json', action='store_true', help='export json')
args = parser.parse_args()
init_colmap(args)
================================================
FILE: process_data/extract_mask.py
================================================
import argparse
import os
import gc
import sys
import numpy as np
import json
import torch
from PIL import Image
from tqdm import tqdm
import torch.nn.functional as F
# segment anything
from segment_anything import (
sam_model_registry,
sam_hq_model_registry,
SamPredictor
)
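import cv2
import matplotlib.pyplot as plt
# Grounding DINO (assumes the `groundingdino` package from Grounded-Segment-Anything is installed)
import groundingdino.datasets.transforms as T
from groundingdino.models import build_model
from groundingdino.util.slconfig import SLConfig
from groundingdino.util.utils import clean_state_dict, get_phrases_from_posmap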
sys.path.append(os.getcwd())
from tools.semantic_id import text_label_dict
text_prompt_dict = {
'indoor': 'window.floor.',
'outdoor': 'sky.',
}
def load_image(image_path):
# load image
image_pil = Image.open(image_path).convert("RGB") # load image
transform = T.Compose(
[
T.RandomResize([800], max_size=1333),
T.ToTensor(),
T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
]
)
image, _ = transform(image_pil, None) # 3, h, w
return image_pil, image
def print_(a):
pass
def load_model(model_config_path, model_checkpoint_path, device):
args = SLConfig.fromfile(model_config_path)
args.device = device
model = build_model(args)
checkpoint = torch.load(model_checkpoint_path, map_location="cpu")
load_res = model.load_state_dict(clean_state_dict(checkpoint["model"]), strict=False)
print(load_res)
_ = model.eval()
return model
def get_grounding_output(model, image, caption, box_threshold, text_threshold, with_logits=True, device="cpu"):
caption = caption.lower()
caption = caption.strip()
if not caption.endswith("."):
caption = caption + "."
model = model.to(device)
image = image.to(device)
with torch.no_grad():
outputs = model(image[None], captions=[caption])
logits = outputs["pred_logits"].cpu().sigmoid()[0] # (nq, 256)
boxes = outputs["pred_boxes"].cpu()[0] # (nq, 4)
# filter output
logits_filt = logits.clone()
boxes_filt = boxes.clone()
filt_mask = logits_filt.max(dim=1)[0] > box_threshold
logits_filt = logits_filt[filt_mask] # num_filt, 256
boxes_filt = boxes_filt[filt_mask] # num_filt, 4
# get phrase
tokenizer = model.tokenizer
tokenized = tokenizer(caption)
# build pred
pred_phrases = []
for logit, box in zip(logits_filt, boxes_filt):
pred_phrase = get_phrases_from_posmap(logit > text_threshold, tokenized, tokenizer)
if with_logits:
pred_phrases.append(pred_phrase + f"({str(logit.max().item())[:4]})")
else:
pred_phrases.append(pred_phrase)
return boxes_filt, pred_phrases
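# Sketch (hypothetical helper): the boxes returned above are normalized (cx, cy, w, h);
# process_image() rescales them to pixels and turns them into (x0, y0, x1, y1) corners one
# row at a time before handing them to SAM. A vectorized numpy version of that conversion:
import numpy as np

def cxcywh_to_xyxy_pixels_sketch(boxes, W, H):
    boxes = np.asarray(boxes, dtype=float) * np.array([W, H, W, H])
    top_left = boxes[:, :2] - boxes[:, 2:] / 2
    return np.concatenate([top_left, top_left + boxes[:, 2:]], axis=1)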
def show_mask(mask, ax, random_color=False):
if random_color:
color = np.concatenate([np.random.random(3), np.array([0.6])], axis=0)
else:
color = np.array([30/255, 144/255, 255/255, 0.6])
h, w = mask.shape[-2:]
mask_image = mask.reshape(h, w, 1) * color.reshape(1, 1, -1)
ax.imshow(mask_image)
def show_box(box, ax, label):
x0, y0 = box[0], box[1]
w, h = box[2] - box[0], box[3] - box[1]
ax.add_patch(plt.Rectangle((x0, y0), w, h, edgecolor='green', facecolor=(0,0,0,0), lw=2))
ax.text(x0, y0, label)
def save_mask_data(output_dir, mask_list, box_list, label_list, name):
value = 1
mask_img = torch.ones(mask_list.shape[-2:]) * value
for idx, mask in enumerate(mask_list):
if len(label_list) == 0: break
sem = label_list[idx].split('(')[0]
mask_img[mask[0].cpu()] = text_label_dict.get(sem, value)
mask_img = mask_img.numpy().astype(np.uint8)
cv2.imwrite(os.path.join(output_dir, f'{name}.png'), mask_img)
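`save_mask_data` composites all SAM masks into a single label image: every pixel starts at `value = 1`, each mask paints its label id on top, so later masks overwrite earlier ones, and labels missing from the lookup table fall back to the default. The stdlib sketch below reproduces that compositing on small nested lists; `compose_label_map` and the `label_ids` table are hypothetical stand-ins for the tensor code and for `text_label_dict`, which is defined elsewhere in the script.

```python
from typing import Dict, List, Sequence


def compose_label_map(
    masks: Sequence[Sequence[Sequence[bool]]],
    labels: Sequence[str],
    label_ids: Dict[str, int],
    height: int,
    width: int,
    default: int = 1,
) -> List[List[int]]:
    """Paint binary masks into one label image; later masks overwrite earlier ones."""
    out = [[default] * width for _ in range(height)]
    for mask, label in zip(masks, labels):
        # Phrases look like "wall(0.52)": drop the confidence suffix before lookup.
        value = label_ids.get(label.split("(")[0], default)
        for y in range(height):
            for x in range(width):
                if mask[y][x]:
                    out[y][x] = value
    return out


masks = [
    [[True, True, False], [False, False, False]],
    [[False, True, True], [False, False, False]],
]
label_ids = {"wall": 2, "floor": 3}  # hypothetical stand-in for text_label_dict
label_map = compose_label_map(masks, ["wall(0.52)", "floor(0.40)"], label_ids, 2, 3)
print(label_map)  # [[2, 3, 3], [1, 1, 1]]
```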
def morphology_open(x, k1=21, k2=21):
out = x.float()[None]
p1 = (k1 - 1) // 2
p2 = (k2 - 1) // 2
out = -F.max_pool2d(-out, kernel_size=k1, stride=1, padding=p1)  # erosion
out = F.max_pool2d(out, kernel_size=k2, stride=1, padding=p2)  # dilation
return out
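`morphology_open` implements a binary opening with pooling: erosion is a min filter, written as `-max_pool2d(-x)`, and dilation is a max filter, so blobs smaller than the kernel vanish while larger regions keep roughly their shape. Because `F.max_pool2d` pads implicitly with negative infinity, border pixels are effectively ignored rather than treated as zeros. The sketch below mirrors that behaviour in plain Python by skipping out-of-bounds cells; `window_filter` and `opening` are illustrative names, and the example uses a 3x3 kernel instead of the script's default 21.

```python
from typing import Callable, List, Sequence

Grid = List[List[int]]


def window_filter(grid: Grid, k: int, reduce: Callable[[Sequence[int]], int]) -> Grid:
    """Apply `reduce` (min or max) over each k x k window; out-of-bounds cells are skipped."""
    r = (k - 1) // 2
    h, w = len(grid), len(grid[0])
    return [
        [
            reduce([
                grid[yy][xx]
                for yy in range(max(0, y - r), min(h, y + r + 1))
                for xx in range(max(0, x - r), min(w, x + r + 1))
            ])
            for x in range(w)
        ]
        for y in range(h)
    ]


def opening(grid: Grid, k1: int = 3, k2: int = 3) -> Grid:
    eroded = window_filter(grid, k1, min)  # erosion, like -max_pool2d(-x)
    return window_filter(eroded, k2, max)  # dilation, like max_pool2d(x)


mask = [
    [1, 1, 1, 0, 0],
    [1, 1, 1, 0, 0],
    [1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1],  # isolated speck smaller than the kernel
]
opened = opening(mask)
for row in opened:
    print(row)  # the 3x3 block survives, the speck in the corner is removed
```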
def process_image(image_name):
name = os.path.splitext(image_name)[0]  # keeps dots inside the file stem
image_path = os.path.join(image_dir, image_name)
# load image
image_pil, image = load_image(image_path)
# visualize raw image
# image_pil.save(os.path.join(output_dir, "raw_image.jpg"))
# run grounding dino model
boxes_filt, pred_phrases = get_grounding_output(
model, image, text_prompt, box_threshold, text_threshold, device=device
)
image = cv2.imread(image_path)
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
predictor.set_image(image)
size = image_pil.size
H, W = size[1], size[0]
for i in range(boxes_filt.size(0)):
boxes_filt[i] = boxes_filt[i] * torch.Tensor([W, H, W, H])
boxes_filt[i][:2] -= boxes_filt[i][2:] / 2
boxes_filt[i][2:] += boxes_filt[i][:2]
boxes_filt = boxes_filt.cpu()
transformed_boxes = predictor.transform.apply_boxes_torch(boxes_filt, image.shape[:2]).to(device)
with torch.no_grad():
try:
masks, _, _ = predictor.predict_torch(
point_coords = None,
point_labels = None,
boxes = transformed_boxes.to(device),
multimask_output = False,
)
except RuntimeError:
print(f"Error in {name}")
masks = torch.zeros([1, 1, H, W]).to(device).bool()
masks = masks.cpu()
if args.vis:
# draw output image
plt.figure(figsize=(10, 10))
plt.imshow(image)
for mask in masks:
show_mask(mask.cpu().numpy(), plt.gca(), random_color=True)
for box, label in zip(boxes_filt, pred_phrases):
show_box(box.numpy(), plt.gca(), label)
plt.axis('off')
plt.savefig(
os.path.join(output_dir, f"{name}_output.png"),
bbox_inches="tight", dpi=100, pad_inches=0.0
)
plt.close() # important!!! close the plot to release memory
save_mask_data(output_dir, masks, boxes_filt, pred_phrases, name)
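Grounding DINO returns boxes as normalized `(cx, cy, w, h)`, while SAM's `apply_boxes_torch` expects absolute `(x0, y0, x1, y1)` pixel corners; the loop in `process_image` scales by `[W, H, W, H]`, subtracts half the size to get the top-left corner, then adds the size back to get the bottom-right corner. The helper below is a hypothetical scalar version of those three steps, shown only to make the arithmetic explicit.

```python
from typing import List, Sequence


def cxcywh_norm_to_xyxy(box: Sequence[float], width: int, height: int) -> List[float]:
    """Convert a normalized (cx, cy, w, h) box to absolute (x0, y0, x1, y1) pixels."""
    # Scale to pixels, as boxes_filt[i] * [W, H, W, H] does.
    cx, cy = box[0] * width, box[1] * height
    bw, bh = box[2] * width, box[3] * height
    # Top-left corner, then bottom-right = top-left + size.
    x0, y0 = cx - bw / 2, cy - bh / 2
    return [x0, y0, x0 + bw, y0 + bh]


print(cxcywh_norm_to_xyxy([0.5, 0.5, 0.25, 0.5], width=640, height=480))
# [240.0, 120.0, 400.0, 360.0]
```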
if __name__ == "__main__":
parser = argparse.ArgumentParser("Grounded-Segment-Anything Demo", add_help=True)
parser.add_argument("--config", type=str, required=True, help="path to config file")
parser.add_argument(
"--grounded_checkpoint", type=str, required=True, help="path to checkpoint file"
)
parser.add_argument(
"--sam_version", type=str, default="vit_h", required=False, help="SAM ViT version: vit_b / vit_l / vit_h"
)
parser.add_argument(
"--sam_checkpoint", type=str, required=False, help="path to sam checkpoint file"
)
parser.add_argument(
"--sam_hq_checkpoint", type=str, default=None, help="path to sam-hq checkpoint file"
)
parser.add_argument(
"--use_sam_hq", action="store_true", help="using sam-hq for prediction"
)
parser.add_argument("--input_image", type=str, required=True, help="path to image file")
parser.add_argument("--text_prompt", type=str, default=None, help="text prompt")
parser.add_argument("--scene_type", type=str, choices=['indoor', 'outdoor'], help="scene type used to select the default text prompt")
parser.add_argument("--scene", type=str, default=None, help="scene name used to look up a scene-specific text prompt")
parser.add_argument(
"--output_dir", "-o", type=str, required=True, help="output directory"
)
parser.add_argument("--box_threshold", type=float, default=0.3, help="box threshold")
parser.add_argument("--text_threshold", type=float, default=0.25, help="text threshold")
parser.add_argument("--gsam_path", dest="gsam_path", help="path to gsam")
parser.add_argument('--vis', action='store_true', help='visualize the output')
parser.add_argument("--device", type=str, default="cpu", help="device to run on, e.g. cpu or cuda (default: cpu)")
args = parser.parse_args()
gsam_path = args.gsam_path
sys.path.append(args.gsam_path)
sys.path.append(os.path.join(gsam_path, "GroundingDINO"))
sys.path.append(os.path.join(gsam_path, "segment_anything"))
# Grounding DINO
import GroundingDINO.groundingdino.datasets.transforms as T
from GroundingDINO.groundingdino.models import build_model
from GroundingDINO.groundingdino.util.slconfig import SLConfig
from GroundingDINO.groundingdino.util.utils import clean_state_dict, get_phrases_from_posmap
# print = print_
seed = 0
np.random.seed(seed)
torch.manual_seed(seed) # sets seed on the current CPU & all GPUs
# cfg
config_file = args.config # change the path of the model config file
grounded_checkpoint = args.grounded_checkpoint # change the path of the model
sam_version = args.sam_version
sam_checkpoint = args.sam_checkpoint
sam_hq_checkpoint = args.sam_hq_checkpoint
use_sam_hq = args.use_sam_hq
image_dir = args.input_image
if args.text_prompt is not None:
text_prompt = args.text_prompt
else:
text_prompt = text_prompt_dict[args.scene_type]
if args.scene is not None:
text_prompt = text_prompt_dict.get(args.scene, text_prompt_dict[args.scene_type])
output_dir = args.output_dir
box_threshold = args.box_threshold
text_threshold = args.text_threshold
device = args.device
# make dir
os.makedirs(output_dir, exist_ok=True)
# load model
model = load_model(config_file, grounded_checkpoint, device=device)
image_names = os.listdir(image_dir)
image_names = sorted([i for i in image_names if i.endswith(".jpg") or i.endswith(".png")])
# initialize SAM
if use_sam_hq:
predictor = SamPredictor(sam_hq_model_registry[sam_version](checkpoint=sam_hq_checkpoint).to(device))
else:
predictor = SamPredictor(sam_model_registry[sam_version](checkpoint=sam_checkpoint).to(device))
for image_name in tqdm(image_names):
process_image(image_name)
================================================
FILE: process_data/extract_normal.py
================================================
import os
import sys
import glob
import math
import struct
import argparse
import numpy as np
import collections
import torch
import torch.nn.functional as F
from torchvision import transforms
from PIL import Image, ImageFile
from tqdm import tqdm
ImageFile.LOAD_TRUNCATED_IMAGES = True
sys.path.append(os.getcwd())
from tools.general_utils import set_random_seed
Camera = collections.namedtuple(
"Camera", ["id", "model", "width", "height", "params"])
CameraModel = collections.namedtuple(
"CameraModel", ["model_id", "model_name", "num_params"])
CAMERA_MODELS = {
CameraModel(model_id=0, model_name="SIMPLE_PINHOLE", num_params=3),
CameraModel(model_id=1, model_name="PINHOLE", num_params=4),
CameraModel(model_id=2, model_name="SIMPLE_RADIAL", num_params=4),
CameraModel(model_id=3, model_name="RADIAL", num_params=5),
CameraModel(model_id=4, model_name="OPENCV", num_params=8),
CameraModel(model_id=5, model_name="OPENCV_FISHEYE", num_params=8),
CameraModel(model_id=6, model_name="FULL_OPENCV", num_params=12),
CameraModel(model_id=7, model_name="FOV", num_params=5),
CameraModel(model_id=8, model_name="SIMPLE_RADIAL_FISHEYE", num_params=4),
CameraModel(model_id=9, model_name="RADIAL_FISHEYE", num_params=5),
CameraModel(model_id=10, model_name="THIN_PRISM_FISHEYE", num_params=12)
}
CAMERA_MODEL_IDS = dict([(camera_model.model_id, camera_model)
for camera_model in CAMERA_MODELS])
CAMERA_MODEL_NAMES = dict([(camera_model.model_name, camera_model)
for camera_model in CAMERA_MODELS])
def get_args(test=False):
parser = get_default_parser()
#↓↓↓↓
#NOTE: project-specific args
parser.add_argument('--NNET_architecture', type=str, default='v02')
parser.add_argument('--NNET_output_dim', type=int, default=3, help='{3, 4}')
parser.add_argument('--NNET_output_type', type=str, default='R', help='{R, G}')
parser.add_argument('--NNET_feature_dim', type=int, default=64)
parser.add_argument('--NNET_hidden_dim', type=int, default=64)
parser.add_argument('--NNET_encoder_B', type=int, default=5)
parser.add_argument('--NNET_decoder_NF', type=int, default=2048)
parser.add_argument('--NNET_decoder_BN', default=False, action="store_true")
parser.add_argument('--NNET_decoder_down', type=int, default=8)
parser.add_argument('--NNET_learned_upsampling', default=False, action="store_true")
parser.add_argument('--NRN_prop_ps', type=int, default=5)
parser.add_argument('--NRN_num_iter_train', type=int, default=5)
parser.add_argument('--NRN_num_iter_test', type=int, default=5)
parser.add_argument('--NRN_ray_relu', default=False, action="store_true")
parser.add_argument('--loss_fn', type=str, default='AL')
parser.add_argument('--loss_gamma', type=float, default=0.8)
parser.add_argument('--outdir', type=str, default='/your/log/path/')
#↑↑↑↑
# read arguments from txt file
assert '.txt' in sys.argv[1]
arg_filename_with_prefix = '@' + sys.argv[1]
args = parser.parse_args([arg_filename_with_prefix] + sys.argv[2:])
#↓↓↓↓
#NOTE: update args
args.exp_root = os.path.join(args.outdir, 'dsine')
args.load_normal = True
args.load_intrins = True
#↑↑↑↑
# set working dir
exp_dir = os.path.join(args.exp_root, args.exp_name)
args.output_dir = os.path.join(exp_dir, args.exp_id)
return args
def focal2fov(focal, pixels):
return 2*math.atan(pixels/(2*focal))
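`focal2fov` returns the full field of view spanned by `pixels` at focal length `focal`: half the sensor extent over the focal length gives the tangent of the half-angle. The sketch below repeats it and adds a hypothetical inverse, `fov2focal`, to show that the two round-trip; a 1000-pixel-wide image at focal length 500 spans a 90 degree field of view.

```python
import math


def focal2fov(focal: float, pixels: float) -> float:
    # tan(fov / 2) = (pixels / 2) / focal
    return 2 * math.atan(pixels / (2 * focal))


def fov2focal(fov: float, pixels: float) -> float:
    # Hypothetical inverse of focal2fov.
    return pixels / (2 * math.tan(fov / 2))


fov = focal2fov(500.0, 1000.0)
print(math.degrees(fov))        # approximately 90.0
print(fov2focal(fov, 1000.0))   # approximately 500.0
```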
def read_next_bytes(fid, num_bytes, format_char_sequence, endian_character="<"):
"""Read and unpack the next bytes from a binary file.
:param fid:
:param num_bytes: Sum of combination of {2, 4, 8}, e.g. 2, 6, 16, 30, etc.
:param format_char_sequence: List of {c, e, f, d, h, H, i, I, l, L, q, Q}.
:param endian_character: Any of {@, =, <, >, !}
:return: Tuple of read and unpacked values.
"""
data = fid.read(num_bytes)
return struct.unpack(endian_character + format_char_sequence, data)
def read_intrinsics_binary(path_to_model_file):
"""
see: src/base/reconstruction.cc
void Reconstruction::WriteCamerasBinary(const std::string& path)
void Reconstruction::ReadCamerasBinary(const std::string& path)
"""
cameras = {}
with open(path_to_model_file, "rb") as fid:
num_cameras = read_next_bytes(fid, 8, "Q")[0]
for _ in range(num_cameras):
camera_properties = read_next_bytes(
fid, num_bytes=24, format_char_sequence="iiQQ")
camera_id = camera_properties[0]
model_id = camera_properties[1]
model_name = CAMERA_MODEL_IDS[camera_properties[1]].model_name
width = camera_properties[2]
height = camera_properties[3]
num_params = CAMERA_MODEL_IDS[model_id].num_params
params = read_next_bytes(fid, num_bytes=8*num_params,
format_char_sequence="d"*num_params)
cameras[camera_id] = Camera(id=camera_id,
model=model_name,
width=width,
height=height,
params=np.array(params))
assert len(cameras) == num_cameras
return cameras
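`read_intrinsics_binary` walks a fixed little-endian layout: a `uint64` camera count, then per camera a 24-byte header packed as `"iiQQ"` (int32 id, int32 model id, uint64 width, uint64 height) followed by `num_params` float64 values. The round trip below writes one PINHOLE record into an in-memory buffer and parses it back with the same format strings; `write_camera_record` and `read_camera_records` are hypothetical helpers, and the model-id table is cut down to the two pinhole models for brevity.

```python
import io
import struct


def write_camera_record(buf, camera_id, model_id, width, height, params):
    # Header: int32 id, int32 model id, uint64 width, uint64 height (24 bytes, no padding with "<").
    buf.write(struct.pack("<iiQQ", camera_id, model_id, width, height))
    buf.write(struct.pack("<" + "d" * len(params), *params))


def read_camera_records(buf, num_params_by_model):
    (num_cameras,) = struct.unpack("<Q", buf.read(8))
    cameras = {}
    for _ in range(num_cameras):
        camera_id, model_id, width, height = struct.unpack("<iiQQ", buf.read(24))
        n = num_params_by_model[model_id]
        params = struct.unpack("<" + "d" * n, buf.read(8 * n))
        cameras[camera_id] = (model_id, width, height, params)
    return cameras


buf = io.BytesIO()
buf.write(struct.pack("<Q", 1))  # number of cameras
write_camera_record(buf, 1, 1, 640, 480, [500.0, 510.0, 320.0, 240.0])  # PINHOLE: fx, fy, cx, cy
buf.seek(0)
cams = read_camera_records(buf, {0: 3, 1: 4})  # SIMPLE_PINHOLE: 3 params, PINHOLE: 4
print(cams)  # {1: (1, 640, 480, (500.0, 510.0, 320.0, 240.0))}
```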
def read_intrinsics_text(path):
"""
Taken from https://github.com/colmap/colmap/blob/dev/scripts/python/read_write_model.py
"""
cameras = {}
with open(path, "r") as fid:
while True:
line = fid.readline()
if not line:
break
line = line.strip()
if len(line) > 0 and line[0] != "#":
elems = line.split()
camera_id = int(elems[0])
model = elems[1]
assert model == "PINHOLE", "While the loader supports other types, the rest of the code assumes PINHOLE"
width = int(elems[2])
height = int(elems[3])
params = np.array(tuple(map(float, elems[4:])))
cameras[camera_id] = Camera(id=camera_id, model=model,
width=width, height=height,
params=params)
return cameras
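The text fallback reads one camera per non-comment line in the order `CAMERA_ID MODEL WIDTH HEIGHT PARAMS...`. The sketch below parses the same layout from an in-memory string; `parse_cameras_txt` is a hypothetical helper, the sample text is illustrative, and unlike `read_intrinsics_text` it does not assert that every camera is PINHOLE.

```python
import io


def parse_cameras_txt(fid):
    """Parse COLMAP-style cameras.txt lines: CAMERA_ID MODEL WIDTH HEIGHT PARAMS..."""
    cameras = {}
    for line in fid:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        elems = line.split()
        camera_id, model = int(elems[0]), elems[1]
        width, height = int(elems[2]), int(elems[3])
        params = tuple(float(v) for v in elems[4:])
        cameras[camera_id] = (model, width, height, params)
    return cameras


text = """# Camera list with one line of data per camera:
#   CAMERA_ID, MODEL, WIDTH, HEIGHT, PARAMS[]

1 PINHOLE 640 480 500 510 320 240
"""
cams_txt = parse_cameras_txt(io.StringIO(text))
print(cams_txt)  # {1: ('PINHOLE', 640, 480, (500.0, 510.0, 320.0, 240.0))}
```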
def load_intrinsic_colmap(path):
intr_dir = os.path.join(path, "sparse", "0")
if not os.path.exists(intr_dir):
intr_dir = os.path.join(path, "sparse")
# support only one camera for now
try:
cameras_intrinsic_file = os.path.join(intr_dir, "cameras.bin")
cam_intrinsics = read_intrinsics_binary(cameras_intrinsic_file)
except (OSError, struct.error):  # cameras.bin missing or unreadable: fall back to cameras.txt
cameras_intrinsic_file = os.path.join(intr_dir, "cameras.txt")
cam_intrinsics = read_intrinsics_text(cameras_intrinsic_file)
intrinsics = []
for idx, key in enumerate(cam_intrinsics):
intrinsic = torch.eye(3, dtype=torch.float32)
intr = cam_intrinsics[key]
height = intr.height
width = intr.width
if intr.model=="SIMPLE_PINHOLE":
focal_length_x = intr.params[0]
focal_length_y = focal_length_x  # SIMPLE_PINHOLE shares one focal length across both axes
FovY = focal2fov(focal_length_x, height)
FovX = focal2fov(focal_length_x, width)
elif intr.model=="PINHOLE":
focal_length_x = intr.params[0]
focal_length_y = intr.params[1]
FovY = focal2fov(focal_length_y, height)
FovX = focal2fov(focal_length_x, width)
else:
assert False, "Colmap camera model not handled: only undistorted datasets (PINHOLE or SIMPLE_PINHOLE cameras) supported!"
intrinsic[0, 0] = focal_length_x  # fx
intrinsic[1, 1] = focal_length_y  # fy
intrinsic[0, 2] = width / 2  # cx: assumes the principal point is at the image centre
intrinsic[1, 2] = height / 2  # cy
intrinsics.append(intrinsic)
intrinsics = torch.stack(intrinsics, dim=0)
return intrinsics
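`load_intrinsic_colmap` builds a 3x3 pinhole matrix from COLMAP parameters: SIMPLE_PINHOLE stores a single focal length `f` (so `fx = fy = f`), while PINHOLE stores `fx, fy` separately. Like the function above, the sketch places the principal point at the image centre instead of reading `cx, cy` from `params`; `intrinsic_matrix` is a hypothetical list-based helper used only to make the per-model cases explicit.

```python
from typing import List, Sequence


def intrinsic_matrix(model: str, params: Sequence[float], width: int, height: int) -> List[List[float]]:
    """3x3 K from COLMAP SIMPLE_PINHOLE (f, cx, cy) or PINHOLE (fx, fy, cx, cy) params.

    As in load_intrinsic_colmap, the principal point is set to the image centre
    rather than taken from params.
    """
    if model == "SIMPLE_PINHOLE":
        fx = fy = params[0]  # one focal length shared by both axes
    elif model == "PINHOLE":
        fx, fy = params[0], params[1]
    else:
        raise ValueError(f"unsupported camera model: {model}")
    return [
        [fx, 0.0, width / 2],
        [0.0, fy, height / 2],
        [0.0, 0.0, 1.0],
    ]


print(intrinsic_matrix("SIMPLE_PINHOLE", [500.0, 321.0, 239.0], 640, 480))
# [[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]]
```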
def test_samples(args, model, intrins=None, device='cpu'):
img_paths = glob.glob(f'{args.img_path}/*.png') + glob.glob(f'{args.img_path}/*.jpg') + glob.glob(f'{args.img_path}/*.JPG')
img_paths.sort()
# normalize
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
intrin = load_intrinsic_colmap(args.intrins_path).to(device)
os.makedirs(args.output_path, exist_ok=True)
with torch.no_grad():
for img_path in tqdm(img_paths):
ext = os.path.splitext(img_path)[1]
img = Image.open(img_path).convert('RGB')
img = np.array(img).astype(np.float32) / 255.0
img = torch.from_numpy(img).permute(2, 0, 1).unsqueeze(0).to(device)
_, _, orig_H, orig_W = img.shape
# zero-pad the input image so that both the width and height are multiples of 32
lrtb = utils.get_padding(orig_H, orig_W)
img = F.pad(img, lrtb, mode="constant", value=0.0)
img = normalize(img)
intrins = intrin.clone()
intrins[:, 0, 2] += lrtb[0]
intrins[:, 1, 2] += lrtb[2]
pred_norm = model(img, intrins=intrins)[-1]
pred_norm = pred_norm[:, :, lrtb[2]:lrtb[2]+orig_H, lrtb[0]:lrtb[0]+orig_W]
# save to output folder
img_name = os.path.basename(img_path)
# NOTE: by saving the prediction as uint8 png format, you lose a lot of precision
# if you want to use the predicted normals for downstream tasks, we recommend saving them as float32 NPY files
pred_norm_np = pred_norm.cpu().detach().numpy()[0,:,:,:].transpose(1, 2, 0) # (H, W, 3) -1, 1
if args.vis:
pred_norm_np = ((pred_norm_np + 1.0) / 2.0 * 255.0).astype(np.uint8)
target_path = os.path.join(args.output_path, img_name.replace(ext, '.png'))
im = Image.fromarray(pred_norm_np)
im.save(target_path)
else:
target_path = os.path.join(args.output_path, img_name.replace(ext, '.npz'))
np.savez_compressed(target_path, pred_norm_np.astype(np.float16))
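`test_samples` zero-pads each image so height and width are multiples of 32, shifts the principal point by the left and top padding (the image origin moves with the padding), and crops the prediction back with `[top:top+H, left:left+W]`. The 4-tuple passed to `F.pad` is ordered `(left, right, top, bottom)`, acting on the last dimension first. The sketch below assumes the padding is split as evenly as possible between the two sides; DSINE's `utils.get_padding` may split it differently, and `pad_to_multiple` and `shift_principal_point` are hypothetical names.

```python
import math
from typing import Tuple


def pad_to_multiple(h: int, w: int, multiple: int = 32) -> Tuple[int, int, int, int]:
    """(left, right, top, bottom) zero-padding that makes h and w multiples of `multiple`."""
    new_h = math.ceil(h / multiple) * multiple
    new_w = math.ceil(w / multiple) * multiple
    left = (new_w - w) // 2
    top = (new_h - h) // 2
    return left, new_w - w - left, top, new_h - h - top


def shift_principal_point(cx: float, cy: float, lrtb: Tuple[int, int, int, int]) -> Tuple[float, float]:
    # Padding on the left/top moves the origin, so cx/cy move by the same amount.
    return cx + lrtb[0], cy + lrtb[2]


lrtb = pad_to_multiple(500, 700)
print(lrtb)                                      # (2, 2, 6, 6)
print(shift_principal_point(350.0, 250.0, lrtb)) # (352.0, 256.0)
```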
if __name__ == '__main__':
parser = argparse.ArgumentParser()
parser.add_argument('--ckpt', default='dsine', type=str, help='path to model checkpoint')
parser.add_argument('--mode', default='samples', type=str, help='{samples}')
parser.add_argument("--dsine_path", dest="dsine_path", help="path to the DSINE code repository")
parser.add_argument("--img_path", dest="img_path", help="path to the directory of RGB images")
parser.add_argument("--intrins_path", dest="intrins_path", help="path to the COLMAP scene directory containing the sparse/ camera intrinsics")
parser.add_argument("--output_path", dest="output_path", help="path to where output image should be stored")
parser.add_argument('--vis', action='store_true', help='visualize the output')
args = parser.parse_args()
dsine_path = args.dsine_path
dsine_path = os.path.abspath(dsine_path)
sys.path.append(dsine_path)
# define model
device = torch.device('cuda')
set_random_seed(0)
import utils.utils as utils
from projects import get_default_parser
from models.dsine.v02 import DSINE_v02 as DSINE
cfg_path = f'{args.dsine_path}/projects/dsine/experiments/exp001_cvpr2024/dsine.txt'
sys.argv = [sys.argv[0], cfg_path]
cfg = get_args(test=True)
model = DSINE(cfg).to(device)
model.pixel_coords = model.pixel_coords.to(device)
model = utils.load_checkpoint(args.ckpt, model)
model.eval()
# # # Load the normal predictor model from torch hub
# model = torch.hub.load("hugoycj/DSINE-hub", "DSINE", trust_repo=True)
if args.mode == 'samples':
test_samples(args, model, intrins=None, device=device)
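The NOTE in `test_samples` warns that the `--vis` PNG output loses precision: each normal component in [-1, 1] is mapped to [0, 255] and truncated by `astype(np.uint8)`, so decoding can be off by up to 2/255 per component, whereas the default float16 `.npz` output keeps a much finer step. The sketch below reproduces the encode/decode pair in plain Python on three example components; `encode_normal_uint8` and `decode_normal_uint8` are hypothetical names.

```python
from typing import Sequence, Tuple


def encode_normal_uint8(n: Sequence[float]) -> Tuple[int, ...]:
    # [-1, 1] -> [0, 255]; int() truncates like astype(np.uint8) for in-range values.
    return tuple(int((c + 1.0) / 2.0 * 255.0) for c in n)


def decode_normal_uint8(q: Sequence[int]) -> Tuple[float, ...]:
    # [0, 255] -> [-1, 1]
    return tuple(v / 255.0 * 2.0 - 1.0 for v in q)


n = (0.0, 0.5, -0.5)  # example components
q = encode_normal_uint8(n)
print(q)  # (127, 191, 63)
err = max(abs(a - b) for a, b in zip(n, decode_normal_uint8(q)))
print(err < 2 / 255)  # True
```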
================================================
FILE: process_data/extract_normal_geo.py
================================================
# A reimplemented version in public environments by Xiao Fu and Mu Hu
import os
import sys
import logging
import argparse
import numpy as np
import torch
from PIL import Image, ImageFile
ImageFile.LOAD_TRUNCATED_IMAGES = True
from tqdm.auto import tqdm
if __name__=="__main__":
logging.basicConfig(level=logging.INFO)
'''Set the Args'''
parser = argparse.ArgumentParser(
description="Run MonoDepthNormal Estimation using Stable Diffusion."
)
parser.add_argument("--code_path", help="path to code directory", type=str,
default="~/code/geowizard/geowizard")
parser.add_argument(
"--pretrained_model_path",
type=str,
default='lemonaddie/geowizard',
help="pretrained model path from hugging face or local dir",
)
parser.add_argument(
"--input_dir", type=str, required=True, help="Input directory."
)
parser.add_argument(
"--output_dir", type=str, required=True, help="Output directory."
)
parser.add_argument(
"--domain",
type=str,
default='indoor',
help="scene domain used to condition the prediction, e.g. indoor or outdoor",
)
# inference setting
parser.add_argument(
"--denoise_steps",
type=int,
default=10,
help="Diffusion denoising steps, more steps results in higher accuracy but slower inference speed.",
)
parser.add_argument(
"--ensemble_size",
type=int,
default=10,
help="Number of predictions to be ensembled, more inference gives better results but runs slower.",
)
parser.add_argument(
"--half_precision",
action="store_true",
help="Run with half-precision (16-bit float), might lead to suboptimal result.",
)
# resolution setting
parser.add_argument(
"--processing_res",
type=int,
default=768,
help="Maximum resolution of processing. 0 for using input image resolution. Default: 768.",
)
parser.add_argument(
"--output_processing_res",
action="store_true",
help="When the input is resized, output depth at the resized operating resolution. Default: False.",
)
# depth map colormap
parser.add_argument(
"--color_map",
type=str,
default="Spectral",
help="Colormap used to render depth predictions.",
)
# other settings
parser.add_argument("--seed", type=int, default=None, help="Random seed.")
parser.add_argument(
"--batch_size",
type=int,
default=0,
help="Inference batch size. Default: 0 (will be set automatically).",
)
args = parser.parse_args()
sys.path.append(os.path.expanduser(args.code_path))  # expand "~" in the default path
from models.geowizard_pipeline import DepthNormalEstimationPipeline
from utils.seed_all import seed_all
from utils.depth2normal import *
checkpoint_path = args.pretrained_model_path
output_dir = args.output_dir
denoise_steps = args.denoise_steps
ensemble_size = args.ensemble_size
if ensemble_size>15:
logging.warning("Large ensemble size: inference will be slow.")
half_precision = args.half_precision
processing_res = args.processing_res
match_input_res = not args.output_processing_res
domain = args.domain
color_map = args.color_map
seed = args.seed
batch_size = args.batch_size
if batch_size==0:
batch_size = 1 # set default batchsize
# -------------------- Preparation --------------------
# Random seed
if seed is None:
import time
seed = int(time.time())
seed_all(seed)
# Output directories
output_dir_color = os.path.join(output_dir, f"depth_colored_{domain}")
# output_dir_npy = os.path.join(output_dir, "depth_npy")
# output_dir_normal_npy = os.path.join(output_dir, "normal_npy")
output_dir_npy = os.path.join(output_dir, f"depth_npz_{domain}")
output_dir_normal_npy = os.path.join(output_dir, f"normal_npz_{domain}")
output_dir_normal_color = os.path.join(output_dir, f"normal_colored_{domain}")
os.makedirs(output_dir, exist_ok=True)
os.makedirs(output_dir_color, exist_ok=True)
os.makedirs(output_dir_npy, exist_ok=True)
os.makedirs(output_dir_normal_npy, exist_ok=True)
os.makedirs(output_dir_normal_color, exist_ok=True)
logging.info(f"output dir = {output_dir}")
# -------------------- Device --------------------
if torch.cuda.is_available():
device = torch.device("cuda")
else:
device = torch.device("cpu")
logging.warning("CUDA is not available. Running on CPU will be slow.")
logging.info(f"device = {device}")
# -------------------- Data --------------------
input_dir = args.input_dir
test_files = sorted(os.listdir(input_dir))
n_images = len(test_files)
if n_images > 0:
logging.info(f"Found {n_images} images")
else:
logging.error(f"No images found in {input_dir}")
sys.exit(1)
# -------------------- Model --------------------
if half_precision:
dtype = torch.float16
logging.info(f"Running with half precision ({dtype}).")
else:
dtype = torch.float32
# declare a pipeline
pipe = DepthNormalEstimationPipeline.from_pretrained(checkpoint_path, torch_dtype=dtype)
logging.info("loading pipeline whole successfully.")
try:
pipe.enable_xformers_memory_efficient_attention()
except Exception:
pass # run without xformers
pipe = pipe.to(device)
# -------------------- Inference and saving --------------------
with torch.no_grad():
os.makedirs(output_dir, exist_ok=True)
for test_file in tqdm(test_files, desc="Estimating Depth & Normal", leave=True):
rgb_path = os.path.join(input_dir, test_file)
rgb_name_base = os.path.splitext(os.path.basename(rgb_path))[0]
pred_name_base = rgb_name_base # + "_pred"
normal_npz_save_path = os.path.join(output_dir_normal_npy, f"{pred_name_base}.npz")
if os.path.exists(normal_npz_save_path):
continue
# logging.warning(f"Existing file: '{normal_npz_save_path}' will be overwritten")
# Read input image
input_image = Image.open(rgb_path)
# predict the depth here
pipe_out = pipe(input_image,
denoising_steps = denoise_steps,
ensemble_size= ensemble_size,
processing_res = processing_res,
match_input_res = match_input_res,
domain = domain,
color_map = color_map,
show_progress_bar = False,
)
depth_pred: np.ndarray = pipe_out.depth_np
depth_colored: Image.Image = pipe_out.depth_colored
normal_pred: np.ndarray = pipe_out.normal_np
normal_colored: Image.Image = pipe_out.normal_colored
# Save depth as compressed .npz
# npy_save_path = os.path.join(output_dir_npy, f"{pred_name_base}.npy")
npy_save_path = os.path.join(output_dir_npy, f"{pred_name_base}.npz")
if os.path.exists(npy_save_path):
logging.warning(f"Existing file: '{npy_save_path}' will be overwritten")
# np.save(npy_save_path, depth_pred)
np.savez_compressed(npy_save_path, depth_pred)
# normal_npy_save_path = os.path.join(output_dir_normal_npy, f"{pred_name_base}.npy")
normal_npz_save_path = os.path.join(output_dir_normal_npy, f"{pred_name_base}.npz")
if os.path.exists(normal_npz_save_path):
logging.warning(f"Existing file: '{normal_npz_save_path}' will be overwritten")
# np.save(normal_npy_save_path, normal_pred)
np.savez_compressed(normal_npz_save_path, normal_pred)
# Colorize
# depth_colored_save_path = os.path.join(output_dir_color, f"{pred_name_base}_colored.png")
depth_colored_save_path = os.path.join(output_dir_color, f"{pred_name_base}.png")
if os.path.exists(depth_colored_save_path):
logging.warning(
f"Existing file: '{depth_colored_save_path}' will be overwritten"
)
depth_colored.save(depth_colored_save_path)
normal_colored_save_path = os.path.join(output_dir_normal_color, f"{pred_name_base}_colored.png")
if os.path.exists(normal_colored_save_path):
logging.warning(
f"Existing file: '{normal_colored_save_path}' will be overwritten"
)
normal_colored.save(normal_colored_save_path)
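Each input image produces four files whose folder names carry the `--domain` suffix, and the loop skips any image whose normal `.npz` already exists, so an interrupted run resumes where it stopped. The sketch below mirrors that naming and the resume check; `output_paths` and `pending_images` are hypothetical helpers, and the temporary directory only stands in for a real output folder.

```python
import os
import tempfile
from typing import Dict, List


def output_paths(output_dir: str, domain: str, rgb_path: str) -> Dict[str, str]:
    """Per-image output files, following the naming used in extract_normal_geo.py."""
    base = os.path.splitext(os.path.basename(rgb_path))[0]
    return {
        "depth_npz": os.path.join(output_dir, f"depth_npz_{domain}", f"{base}.npz"),
        "normal_npz": os.path.join(output_dir, f"normal_npz_{domain}", f"{base}.npz"),
        "depth_png": os.path.join(output_dir, f"depth_colored_{domain}", f"{base}.png"),
        "normal_png": os.path.join(output_dir, f"normal_colored_{domain}", f"{base}_colored.png"),
    }


def pending_images(output_dir: str, domain: str, names: List[str]) -> List[str]:
    # An image counts as done once its normal .npz exists, so a rerun resumes.
    return [
        n for n in sorted(names)
        if not os.path.exists(output_paths(output_dir, domain, n)["normal_npz"])
    ]


with tempfile.TemporaryDirectory() as out:
    os.makedirs(os.path.join(out, "normal_npz_indoor"))
    open(output_paths(out, "indoor", "a.jpg")["normal_npz"], "wb").close()  # pretend a.jpg is done
    remaining = pending_images(out, "indoor", ["b.jpg", "a.jpg"])
print(remaining)  # ['b.jpg']
```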
================================================
FILE: process_data/visualize_colmap.ipynb
================================================
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"id": "8b8d7b17-af50-42cd-b531-ef61c49c9e61",
"metadata": {},
"outputs": [],
"source": [
"# Set the work directory to the imaginaire root.\n",
"import os, sys, time\n",
"import pathlib\n",
"\n",
"root_dir = pathlib.Path().absolute().parents[0]\n",
"os.chdir(root_dir)\n",
"print(f\"Root Directory Path: {root_dir}\")"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "2b5b9e2f-841c-4815-92e0-0c76ed46da62",
"metadata": {},
"outputs": [],
"source": [
"# Import Python libraries.\n",
"import numpy as np\n",
"import torch\n",
"import k3d\n",
"import json\n",
"import trimesh\n",
"import plotly.graph_objs as go\n",
"from collections import OrderedDict\n",
"# Import imaginaire modules.\n",
"from submodules.colmap.scripts.python.read_write_model import read_model\n",
"# from tools import camera, visualize\n",
"from tools.camera import quaternion\n",
"from tools.visualize import k3d_visualize_pose, plotly_visualize_pose\n",
"from process_data.convert_tnt_to_json import load_transformation, align_gt_with_cam\n",
"from tools.camera_utils import cubic_camera, grid_camera, around_camera, up_camera, bb_camera\n",
"from tools.math_utils import inv_normalize_pts"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "76033016-2d92-4a5d-9e50-3978553e8df4",
"metadata": {},
"outputs": [],
"source": [
"# Read the COLMAP data.\n",
"# colmap_path = \"datasets/lego_ds2\"\n",
"scene = 'Barn'\n",
"colmap_path = f\"/your/path/tnt/{scene}\"\n",
"# read point clouds from lidar\n",
"pcd = trimesh.load(os.path.join(colmap_path, '{}.ply'.format(colmap_path.split('/')[-1])))\n",
"# scene = 'c49a8c6cff'\n",
"# colmap_path = f\"/your/path/ScanNet++/{scene}/dslr\"\n",
"# pcd = trimesh.load(os.path.join(colmap_path, '../scans/mesh_aligned_0.05.ply'))\n",
"view_sample_camera = False\n",
"cameras, images, points_3D = read_model(path=f\"{colmap_path}/sparse\", ext=\".bin\") # w2c extrinsics\n",
"# Convert camera poses.\n",
"images = OrderedDict(sorted(images.items()))\n",
"qvecs = torch.from_numpy(np.stack([image.qvec for image in images.values()]))\n",
"tvecs = torch.from_numpy(np.stack([image.tvec for image in images.values()]))\n",
"# Rs = camera.quaternion.q_to_R(qvecs)\n",
"Rs = quaternion.q_to_R(qvecs)\n",
"poses = torch.cat([Rs, tvecs[..., None]], dim=-1) # [N,3,4] w2c\n",
"print(f\"# images: {len(poses)}\")\n",
"print(\"camera height: {}\".format(poses[:, 1, 3].mean()))\n",
"\n",
"# # Get the sparse 3D points and the colors. colmap\n",
"# xyzs = torch.from_numpy(np.stack([point.xyz for point in points_3D.values()]))\n",
"# rgbs = np.stack([point.rgb for point in points_3D.values()])\n",
"# rgbs_int32 = (rgbs[:, 0] * 2**16 + rgbs[:, 1] * 2**8 + rgbs[:, 2]).astype(np.uint32)\n",
"# print(f\"# points: {len(xyzs)}\")\n",
"\n",
"\n",
"if os.path.exists(os.path.join(colmap_path, f'{scene}_trans.txt')):\n",
" trans = load_transformation(os.path.join(colmap_path, f'{scene}_trans.txt'))\n",
" pcd.vertices = align_gt_with_cam(pcd.vertices, trans)\n",
" \n",
"xyzs = pcd.vertices[::500]\n",
"# xyzs = pcd.vertices\n",
"rgbs = np.random.randint(0, 255, xyzs.shape)\n",
"rgbs_int32 = (rgbs[:, 0] * 2**16 + rgbs[:, 1] * 2**8 + rgbs[:, 2]).astype(np.uint32)\n",
"print(f\"# points: {len(xyzs)}\")"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "47862ee1-286c-4877-a181-4b33b7733719",
"metadata": {},
"outputs": [],
"source": [
"vis_depth = 0.2"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "b6cf60ec-fe6a-43ba-9aaf-e3c7afd88208",
"metadata": {},
"outputs": [],
"source": [
"# Visualize the bounding sphere.\n",
"json_fname = f\"{colmap_path}/meta.json\"\n",
"with open(json_fname) as file:\n",
" meta = json.load(file)\n",
"trans = np.array(meta[\"trans\"])\n",
"scale = np.array(meta[\"scale\"])\n",
"# ------------------------------------------------------------------------------------\n",
"# These variables can be adjusted to make the bounding sphere fit the region of interest.\n",
"# The adjusted values can then be set in the config as data.readjust.center and data.readjust.scale\n",
"readjust_center = np.array([0., 0., 0.])\n",
"readjust_scale = np.array([1., 1., 1.]) # * 1.1\n",
"# save adjusted values\n",
"readjust = {\n",
" 'scale': readjust_scale.tolist(),\n",
" 'trans': readjust_center.tolist()\n",
"}\n",
"readjust_fname = f'{colmap_path}/readjust.json'\n",
"with open(readjust_fname, \"w\") as outputfile:\n",
" json.dump(readjust, outputfile, indent=2)\n",
"# ------------------------------------------------------------------------------------\n",
"if trans.ndim == 1:\n",
" trans += readjust_center\n",
"scale *= readjust_scale\n",
"# Make some points to hallucinate a bounding sphere.\n",
"# sphere_points = np.random.randn(100000, 3)\n",
"sphere_points = np.random.rand(100000, 3) * 2 - 1\n",
"# sphere_points = sphere_points / np.linalg.norm(sphere_points, axis=-1, keepdims=True) # Unit sphere\n",
"# sphere_points[:, 0] = -1 # up\n",
"for i in range(3): sphere_points[i::3, i] = sphere_points[i::3, i] / np.abs(sphere_points[i::3, i]) # Unit cube\n",
"sphere_points = np.concatenate([sphere_points, np.zeros([1, 3])], axis=0) # center point\n",
"# sphere_points[-1, 0] = 5\n",
"\n",
"sphere_points = inv_normalize_pts(sphere_points, trans, scale)\n",
"\n",
"# sphere_points[:, 1] = -1.1\n",
"\n",
"# sample up cameras\n",
"if view_sample_camera:\n",
" height = poses[:, 1, 3].mean()\n",
" # height = -1\n",
" # sample_poses = cubic_camera(200, trans, scale)\n",
" # sample_poses = around_camera(500, trans, scale, height)\n",
" # sample_poses = bb_camera(500, trans, scale, height, up=False, around=True)\n",
" sample_poses = bb_camera(200, trans, scale, height=height, up=True, around=True, bidirect=True) # , look_mode='direction'\n",
" # sample_poses = up_camera(500, trans, scale)\n",
" # sample_poses = grid_camera(trans, scale)\n",
"\n",
" # sample_poses = torch.from_numpy(poses[:, :3])\n",
" sample_poses = sample_poses[:, :3]\n",
"\n",
" # poses = torch.cat([poses, sample_poses], dim=0)\n",
" poses = sample_poses # [::6]\n",
" # print(f\"# poses: {len(poses)}\")\n",
"\n",
" # print(f\"center: {trans[:3, 3:].T}\")\n",
" # print(f\"scale: {scale}\")\n",
" # print(\"up: {}\".format(trans[1, 3] - scale[1] * 0.5))\n",
" # print(f\"max: {sphere_points.max(0)}\")\n",
" # print(f\"min: {sphere_points.min(0)}\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e986aed0-1aaf-4772-937c-136db7f2eaec",
"metadata": {},
"outputs": [],
"source": [
"# You can choose to visualize with Plotly...\n",
"x, y, z = *xyzs.T,\n",
"colors = rgbs / 255.0\n",
"sphere_x, sphere_y, sphere_z = *sphere_points.T,\n",
"sphere_colors = [\"#4488ff\"] * len(sphere_points)\n",
"sphere_size = [0.5] * len(sphere_points)\n",
"sphere_colors[-1] = \"#ff0000\" # #ff4444 center point\n",
"# sphere_size[-1] = 5\n",
"# traces_poses = visualize.plotly_visualize_pose(poses, vis_depth=vis_depth, xyz_length=0.02, center_size=0.01, xyz_width=0.005, mesh_opacity=0.05)\n",
"traces_poses = plotly_visualize_pose(poses, vis_depth=vis_depth, xyz_length=0.02, center_size=0.01, xyz_width=0.005, mesh_opacity=0.05)\n",
"trace_points = go.Scatter3d(x=x, y=y, z=z, mode=\"markers\", marker=dict(size=0.4, color=colors, opacity=0.7), hoverinfo=\"skip\")\n",
"trace_sphere = go.Scatter3d(x=sphere_x, y=sphere_y, z=sphere_z, mode=\"markers\", marker=dict(size=sphere_size, color=sphere_colors, opacity=0.7), hoverinfo=\"skip\")\n",
"traces_all = traces_poses + [trace_points, trace_sphere]\n",
"layout = go.Layout(scene=dict(xaxis=dict(showspikes=False, backgroundcolor=\"rgba(0,0,0,0)\", gridcolor=\"rgba(0,0,0,0.1)\"),\n",
" yaxis=dict(showspikes=False, backgroundcolor=\"rgba(0,0,0,0)\", gridcolor=\"rgba(0,0,0,0.1)\"),\n",
" zaxis=dict(showspikes=False, backgroundcolor=\"rgba(0,0,0,0)\", gridcolor=\"rgba(0,0,0,0.1)\"),\n",
" xaxis_title=\"X\", yaxis_title=\"Y\", zaxis_title=\"Z\", dragmode=\"orbit\",\n",
" aspectratio=dict(x=1, y=1, z=1), aspectmode=\"data\"), height=800)\n",
"fig = go.Figure(data=traces_all, layout=layout)\n",
"fig.show()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "fdde170b-4546-4617-9162-a9fcb936347d",
"metadata": {},
"outputs": [],
"source": [
"# ... or visualize with K3D.\n",
"plot = k3d.plot(name=\"poses\", height=800, camera_rotate_speed=5.0, camera_zoom_speed=3.0, camera_pan_speed=1.0)\n",
"# k3d_objects = visualize.k3d_visualize_pose(poses, vis_depth=vis_depth, xyz_length=0.02, center_size=0.01, xyz_width=0.005, mesh_opacity=0.05)\n",
"k3d_objects = k3d_visualize_pose(poses, vis_depth=vis_depth, xyz_length=0.02, center_size=0.01, xyz_width=0.005, mesh_opacity=0.05)\n",
"for k3d_object in k3d_objects:\n",
" plot += k3d_object\n",
"plot += k3d.points(xyzs, colors=rgbs_int32, point_size=0.02, shader=\"flat\")\n",
"plot += k3d.points(sphere_points, color=0x4488ff, point_size=0.01, shader=\"flat\")\n",
"plot.display()\n",
"plot.camera_fov = 30.0"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.13"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
================================================
FILE: process_data/visualize_transforms.ipynb
================================================
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"id": "8b8d7b17-af50-42cd-b531-ef61c49c9e61",
"metadata": {},
"outputs": [],
"source": [
"# Set the work directory to the imaginaire root.\n",
"import os, sys, time\n",
"import pathlib\n",
"root_dir = pathlib.Path().absolute().parents[2]\n",
"os.chdir(root_dir)\n",
"print(f\"Root Directory Path: {root_dir}\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2b5b9e2f-841c-4815-92e0-0c76ed46da62",
"metadata": {},
"outputs": [],
"source": [
"# Import Python libraries.\n",
"import numpy as np\n",
"import torch\n",
"import k3d\n",
"import json\n",
"from collections import OrderedDict\n",
"# Import imaginaire modules.\n",
"from projects.nerf.utils import camera, visualize\n",
"from third_party.colmap.scripts.python.read_write_model import read_model"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "97bedecf-da68-44b1-96cf-580ef7e7f3f0",
"metadata": {},
"outputs": [],
"source": [
"# Read the camera poses and bounding sphere from transforms.json.\n",
"colmap_path = \"datasets/lego_ds2\"\n",
"json_fname = f\"{colmap_path}/transforms.json\"\n",
"with open(json_fname) as file:\n",
" meta = json.load(file)\n",
"center = meta[\"sphere_center\"]\n",
"radius = meta[\"sphere_radius\"]\n",
"# Convert camera poses: flip the y/z camera axes (OpenGL -> OpenCV convention) and invert to world-to-camera.\n",
"poses = []\n",
"for frame in meta[\"frames\"]:\n",
" c2w = torch.tensor(frame[\"transform_matrix\"])\n",
" c2w[:, 1:3] *= -1\n",
" w2c = c2w.inverse()\n",
" pose = w2c[:3] # [3,4]\n",
" poses.append(pose)\n",
"poses = torch.stack(poses, dim=0)\n",
"print(f\"# images: {len(poses)}\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2016d20c-1e58-407f-9810-cbe76dc5ccec",
"metadata": {},
"outputs": [],
"source": [
"vis_depth = 0.2\n",
"k3d_textures = []"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d7168a09-6654-4660-b140-66b9dfd6f1e8",
"metadata": {},
"outputs": [],
"source": [
"# (optional) visualize the images.\n",
"# This block can be skipped if we don't want to visualize the image observations.\n",
"for i, frame in enumerate(meta[\"frames\"]):\n",
" image_fname = frame[\"file_path\"]\n",
" image_path = f\"{colmap_path}/{image_fname}\"\n",
" with open(image_path, \"rb\") as file:\n",
" binary = file.read()\n",
" # Compute the corresponding image corners in 3D.\n",
" pose = poses[i]\n",
" corners = torch.tensor([[-0.5, 0.5, 1], [0.5, 0.5, 1], [-0.5, -0.5, 1]])\n",
" corners *= vis_depth\n",
" corners = camera.cam2world(corners, pose)\n",
" puv = [corners[0].tolist(), (corners[1]-corners[0]).tolist(), (corners[2]-corners[0]).tolist()]\n",
" k3d_texture = k3d.texture(binary, file_format=\"jpg\", puv=puv)\n",
" k3d_textures.append(k3d_texture)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b6cf60ec-fe6a-43ba-9aaf-e3c7afd88208",
"metadata": {},
"outputs": [],
"source": [
"# Visualize the bounding sphere.\n",
"json_fname = f\"{colmap_path}/transforms.json\"\n",
"with open(json_fname) as file:\n",
" meta = json.load(file)\n",
"center = np.array(meta[\"sphere_center\"])\n",
"radius = meta[\"sphere_radius\"]\n",
"# ------------------------------------------------------------------------------------\n",
"# These variables can be adjusted to make the bounding sphere fit the region of interest.\n",
"# The adjusted values can then be set in the config as data.readjust.center and data.readjust.scale\n",
"readjust_center = np.array([0., 0., 0.])\n",
"readjust_scale = 1.\n",
"# ------------------------------------------------------------------------------------\n",
"center += readjust_center\n",
"radius *= readjust_scale\n",
"# Sample random points on the bounding sphere so it can be drawn.\n",
"sphere_points = np.random.randn(100000, 3)\n",
"sphere_points = sphere_points / np.linalg.norm(sphere_points, axis=-1, keepdims=True)\n",
"sphere_points = sphere_points * radius + center"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "fdde170b-4546-4617-9162-a9fcb936347d",
"metadata": {},
"outputs": [],
"source": [
"# Visualize with K3D.\n",
"plot = k3d.plot(name=\"poses\", height=800, camera_rotate_speed=5.0, camera_zoom_speed=3.0, camera_pan_speed=1.0)\n",
"k3d_objects = visualize.k3d_visualize_pose(poses, vis_depth=vis_depth, xyz_length=0.02, center_size=0.01, xyz_width=0.005, mesh_opacity=0.)\n",
"for k3d_object in k3d_objects:\n",
" plot += k3d_object\n",
"for k3d_texture in k3d_textures:\n",
" plot += k3d_texture\n",
"plot += k3d.points(sphere_points, color=0x4488ff, point_size=0.01, shader=\"flat\")\n",
"plot.display()\n",
"plot.camera_fov = 30.0"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.13"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
================================================
FILE: pyproject.toml
================================================
[tool.black]
line-length = 240
[build-system]
requires = ["setuptools>=61.0"]
build-backend = "setuptools.build_meta"
[project]
name = "vcr-gaus"
version = "0.0.0.dev0"
description = "VCR-GauS: View Consistent Depth-Normal Regularizer for Gaussian Surface Reconstruction"
readme = "README.md"
requires-python = ">=3.8"
classifiers = [
"Programming Language :: Python :: 3",
"License :: OSI Approved :: Apache Software License",
]
[project.optional-dependencies]
f1eval = [
"open3d==0.10.0",
"numpy"
]
train = [
"torch==2.0.1",
"torchvision==0.15.2",
"torchaudio==2.0.2",
"numpy==1.26.1",
"open3d",
"plyfile",
"ninja",
"GPUtil",
"opencv-python",
"lpips",
"trimesh",
"pymeshlab",
"termcolor",
"wandb",
"imageio",
"scikit-image",
"torchmetrics",
"mediapy",
]
[project.urls]
"Homepage" = "https://hlinchen.github.io/projects/VCR-GauS/"
"Bug Tracker" = "https://github.com/HLinChen/VCR-GauS/issues"
[tool.setuptools.packages.find]
include = ["vcr*", "trl*"]
exclude = [
"assets*",
"benchmark*",
"docs",
"dist*",
"playground*",
"scripts*",
"tests*",
"checkpoints*",
"project_checkpoints*",
"debug_checkpoints*",
"mlx_configs*",
"wandb*",
"notebooks*",
]
[tool.wheel]
exclude = [
"assets*",
"benchmark*",
"docs",
"dist*",
"playground*",
"scripts*",
"tests*",
"checkpoints*",
"project_checkpoints*",
"debug_checkpoints*",
"mlx_configs*",
"wandb*",
"notebooks*",
]
================================================
FILE: python_scripts/run_base.py
================================================
import os
import time
import GPUtil
# Run a single job on the given GPU and return when it is done.
def worker(gpu, scene, factor, fn):
print(f"Starting job on GPU {gpu} with scene {scene}\n")
fn(gpu, scene, factor)
print(f"Finished job on GPU {gpu} with scene {scene}\n")
# Launch queued jobs on idle GPUs as they become available, and return once every job has finished.
def dispatch_jobs(jobs, executor, excluded_gpus, fn):
future_to_job = {}
reserved_gpus = set() # GPUs assigned to a submitted job that has not finished yet
while jobs or future_to_job:
# Get the list of available GPUs, not including those that are reserved.
all_available_gpus = set(GPUtil.getAvailable(order="first", limit=10, maxMemory=0.1, maxLoad=0.1))
available_gpus = list(all_available_gpus - reserved_gpus - excluded_gpus)
# Launch new jobs on available GPUs
while available_gpus and jobs:
gpu = available_gpus.pop(0)
job = jobs.pop(0)
future = executor.submit(worker, gpu, *job, fn) # Unpacking job as arguments to worker
future_to_job[future] = (gpu, job)
reserved_gpus.add(gpu) # Reserve this GPU until the job finishes
# Check for completed jobs and remove them from the list of running jobs.
# Also, release the GPUs they were using.
done_futures = [future for future in future_to_job if future.done()]
for future in done_futures:
gpu, job = future_to_job.pop(future) # Forget the completed job and recover the GPU it ran on
reserved_gpus.discard(gpu) # Release this GPU
print(f"Job {job} has finished, releasing GPU {gpu}")
# Sleep briefly so this loop does not spin while no GPU is free.
time.sleep(5)
print("All jobs have been processed.")
def check_finish(scene, path, type='mesh'):
if not os.path.exists(path):
print(f"Scene \033[1;31m{scene}\033[0m failed in \033[1;31m{type}\033[0m")
return False
return True
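# Note on `os.system('ulimit -n 9000')` in the run_*.py scripts: `ulimit` is a shell builtin, so it
# only changes the limit of the short-lived shell that os.system() spawns. The Python process, and the
# shells started by later os.system() calls, keep the old open-file limit. Below is a minimal sketch of
# raising the soft limit in-process with the standard `resource` module (Unix only). The helper name
# raise_nofile_limit is hypothetical and is not used anywhere in this repository.
import resource


def raise_nofile_limit(target=9000):
    """Raise the soft RLIMIT_NOFILE to `target` (capped at the hard limit) and return the soft limit."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY or soft >= target:
        return soft  # already high enough; never lower an existing limit
    new_soft = target if hard == resource.RLIM_INFINITY else min(target, hard)
    resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    return new_soft

# Usage (an assumption about how it could be wired in): call raise_nofile_limit() once before
# dispatch_jobs(); resource limits are inherited by the child processes that os.system() starts.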
train_cmd = "OMP_NUM_THREADS=4 CUDA_VISIBLE_DEVICES={gpu} \
python train.py \
--config=configs/{dataset}/{cfg}.yaml \
--logdir={log_dir} \
--model.source_path={data_dir}/{scene}/ \
--train.debug_from={debug_from} \
--model.data_device={data_device} \
--model.resolution={resolution} \
--wandb \
--wandb_name {project}"
train_cmd_new = "OMP_NUM_THREADS=4 CUDA_VISIBLE_DEVICES={gpu} \
python train.py \
--config={cfg} \
--logdir={log_dir} \
--model.source_path={data_dir}/{scene}/ \
--train.debug_from={debug_from} \
--model.data_device={data_device} \
--model.resolution={resolution} \
--wandb \
--wandb_name {project}"
extract_mesh_cmd = "OMP_NUM_THREADS=4 CUDA_VISIBLE_DEVICES={gpu} \
python tools/depth2mesh.py \
--mesh_name {ply} \
--split {step} \
--method {fuse_method} \
--voxel_size {voxel_size} \
--num_cluster {num_cluster} \
--max_depth {max_depth} \
--clean \
--prob_thres {prob_thr} \
--cfg_path {log_dir}/config.yaml"
eval_tnt_cmd = "OMP_NUM_THREADS={num_threads} CUDA_VISIBLE_DEVICES={gpu} \
conda run -n {eval_env} \
python evaluation/tnt_eval/run.py \
--dataset-dir {data_dir}/{scene}/ \
--traj-path {data_dir}/{scene}/{scene}_COLMAP_SfM.log \
--ply-path {log_dir}/{ply} > {log_dir}/fscore.txt"
eval_cd_cmd = "OMP_NUM_THREADS={num_threads} CUDA_VISIBLE_DEVICES={gpu} \
python evaluation/eval_dtu/evaluate_single_scene.py \
--input_mesh {tri_mesh_path} \
--scan_id {scan_id} --output_dir {output_dir} \
--mask_dir {data_dir} \
--DTU {data_dir}"
render_cmd = "CUDA_VISIBLE_DEVICES={gpu} \
python evaluation/render.py \
--cfg_path {log_dir}/config.yaml \
--iteration 30000 \
--skip_train"
eval_psnr_cmd = "CUDA_VISIBLE_DEVICES={gpu} \
python evaluation/metrics.py \
--cfg_path {log_dir}/config.yaml"
eval_replica_cmd = "OMP_NUM_THREADS={num_threads} CUDA_VISIBLE_DEVICES={gpu} \
python evaluation/replica_eval/evaluate_single_scene.py \
--input_mesh {tri_mesh_path} \
--scene {scene} \
--output_dir {output_dir} \
--data_dir {data_dir}"
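# The command templates above are plain str.format() strings, so each run_*.py script must pass every
# placeholder a template uses (str.format raises KeyError otherwise). A small sketch for listing the
# fields a template expects, built on the standard string.Formatter().parse(); the helper name
# template_fields is hypothetical. For example, template_fields(train_cmd) includes 'dataset', while
# template_fields(train_cmd_new) does not, because the latter takes a full config path in {cfg}.
import string


def template_fields(template):
    """Return the placeholder names in a str.format() template, in order of first appearance."""
    names = []
    for _, field_name, _, _ in string.Formatter().parse(template):
        if field_name and field_name not in names:
            names.append(field_name)
    return names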
================================================
FILE: python_scripts/run_dtu.py
================================================
# Training script for the DTU dataset
import os
import sys
import time
from concurrent.futures import ThreadPoolExecutor
sys.path.append(os.getcwd())
from python_scripts.run_base import dispatch_jobs, train_cmd, extract_mesh_cmd, eval_cd_cmd, check_finish
from python_scripts.show_dtu import show_matrix
TRIAL_NAME = 'vcr_gaus'
PROJECT = 'vcr_gaus'
PROJECT_wandb = 'vcr_gaus_dtu'
DATASET = 'dtu'
base_dir = "/your/path"
output_dir = f"{base_dir}/output/{PROJECT}/{DATASET}"
data_dir = f"{base_dir}/data/DTU_mask"
do_train = False
do_extract_mesh = False
do_cd = True
dry_run = False
node = 0
max_workers = 15
be = node*max_workers
excluded_gpus = set([])
total_list = [
'scan24', 'scan37', 'scan40', 'scan55', 'scan63', 'scan65', 'scan69',
'scan83', 'scan97', 'scan105', 'scan106', 'scan110', 'scan114', 'scan118', 'scan122'
]
training_list = [
'scan24', 'scan37', 'scan40', 'scan55', 'scan63', 'scan65', 'scan69',
'scan83', 'scan97', 'scan105', 'scan106', 'scan110', 'scan114', 'scan118', 'scan122'
]
training_list = training_list[be: be + max_workers]
scenes = training_list
factors = [-1] * len(scenes)
debug_from = -1
eval_env = 'pt'
data_device = 'cuda'
voxel_size = 0.004
step = 1
PLY = f"ours.ply"
TOTAL_THREADS = 64
NUM_THREADS = TOTAL_THREADS // max_workers
prob_thr = 0.15
num_cluster = 1
max_depth = 3
fuse_method = 'tsdf_cpu'
jobs = list(zip(scenes, factors))
def train_scene(gpu, scene, factor):
time.sleep(2*gpu)
os.system('ulimit -n 9000')
log_dir = f"{output_dir}/{scene}/{TRIAL_NAME}"
fail = 0
if not dry_run:
if do_train:
cmd = train_cmd.format(gpu=gpu, dataset=DATASET, cfg='base',
scene=scene, log_dir=log_dir,
data_dir=data_dir, debug_from=debug_from,
data_device=data_device, resolution=factor, project=PROJECT_wandb)
print(cmd)
fail = os.system(cmd)
if fail == 0:
if not dry_run:
# fusion
if do_extract_mesh:
if not check_finish(scene, f"{log_dir}/point_cloud", 'train'): return False
cmd = extract_mesh_cmd.format(gpu=gpu, ply=PLY, step=step, fuse_method=fuse_method, voxel_size=voxel_size, num_cluster=num_cluster, max_depth=max_depth, log_dir=log_dir, prob_thr=prob_thr)
fail = os.system(cmd)
print(cmd)
# evaluation
# evaluate the mesh
scan_id = scene[4:]
cmd = eval_cd_cmd.format(num_threads=NUM_THREADS, gpu=gpu, tri_mesh_path=f'{log_dir}/{PLY}', scan_id=scan_id, output_dir=log_dir, data_dir=data_dir)
if fail == 0:
if not dry_run:
if do_cd:
if not check_finish(scene, f"{log_dir}/{PLY}", 'mesh'): return False
print(cmd)
fail = os.system(cmd)
if not check_finish(scene, f"{log_dir}/results.json", 'cd'): return False
return fail == 0
# Using ThreadPoolExecutor to manage the thread pool
with ThreadPoolExecutor(max_workers) as executor:
dispatch_jobs(jobs, executor, excluded_gpus, train_scene)
show_matrix(total_list, [output_dir], TRIAL_NAME)
print(TRIAL_NAME, " done")
================================================
FILE: python_scripts/run_mipnerf360.py
================================================
# Training script for the Mip-NeRF 360 dataset
import os
import sys
import time
from concurrent.futures import ThreadPoolExecutor
sys.path.append(os.getcwd())
from python_scripts.run_base import dispatch_jobs, train_cmd, extract_mesh_cmd, check_finish, render_cmd, eval_psnr_cmd
from python_scripts.show_360 import show_matrix
TRIAL_NAME = 'vcr_gaus'
PROJECT = 'vcr_gaus'
PROJECT_wandb = 'vcr_gaus_360'
do_train = True
do_render = True
do_eval = True
do_extract_mesh = True
dry_run = False
node = 0
max_workers = 9
be = node*max_workers
excluded_gpus = set([])
total_list = [
"bicycle", "bonsai", "counter", "flowers", "garden", "stump", "treehill", "kitchen", "room"
]
training_list = [
"bicycle", "bonsai", "counter", "flowers", "garden", "stump", "treehill", "kitchen", "room"
]
training_list = training_list[be: be + max_workers]
scenes = training_list
factors = [-1] * len(scenes)
debug_from = -1
DATASET = '360_v2'
eval_env = 'pt'
data_device = 'cpu'
step = 1
max_depth = 6.0
voxel_size = 8e-3
PLY = f"fused_mesh_split{step}.ply"
TOTAL_THREADS = 64
NUM_THREADS = TOTAL_THREADS // max_workers
prob_thr = 0.15
num_cluster = 1000
fuse_method = 'tsdf'
base_dir = "/your/path"
output_dir = f"{base_dir}/output/{PROJECT}/{DATASET}"
data_dir = f"{base_dir}/data/{DATASET}"
jobs = list(zip(scenes, factors))
def train_scene(gpu, scene, factor):
time.sleep(2*gpu)
os.system('ulimit -n 9000')
log_dir = f"{output_dir}/{scene}/{TRIAL_NAME}"
fail = 0
if not dry_run:
if do_train:
cmd = train_cmd.format(gpu=gpu, dataset=DATASET, cfg='base',
scene=scene, log_dir=log_dir,
data_dir=data_dir, debug_from=debug_from,
data_device=data_device, resolution=factor, project=PROJECT_wandb)
print(cmd)
fail = os.system(cmd)
if fail == 0:
if not dry_run:
# render
cmd = render_cmd.format(gpu=gpu, log_dir=log_dir)
if fail == 0:
if not dry_run:
if do_render:
print(cmd)
fail = os.system(cmd)
if not check_finish(scene, f"{log_dir}/test/ours_30000/renders", 'render'): return False
# eval
cmd = eval_psnr_cmd.format(gpu=gpu, log_dir=log_dir)
if fail == 0:
if not dry_run:
if do_eval:
print(cmd)
fail = os.system(cmd)
if not check_finish(scene, f"{log_dir}/results.json", 'eval'): return False
# fusion
if do_extract_mesh:
if not check_finish(scene, f"{log_dir}/point_cloud", 'train'): return False
cmd = extract_mesh_cmd.format(gpu=gpu, ply=PLY, step=step, fuse_method=fuse_method, voxel_size=voxel_size, num_cluster=num_cluster, max_depth=max_depth, log_dir=log_dir, prob_thr=prob_thr)
fail = os.system(cmd)
print(cmd)
return fail == 0
# Using ThreadPoolExecutor to manage the thread pool
with ThreadPoolExecutor(max_workers) as executor:
dispatch_jobs(jobs, executor, excluded_gpus, train_scene)
show_matrix(total_list, [output_dir], TRIAL_NAME)
print(TRIAL_NAME, " done")
================================================
FILE: python_scripts/run_tnt.py
================================================
# Training script for the Tanks and Temples (TNT) dataset
import os
import sys
import time
from concurrent.futures import ThreadPoolExecutor
sys.path.append(os.getcwd())
from python_scripts.run_base import dispatch_jobs, train_cmd, extract_mesh_cmd, eval_tnt_cmd, check_finish
from python_scripts.show_tnt import show_matrix
TRIAL_NAME = 'vcr_gaus'
PROJECT = 'vcr_gaus'
DATASET = 'tnt'
base_dir = "/your/path"
output_dir = f"{base_dir}/output/{PROJECT}/{DATASET}"
data_dir = f"{base_dir}/data/{DATASET}"
do_train = True
do_extract_mesh = True
do_f1 = True
dry_run = False
node = 0
max_workers = 4
be = node*max_workers
excluded_gpus = set([])
total_list = [
'Barn', 'Caterpillar', 'Courthouse', 'Ignatius',
'Meetingroom', 'Truck'
]
training_list = [
'Barn', 'Caterpillar', 'Courthouse', 'Ignatius',
'Meetingroom', 'Truck'
]
training_list = training_list[be: be + max_workers]
scenes = training_list
factors = [1] * len(scenes)
debug_from = -1 # enable wandb
eval_env = 'f1eval'
data_device = 'cpu'
step = 3
voxel_size = [0.02, 0.015, 0.01] + [x / 1000.0 for x in range(2, 10, 1)][::-1]
voxel_size = sorted(voxel_size)
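# How the voxel sizes above resolve: range(2, 10, 1) yields 2..9, so the comprehension gives 0.009 down
# to 0.002, and sorted() orders the combined list from finest to coarsest:
# [0.002, 0.003, 0.004, 0.005, 0.006, 0.007, 0.008, 0.009, 0.01, 0.015, 0.02].
# train_scene() runs mesh extraction with each size in turn and stops at the first command that exits
# with status 0, so coarser voxels act as fallbacks (presumably for when a finer grid fails, e.g. by
# running out of memory; this is an assumption). A minimal sketch of that pattern, where the helper
# first_successful and its `run` callback (standing in for os.system) are hypothetical:
def first_successful(candidates, run):
    """Return the first candidate for which run(candidate) returns 0, or None if every call fails."""
    for value in candidates:
        if run(value) == 0:
            return value
    return None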
PLY = f"ours.ply"
TOTAL_THREADS = 128
NUM_THREADS = TOTAL_THREADS // max_workers
prob_thr = 0.3
num_cluster = 1000
fuse_method = 'tsdf'
max_depth = 8
jobs = list(zip(scenes, factors))
def train_scene(gpu, scene, factor):
time.sleep(2*gpu)
os.system('ulimit -n 9000')
log_dir = f"{output_dir}/{scene}/{TRIAL_NAME}"
fail = 0
if not dry_run:
if do_train:
cmd = train_cmd.format(gpu=gpu, dataset=DATASET, cfg=scene,
scene=scene, log_dir=log_dir,
data_dir=data_dir, debug_from=debug_from,
data_device=data_device, resolution=factor, project=PROJECT)
print(cmd)
fail = os.system(cmd)
if fail == 0:
if not dry_run:
# fusion
if do_extract_mesh:
if not check_finish(scene, f"{log_dir}/point_cloud", 'train'): return False
for vs in voxel_size:
cmd = extract_mesh_cmd.format(gpu=gpu, ply=PLY, step=step, fuse_method=fuse_method, voxel_size=vs, num_cluster=num_cluster, max_depth=max_depth, log_dir=log_dir, prob_thr=prob_thr)
fail = os.system(cmd)
if fail == 0: break
print(cmd)
# evaluation
# Evaluation runs in the conda env named by eval_env ('f1eval'); install the open3d version pinned in the f1eval extra of pyproject.toml there
# evaluate the mesh
cmd = eval_tnt_cmd.format(num_threads=NUM_THREADS, gpu=gpu, eval_env=eval_env, data_dir=data_dir, scene=scene, log_dir=log_dir, ply=PLY)
if fail == 0:
if not dry_run:
if do_f1:
if not check_finish(scene, f"{log_dir}/{PLY}", 'mesh'): return False
print(cmd)
fail = os.system(cmd)
if not check_finish(scene, f"{log_dir}/evaluation/evaluation.txt", 'f1'): return False
# return True
return fail == 0
# Using ThreadPoolExecutor to manage the thread pool
with ThreadPoolExecutor(max_workers) as executor:
dispatch_jobs(jobs, executor, excluded_gpus, train_scene)
show_matrix(total_list, [output_dir], TRIAL_NAME)
print(TRIAL_NAME, " done")
================================================
FILE: python_scripts/show_360.py
================================================
import json
import numpy as np
scenes = ['bicycle', 'flowers', 'garden', 'stump', 'treehill', 'room', 'counter', 'kitchen', 'bonsai']
output_dirs = ["exp_360/release"]
outdoor_scenes = ["bicycle", "flowers", "garden", "stump", "treehill"]
indoor_scenes = ["room", "counter", "kitchen", "bonsai"]
all_metrics = {"PSNR": [], "SSIM": [], "LPIPS": [], 'scene': []}
indoor_metrics = {"PSNR": [], "SSIM": [], "LPIPS": [], 'scene': []}
outdoor_metrics = {"PSNR": [], "SSIM": [], "LPIPS": [], 'scene': []}
TRIAL_NAME = 'vcr_gaus'
def show_matrix(scenes, output_dirs, TRIAL_NAME):
for scene in scenes:
for output in output_dirs:
json_file = f"{output}/{scene}/{TRIAL_NAME}/results.json"
data = json.load(open(json_file))
data = data['ours_30000']
for k in ["PSNR", "SSIM", "LPIPS"]:
all_metrics[k].append(data[k])
if scene in indoor_scenes:
indoor_metrics[k].append(data[k])
else:
outdoor_metrics[k].append(data[k])
all_metrics['scene'].append(scene)
if scene in indoor_scenes:
indoor_metrics['scene'].append(scene)
else:
outdoor_metrics['scene'].append(scene)
latex = []
for k in ["PSNR", "SSIM", "LPIPS"]:
numbers = np.asarray(all_metrics[k]).mean(axis=0).tolist()
numbers = [numbers]
if k == "PSNR":
numbers = [f"{x:.2f}" for x in numbers]
else:
numbers = [f"{x:.3f}" for x in numbers]
latex.extend([k+': ', numbers[-1]+' '])
indoor_latex = []
for k in ["PSNR", "SSIM", "LPIPS"]:
numbers = np.asarray(indoor_metrics[k]).mean(axis=0).tolist()
numbers = [numbers]
if k == "PSNR":
numbers = [f"{x:.2f}" for x in numbers]
else:
numbers = [f"{x:.3f}" for x in numbers]
indoor_latex.extend([k+': ', numbers[-1]+' '])
outdoor_latex = []
for k in ["PSNR", "SSIM", "LPIPS"]:
numbers = np.asarray(outdoor_metrics[k]).mean(axis=0).tolist()
numbers = [numbers]
if k == "PSNR":
numbers = [f"{x:.2f}" for x in numbers]
else:
numbers = [f"{x:.3f}" for x in numbers]
outdoor_latex.extend([k+': ', numbers[-1]+' '])
print('Outdoor scenes')
for i in range(len(outdoor_metrics['scene'])):
print('PSNR: {:.3f}, SSIM: {:.3f}, LPIPS: {:.3f}, scene: {}'.format(outdoor_metrics['PSNR'][i], outdoor_metrics['SSIM'][i], outdoor_metrics['LPIPS'][i], outdoor_metrics['scene'][i]))
print('Indoor scenes')
for i in range(len(indoor_metrics['scene'])):
print('PSNR: {:.3f}, SSIM: {:.3f}, LPIPS: {:.3f}, scene: {}'.format(indoor_metrics['PSNR'][i], indoor_metrics['SSIM'][i], indoor_metrics['LPIPS'][i], indoor_metrics['scene'][i]))
print('All:')
print("".join(latex))
print('Outdoor:')
print("".join(outdoor_latex))
print('Indoor:')
print("".join(indoor_latex))
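# The three loops above build the same "PSNR: ... SSIM: ... LPIPS: ..." summary for different scene
# groups. A minimal sketch of that formatting as one helper (format_means is a hypothetical name, not
# used by the script): PSNR gets 2 decimals and SSIM/LPIPS get 3, matching the code above. Unlike
# np.mean on an empty list, it yields "nan" without a RuntimeWarning when a group has no scenes.
def format_means(metrics, keys=("PSNR", "SSIM", "LPIPS")):
    """Average each metric list in `metrics` and format the means as one summary line."""
    parts = []
    for k in keys:
        values = metrics[k]
        mean = sum(values) / len(values) if values else float("nan")
        fmt = "{:.2f}" if k == "PSNR" else "{:.3f}"
        parts.extend([k + ": ", fmt.format(mean) + " "])
    return "".join(parts)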
if __name__ == "__main__":
show_matrix(scenes, output_dirs, TRIAL_NAME)
================================================
FILE: python_scripts/show_dtu.py
================================================
import os
import json
import numpy as np
scenes = [24, 37, 40, 55, 63, 65, 69, 83, 97, 105, 106, 110, 114, 118, 122]
output_dirs = ["exp_dtu/release"]
TRIAL_NAME = 'vcr_gaus'
def show_matrix_old(scenes, output_dirs, TRIAL_NAME):
all_metrics = {"mean_d2s": [], "mean_s2d": [], "overall": []}
print(output_dirs)
for scene in scenes:
print(scene,end=" ")
for output in output_dirs:
json_file = f"{output}/scan{scene}/test/ours_30000/tsdf/results.json"
data = json.load(open(json_file))
for k in ["mean_d2s", "mean_s2d", "overall"]:
all_metrics[k].append(data[k])
print(f"{data[k]:.3f}", end=" ")
print()
latex = []
for k in ["mean_d2s", "mean_s2d", "overall"]:
numbers = np.asarray(all_metrics[k]).mean(axis=0).tolist()
numbers = all_metrics[k] + [numbers]
numbers = [f"{x:.2f}" for x in numbers]
if k == "overall":
latex.extend(numbers)
print(" & ".join(latex))
def show_matrix(scenes, output_dirs, TRIAL_NAME):
all_metrics = {"mean_d2s": [], "mean_s2d": [], "overall": [], 'scene': []}
for scene in scenes:
for output in output_dirs:
json_file = f"{output}/{scene}/{TRIAL_NAME}/results.json"
if not os.path.exists(json_file):
print(f"Scene \033[1;31m{scene}\033[0m was not evaluated.")
continue
data = json.load(open(json_file))
for k in ["mean_d2s", "mean_s2d", "overall"]:
all_metrics[k].append(data[k])
all_metrics['scene'].append(scene)
latex = []
for k in ["mean_d2s", "mean_s2d", "overall"]:
numbers = np.asarray(all_metrics[k]).mean(axis=0).tolist()
numbers = all_metrics[k] + [numbers]
numbers = [f"{x:.2f}" for x in numbers]
latex.extend([k+': ', numbers[-1]+' '])
for i in range(len(all_metrics['scene'])):
print('d2s: {:.3f}, s2d: {:.3f}, overall: {:.3f}, scene: {}'.format(all_metrics['mean_d2s'][i], all_metrics['mean_s2d'][i], all_metrics['overall'][i], all_metrics['scene'][i]))
print("".join(latex))
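# Sanity check for the numbers above, assuming the DTU evaluation script (not shown in this file)
# reports "overall" as the Chamfer distance: the mean of the accuracy term mean_d2s (reconstruction to
# ground truth) and the completeness term mean_s2d (ground truth to reconstruction). The helper name
# chamfer_overall is hypothetical. Because both steps are plain averages, the mean "overall" printed
# above should then equal chamfer_overall(mean d2s, mean s2d) up to rounding.
def chamfer_overall(mean_d2s, mean_s2d):
    """Symmetric Chamfer distance from its two one-directional mean distances."""
    return (mean_d2s + mean_s2d) / 2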
if __name__ == "__main__":
show_matrix(scenes, output_dirs, TRIAL_NAME)
================================================
FILE: python_scripts/show_tnt.py
================================================
import os
import numpy as np
training_list = [
'Barn', 'Caterpillar', 'Courthouse', 'Ignatius', 'Meetingroom', 'Truck'
]
scenes = training_list
DATASET = 'tnt'
base_dir = "/your/log/path/"
TRIAL_NAME = 'vcr_gaus'
PROJECT = 'vcr_gaus'
output_dirs = [f"{base_dir}/{PROJECT}/{DATASET}"]
def show_matrix(scenes, output_dirs, TRIAL_NAME):
all_metrics = {"precision": [], "recall": [], "f-score": [], 'scene': []}
for scene in scenes:
for output in output_dirs:
# precision
eval_file = os.path.join(output, scene, f"{TRIAL_NAME}/evaluation/evaluation.txt")
if not os.path.exists(eval_file):
print(f"Scene \033[1;31m{scene}\033[0m was not evaluated.")
continue
with open(eval_file, 'r') as f:
matrix = f.readlines()
precision = float(matrix[2].split(" ")[-1])
recall = float(matrix[3].split(" ")[-1])
f_score = float(matrix[4].split(" ")[-1])
all_metrics["precision"].append(precision)
all_metrics["recall"].append(recall)
all_metrics["f-score"].append(f_score)
all_metrics['scene'].append(scene)
latex = []
for k in ["precision","recall", "f-score"]:
numbers = all_metrics[k]
mean = np.mean(numbers)
numbers = numbers + [mean]
numbers = [f"{x:.3f}" for x in numbers]
latex.extend([k+': ', numbers[-1]+' '])
for i in range(len(all_metrics['scene'])):
print('precision: {:.3f}, recall: {:.3f}, f-score: {:.3f}, scene: {}'.format(all_metrics['precision'][i], all_metrics['recall'][i], all_metrics['f-score'][i], all_metrics['scene'][i]))
print("".join(latex))
return
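# Sanity check for the parsed values: the F-score from the Tanks and Temples evaluation is the
# harmonic mean of precision and recall at the distance threshold, so f_score(precision, recall)
# should match the parsed f_score up to rounding. This sketch assumes lines 3-5 of evaluation.txt
# hold precision, recall and F-score, as the indices 2-4 used above imply; f_score is a hypothetical
# helper name.
def f_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)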
if __name__ == "__main__":
show_matrix(scenes, output_dirs, TRIAL_NAME)
================================================
FILE: requirements.txt
================================================
submodules/diff-gaussian-rasterization
submodules/simple-knn/
git+https://github.com/facebookresearch/pytorch3d.git@stable
================================================
FILE: scene/__init__.py
================================================
#
# Copyright (C) 2023, Inria
# GRAPHDECO research group, https://team.inria.fr/graphdeco
# All rights reserved.
#
# This software is free for non-commercial, research and evaluation use
# under the terms of the LICENSE.md file.
#
# For inquiries contact george.drettakis@inria.fr
#
import os
import random
import json
import torch
from arguments import ModelParams
from scene.gaussian_model import GaussianModel
from tools.system_utils import searchForMaxIteration
from scene.dataset_readers import sceneLoadTypeCallbacks
from tools.camera_utils import cameraList_from_camInfos, camera_to_JSON
from tools.graphics_utils import get_all_px_dir
class Scene:
gaussians : GaussianModel
def __init__(self, args : ModelParams, gaussians : GaussianModel, load_iteration=None, shuffle=True, resolution_scales=[1.0]):
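"""
:param args: Model parameters; args.source_path is the COLMAP (or Blender) scene folder.
:param gaussians: Gaussian model to initialize from the input point cloud or a saved checkpoint.
:param load_iteration: Iteration to load from model_path; -1 loads the latest saved one.
"""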
self.model_path = args.model_path
self.loaded_iter = None
self.gaussians = gaussians
self.split = args.split
load_depth = args.load_depth
load_normal = args.load_normal
load_mask = args.load_mask
if load_iteration:
if load_iteration == -1:
self.loaded_iter = searchForMaxIteration(os.path.join(self.model_path, "point_cloud"))
else:
self.loaded_iter = load_iteration
print("Loading trained model at iteration {}".format(self.loaded_iter))
self.train_cameras = {}
self.test_cameras = {}
if os.path.exists(os.path.join(args.source_path, "sparse")):
scene_info = sceneLoadTypeCallbacks["Colmap"](args.source_path, args.images, args.eval, args.llffhold, args.ratio, split=self.split, load_depth=load_depth, load_normal=load_normal, load_mask=load_mask, normal_folder=args.normal_folder, depth_folder=args.depth_folder)
elif os.path.exists(os.path.join(args.source_path, "transforms_train.json")):
print("Found transforms_train.json file, assuming Blender data set!")
scene_info = sceneLoadTypeCallbacks["Blender"](args.source_path, args.white_background, args.eval)
else:
assert False, "Could not recognize scene type!"
self.trans = scene_info.trans
self.scale = scene_info.scale
if not self.loaded_iter:
with open(scene_info.ply_path, 'rb') as src_file, open(os.path.join(self.model_path, "input.ply") , 'wb') as dest_file:
dest_file.write(src_file.read())
json_cams = []
camlist = []
if scene_info.test_cameras:
camlist.extend(scene_info.test_cameras)
if scene_info.train_cameras:
camlist.extend(scene_info.train_cameras)
for id, cam in enumerate(camlist):
json_cams.append(camera_to_JSON(id, cam))
with open(os.path.join(self.model_path, "cameras.json"), 'w') as file:
json.dump(json_cams, file)
if shuffle:
random.shuffle(scene_info.train_cameras) # Multi-res consistent random shuffling
# random.shuffle(scene_info.test_cameras) # Multi-res consistent random shuffling
self.cameras_extent = scene_info.nerf_normalization["radius"]
gaussians.extent = self.cameras_extent
for resolution_scale in resolution_scales:
print("Loading Training Cameras")
self.train_cameras[resolution_scale] = cameraList_from_camInfos(scene_info.train_cameras, resolution_scale, args)
print("Loading Test Cameras")
self.test_cameras[resolution_scale] = cameraList_from_camInfos(scene_info.test_cameras, resolution_scale, args)
for idx, camera in enumerate(self.train_cameras[resolution_scale] + self.test_cameras[resolution_scale]):
camera.idx = idx
if self.loaded_iter:
self.gaussians.load_ply(os.path.join(self.model_path,
"point_cloud",
"iteration_" + str(self.loaded_iter),
"point_cloud.ply"))
else:
self.gaussians.create_from_pcd(scene_info.point_cloud, self.cameras_extent)
if args.depth_type == "traditional":
self.dirs = None
elif args.depth_type == "intersection":
self.dirs = get_all_px_dir(self.getTrainCameras()[0].intr, self.getTrainCameras()[0].image_height, self.getTrainCameras()[0].image_width).cuda()
self.first_name = scene_info.first_name
def save(self, iteration, visi=None, surf=None, save_splat=False):
point_cloud_path = os.path.join(self.model_path, "point_cloud/iteration_{}".format(iteration))
self.gaussians.save_ply(os.path.join(point_cloud_path, "point_cloud.ply"))
self.gaussians.save_inside_ply(os.path.join(point_cloud_path, "point_cloud_inside.ply"))
if visi is not None:
self.gaussians.save_visi_ply(os.path.join(point_cloud_path, "visi.ply"), visi)
if surf is not None:
self.gaussians.save_visi_ply(os.path.join(point_cloud_path, "surf.ply"), surf)
if save_splat:
self.gaussians.save_splat(os.path.join(point_cloud_path, "pcd.splat"))
def getTrainCameras(self, scale=1.0):
return self.train_cameras[scale]
def getTestCameras(self, scale=1.0):
return self.test_cameras[scale]
def getFullCameras(self, scale=1.0):
if self.split:
return self.train_cameras[scale] + self.test_cameras[scale]
else:
return self.train_cameras[scale]
def getUpCameras(self):
return self.random_cameras_up
def getAroundCameras(self):
return self.random_cameras_around
def getRandCameras(self, n, up=False, around=True, sample_mode='uniform'):
if up and around:
n = n // 2
cameras = []
if up:
up_cameras = self.getUpCameras()
idx = torch.randperm(len(up_cameras))[: n]
# Gather by integer index: a Python list cannot be indexed with a multi-element tensor.
cameras.extend(up_cameras[i] for i in idx.tolist())
if around:
around_cameras = self.getAroundCameras()
if sample_mode == 'random':
idx = torch.randperm(len(around_cameras))[: n]
elif sample_mode == 'uniform':
# Take evenly spaced cameras and keep at most n of them.
step = max(1, len(around_cameras) // max(n, 1))
idx = torch.arange(len(around_cameras))[::step][: n]
else:
assert False, f"Unknown sample_mode: {sample_mode}"
cameras.extend(around_cameras[i] for i in idx.tolist())
return cameras
================================================
FILE: scene/appearance_network.py
================================================
import torch
import torch.nn as nn
import torch.nn.functional as F
class UpsampleBlock(nn.Module):
def __init__(self, num_input_channels, num_output_channels):
super(UpsampleBlock, self).__init__()
self.pixel_shuffle = nn.PixelShuffle(2)
self.conv = nn.Conv2d(num_input_channels // (2 * 2), num_output_channels, 3, stride=1, padding=1)
self.relu = nn.ReLU()
def forward(self, x):
x = self.pixel_shuffle(x)
x = self.conv(x)
x = self.relu(x)
return x
class AppearanceNetwork(nn.Module):
def __init__(self, num_input_channels, num_output_channels):
super(AppearanceNetwork, self).__init__()
self.conv1 = nn.Conv2d(num_input_channels, 256, 3, stride=1, padding=1)
self.up1 = UpsampleBlock(256, 128)
self.up2 = UpsampleBlock(128, 64)
self.up3 = UpsampleBlock(64, 32)
self.up4 = UpsampleBlock(32, 16)
self.conv2 = nn.Conv2d(16, 16, 3, stride=1, padding=1)
self.conv3 = nn.Conv2d(16, num_output_channels, 3, stride=1, padding=1)
self.relu = nn.ReLU()
self.sigmoid = nn.Sigmoid()
def forward(self, x):
x = self.conv1(x)
x = self.relu(x)
x = self.up1(x)
x = self.up2(x)
x = self.up3(x)
x = self.up4(x)
# bilinear interpolation
x = F.interpolate(x, scale_factor=2, mode='bilinear', align_corners=True)
x = self.conv2(x)
x = self.relu(x)
x = self.conv3(x)
x = self.sigmoid(x)
return x
if __name__ == "__main__":
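    # Smoke test: a 1/32-resolution 1200x1600 input with 3 RGB + 64 embedding channels.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    H, W = 1200//32, 1600//32
    input_channels = 3 + 64
    output_channels = 3
    x = torch.randn(1, input_channels, H, W, device=device)
    model = AppearanceNetwork(input_channels, output_channels).to(device)
    output = model(x)
    print(output.shape)  # torch.Size([1, 3, 1184, 1600]): 37x50 upsampled by 32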
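Each `UpsampleBlock` doubles the spatial size (`nn.PixelShuffle(2)` trades 4 channels for a 2x2 pixel block before the 3x3 conv), and the final bilinear `F.interpolate` doubles it once more, so the network maps an `H x W` input to `32H x 32W`. A pure-Python sketch that traces shapes through the same layer list without needing a GPU; `trace_appearance_shapes` is a hypothetical helper written for illustration:

```python
def trace_appearance_shapes(in_ch, out_ch, h, w):
    """Trace (layer, channels, height, width) through AppearanceNetwork's layers.

    3x3 convs with padding=1 keep H and W; PixelShuffle(2) divides the channel
    count by 4 and doubles H and W; the bilinear interpolate doubles H and W.
    """
    shapes = [("input", in_ch, h, w)]
    c = 256  # conv1: in_ch -> 256
    shapes.append(("conv1", c, h, w))
    for name, out in [("up1", 128), ("up2", 64), ("up3", 32), ("up4", 16)]:
        assert c % 4 == 0, "PixelShuffle(2) needs channels divisible by 4"
        # pixel shuffle: c -> c // 4 channels at 2x size, then conv: c // 4 -> out
        c, h, w = out, h * 2, w * 2
        shapes.append((name, c, h, w))
    h, w = h * 2, w * 2  # F.interpolate(scale_factor=2, mode='bilinear')
    shapes.append(("interpolate", c, h, w))
    shapes.append(("conv2", c, h, w))      # 16 -> 16, same size
    shapes.append(("conv3", out_ch, h, w))
    return shapes


for name, c, h, w in trace_appearance_shapes(3 + 64, 3, 1200 // 32, 1600 // 32):
    print(f"{name:12s} {c:4d} x {h:5d} x {w:5d}")
```

Because `1200 // 32 = 37`, the output has 1184 rows rather than 1200; a caller combining it with a full-resolution render would need to resize it or use image sizes divisible by 32 (this follows from the arithmetic, not from code shown here).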
================================================
FILE: scene/cameras.py
================================================
#
# Copyright (C) 2023, Inria
# GRAPHDECO research group, https://team.inria.fr/graphdeco
# All rights reserved.
#
# This software is free for non-commercial, research and evaluation use
# under the terms of the LICENSE.md file.
#
# For inquiries contact george.drettakis@inria.fr
#
import torch
from torch import nn
import numpy as np
from tools.graphics_utils import getWorld2View2, getProjectionMatrix, getIntrinsic
class Camera(nn.Module):
def __init__(self, colmap_id, R, T, FoVx, FoVy, image, gt_alpha_mask,
image_name, uid, depth=None, normal=None, mask=None,
trans=np.array([0.0, 0.0, 0.0]), scale=1.0, data_device = "cuda"
):
super(Camera, self).__init__()
self.uid = uid
self.colmap_id = colmap_id
self.R = R
self.T = T
self.FoVx = FoVx
self.FoVy = FoVy
self.image_name = image_name
try:
self.data_device = torch.device(data_device)
except Exception as e:
print(e)
            print(f"[Warning] Custom device {data_device} failed, falling back to the default cuda device")
self.data_device = torch.device("cuda")
self.original_image = image.clamp(0.0, 1.0).to(self.data_device)
self.image_width = self.original_image.shape[2]
self.image_height = self.original_image.shape[1]
if gt_alpha_mask is not None:
self.gt_alpha_mask = gt_alpha_mask
if mask is not None:
mask = mask.squeeze(-1).cuda()
mask[self.gt_alpha_mask[0] == 0] = 0
else:
mask = self.gt_alpha_mask.bool().squeeze(0).cuda()
else:
self.original_image *= torch.ones((1, self.image_height, self.image_width), device=self.data_device)
self.gt_alpha_mask = None
self.depth = depth.to(data_device) if depth is not None else None
self.normal = normal.to(data_device) if normal is not None else None
if mask is not None:
self.mask = mask.squeeze(-1).cuda()
self.zfar = 100.0
self.znear = 0.01
self.trans = trans
self.scale = scale
self.world_view_transform = torch.tensor(getWorld2View2(R, T, trans, scale)).transpose(0, 1).cuda() # w2c
self.projection_matrix = getProjectionMatrix(znear=self.znear, zfar=self.zfar, fovX=self.FoVx, fovY=self.FoVy).transpose(0,1).cuda()
self.full_proj_transform = (self.world_view_transform.unsqueeze(0).bmm(self.projection_matrix.unsqueeze(0))).squeeze(0) # w2c2image
self.camera_center = self.world_view_transform.inverse()[3, :3]
intr = getIntrinsic(self.FoVx, self.FoVy, self.image_height, self.image_width).cuda()
self.intr = intr
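`Camera` stores `world_view_transform` transposed (row-vector convention), so inverting it yields the transposed camera-to-world matrix, whose last row holds the camera position; that is why `camera_center` reads `inverse()[3, :3]`. A NumPy sketch of that identity. `w2c_from_R_T` is a hypothetical stand-in for `getWorld2View2` without its `trans`/`scale` arguments, assuming `R` is stored transposed as the dataset readers fill it:

```python
import numpy as np


def w2c_from_R_T(R, T):
    """Hypothetical stand-in for getWorld2View2(R, T) with no extra trans/scale.

    R is the transposed world-to-camera rotation, T the world-to-camera translation.
    """
    w2c = np.eye(4)
    w2c[:3, :3] = R.T
    w2c[:3, 3] = T
    return w2c


def camera_center(w2c):
    world_view_transform = w2c.T  # the layout Camera keeps (before .cuda())
    # inv(w2c.T) == c2w.T, and row 3 of c2w.T holds the camera position
    return np.linalg.inv(world_view_transform)[3, :3]


theta = 0.3
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])
T = np.array([1.0, 2.0, 3.0])
center = camera_center(w2c_from_R_T(R, T))
print(center, -R @ T)  # both give the same position: C = -R @ T
```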
class MiniCam:
def __init__(self, width, height, fovy, fovx, znear, zfar, world_view_transform, full_proj_transform):
self.image_width = width
self.image_height = height
self.FoVy = fovy
self.FoVx = fovx
self.znear = znear
self.zfar = zfar
self.world_view_transform = world_view_transform
self.full_proj_transform = full_proj_transform
view_inv = torch.inverse(self.world_view_transform)
self.camera_center = view_inv[3][:3]
class SampleCam(nn.Module):
def __init__(self, w2c, width, height, FoVx, FoVy, device='cuda'):
super(SampleCam, self).__init__()
self.FoVx = FoVx
self.FoVy = FoVy
self.image_width = width
self.image_height = height
self.zfar = 100.0
self.znear = 0.01
try:
self.data_device = torch.device(device)
except Exception as e:
print(e)
            print(f"[Warning] Custom device {device} failed, falling back to the default cuda device")
self.data_device = torch.device("cuda")
w2c = w2c.to(self.data_device)
self.world_view_transform = w2c.transpose(0, 1)
self.projection_matrix = getProjectionMatrix(znear=self.znear, zfar=self.zfar, fovX=self.FoVx, fovY=self.FoVy).transpose(0,1).to(w2c.device)
self.full_proj_transform = self.world_view_transform @ self.projection_matrix
self.camera_center = self.world_view_transform.inverse()[3, :3]
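`SampleCam` (like `Camera`) composes `full_proj_transform = world_view_transform @ projection_matrix` from transposed matrices, so a world point is projected as a row vector, `[x, y, z, 1] @ full_proj_transform`, and then divided by its last component. The sketch below assumes `getProjectionMatrix` builds the symmetric-frustum matrix of the original 3D Gaussian Splatting code, with depth mapped to [0, 1] between `znear` and `zfar`; `projection_matrix` is a hypothetical re-implementation, not the repository's function:

```python
import math

import numpy as np


def projection_matrix(znear, zfar, fovx, fovy):
    """Hypothetical 3DGS-style perspective matrix (column-vector form,
    symmetric frustum, depth mapped to [0, 1])."""
    P = np.zeros((4, 4))
    P[0, 0] = 1.0 / math.tan(fovx / 2)
    P[1, 1] = 1.0 / math.tan(fovy / 2)
    P[2, 2] = zfar / (zfar - znear)
    P[2, 3] = -(zfar * znear) / (zfar - znear)
    P[3, 2] = 1.0
    return P


def project(point_world, w2c, P):
    # Row-vector convention: both matrices stored transposed, as in SampleCam
    full_proj_transform = w2c.T @ P.T
    p = np.append(point_world, 1.0) @ full_proj_transform
    return p[:3] / p[3]  # normalized device coordinates


znear, zfar, fov = 0.01, 100.0, math.radians(60)
P = projection_matrix(znear, zfar, fov, fov)
w2c = np.eye(4)  # camera at the origin looking down +z
print(project(np.array([0.0, 0.0, 5.0]), w2c, P))                    # x = y = 0
print(project(np.array([5 * math.tan(fov / 2), 0.0, 5.0]), w2c, P))  # x = 1
```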
class MiniCam2:
def __init__(self, c2w, width, height, fovy, fovx, znear, zfar):
# c2w (pose) should be in NeRF convention.
self.image_width = width
self.image_height = height
self.FoVy = fovy
self.FoVx = fovx
self.znear = znear
self.zfar = zfar
w2c = np.linalg.inv(c2w)
# rectify...
w2c[1:3, :3] *= -1
w2c[:3, 3] *= -1
self.world_view_transform = torch.tensor(w2c).transpose(0, 1).cuda()
self.projection_matrix = (
getProjectionMatrix(
znear=self.znear, zfar=self.zfar, fovX=self.FoVx, fovY=self.FoVy
)
.transpose(0, 1)
.cuda()
)
self.full_proj_transform = self.world_view_transform @ self.projection_matrix
self.camera_center = -torch.tensor(c2w[:3, 3]).cuda()
================================================
FILE: scene/colmap_loader.py
================================================
#
# Copyright (C) 2023, Inria
# GRAPHDECO research group, https://team.inria.fr/graphdeco
# All rights reserved.
#
# This software is free for non-commercial, research and evaluation use
# under the terms of the LICENSE.md file.
#
# For inquiries contact george.drettakis@inria.fr
#
import numpy as np
import collections
import struct
CameraModel = collections.namedtuple(
"CameraModel", ["model_id", "model_name", "num_params"])
Camera = collections.namedtuple(
"Camera", ["id", "model", "width", "height", "params"])
BaseImage = collections.namedtuple(
"Image", ["id", "qvec", "tvec", "camera_id", "name", "xys", "point3D_ids"])
Point3D = collections.namedtuple(
"Point3D", ["id", "xyz", "rgb", "error", "image_ids", "point2D_idxs"])
CAMERA_MODELS = {
CameraModel(model_id=0, model_name="SIMPLE_PINHOLE", num_params=3),
CameraModel(model_id=1, model_name="PINHOLE", num_params=4),
CameraModel(model_id=2, model_name="SIMPLE_RADIAL", num_params=4),
CameraModel(model_id=3, model_name="RADIAL", num_params=5),
CameraModel(model_id=4, model_name="OPENCV", num_params=8),
CameraModel(model_id=5, model_name="OPENCV_FISHEYE", num_params=8),
CameraModel(model_id=6, model_name="FULL_OPENCV", num_params=12),
CameraModel(model_id=7, model_name="FOV", num_params=5),
CameraModel(model_id=8, model_name="SIMPLE_RADIAL_FISHEYE", num_params=4),
CameraModel(model_id=9, model_name="RADIAL_FISHEYE", num_params=5),
CameraModel(model_id=10, model_name="THIN_PRISM_FISHEYE", num_params=12)
}
CAMERA_MODEL_IDS = dict([(camera_model.model_id, camera_model)
for camera_model in CAMERA_MODELS])
CAMERA_MODEL_NAMES = dict([(camera_model.model_name, camera_model)
for camera_model in CAMERA_MODELS])
def qvec2rotmat(qvec):
return np.array([
[1 - 2 * qvec[2]**2 - 2 * qvec[3]**2,
2 * qvec[1] * qvec[2] - 2 * qvec[0] * qvec[3],
2 * qvec[3] * qvec[1] + 2 * qvec[0] * qvec[2]],
[2 * qvec[1] * qvec[2] + 2 * qvec[0] * qvec[3],
1 - 2 * qvec[1]**2 - 2 * qvec[3]**2,
2 * qvec[2] * qvec[3] - 2 * qvec[0] * qvec[1]],
[2 * qvec[3] * qvec[1] - 2 * qvec[0] * qvec[2],
2 * qvec[2] * qvec[3] + 2 * qvec[0] * qvec[1],
1 - 2 * qvec[1]**2 - 2 * qvec[2]**2]])
def rotmat2qvec(R):
Rxx, Ryx, Rzx, Rxy, Ryy, Rzy, Rxz, Ryz, Rzz = R.flat
K = np.array([
[Rxx - Ryy - Rzz, 0, 0, 0],
[Ryx + Rxy, Ryy - Rxx - Rzz, 0, 0],
[Rzx + Rxz, Rzy + Ryz, Rzz - Rxx - Ryy, 0],
[Ryz - Rzy, Rzx - Rxz, Rxy - Ryx, Rxx + Ryy + Rzz]]) / 3.0
eigvals, eigvecs = np.linalg.eigh(K)
qvec = eigvecs[[3, 0, 1, 2], np.argmax(eigvals)]
if qvec[0] < 0:
qvec *= -1
return qvec
class Image(BaseImage):
def qvec2rotmat(self):
return qvec2rotmat(self.qvec)
def read_next_bytes(fid, num_bytes, format_char_sequence, endian_character="<"):
"""Read and unpack the next bytes from a binary file.
    :param fid: File object opened in binary mode.
:param num_bytes: Sum of combination of {2, 4, 8}, e.g. 2, 6, 16, 30, etc.
:param format_char_sequence: List of {c, e, f, d, h, H, i, I, l, L, q, Q}.
:param endian_character: Any of {@, =, <, >, !}
:return: Tuple of read and unpacked values.
"""
data = fid.read(num_bytes)
return struct.unpack(endian_character + format_char_sequence, data)
def read_points3D_text(path):
"""
see: src/base/reconstruction.cc
void Reconstruction::ReadPoints3DText(const std::string& path)
void Reconstruction::WritePoints3DText(const std::string& path)
"""
xyzs = None
rgbs = None
errors = None
num_points = 0
with open(path, "r") as fid:
while True:
line = fid.readline()
if not line:
break
line = line.strip()
if len(line) > 0 and line[0] != "#":
num_points += 1
xyzs = np.empty((num_points, 3))
rgbs = np.empty((num_points, 3))
errors = np.empty((num_points, 1))
count = 0
with open(path, "r") as fid:
while True:
line = fid.readline()
if not line:
break
line = line.strip()
if len(line) > 0 and line[0] != "#":
elems = line.split()
xyz = np.array(tuple(map(float, elems[1:4])))
rgb = np.array(tuple(map(int, elems[4:7])))
error = np.array(float(elems[7]))
xyzs[count] = xyz
rgbs[count] = rgb
errors[count] = error
count += 1
return xyzs, rgbs, errors
def read_points3D_binary(path_to_model_file):
"""
see: src/base/reconstruction.cc
void Reconstruction::ReadPoints3DBinary(const std::string& path)
void Reconstruction::WritePoints3DBinary(const std::string& path)
"""
with open(path_to_model_file, "rb") as fid:
num_points = read_next_bytes(fid, 8, "Q")[0]
xyzs = np.empty((num_points, 3))
rgbs = np.empty((num_points, 3))
errors = np.empty((num_points, 1))
for p_id in range(num_points):
binary_point_line_properties = read_next_bytes(
fid, num_bytes=43, format_char_sequence="QdddBBBd")
xyz = np.array(binary_point_line_properties[1:4])
rgb = np.array(binary_point_line_properties[4:7])
error = np.array(binary_point_line_properties[7])
track_length = read_next_bytes(
fid, num_bytes=8, format_char_sequence="Q")[0]
track_elems = read_next_bytes(
fid, num_bytes=8*track_length,
format_char_sequence="ii"*track_length)
xyzs[p_id] = xyz
rgbs[p_id] = rgb
errors[p_id] = error
return xyzs, rgbs, errors
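`read_points3D_binary` relies on COLMAP's fixed record layout: with `<` (little-endian, no padding), `QdddBBBd` is 8 + 3·8 + 3·1 + 8 = 43 bytes, followed by a `uint64` track length and that many `(image_id, point2D_idx)` int32 pairs. A round-trip sketch over an in-memory buffer; `write_points3D_binary` and `read_points3D` are hypothetical helpers written for illustration:

```python
import io
import struct

import numpy as np

# One points3D.bin record as read by read_points3D_binary ('<' = little-endian,
# no padding): point3D_id 'Q', xyz 'ddd', rgb 'BBB', error 'd'.
POINT_FMT = "<QdddBBBd"
POINT_BYTES = struct.calcsize(POINT_FMT)  # 8 + 24 + 3 + 8 = 43


def write_points3D_binary(points):
    """Hypothetical writer; points = [(point_id, xyz, rgb, error, track), ...]."""
    buf = io.BytesIO()
    buf.write(struct.pack("<Q", len(points)))
    for point_id, xyz, rgb, error, track in points:
        buf.write(struct.pack(POINT_FMT, point_id, *xyz, *rgb, error))
        buf.write(struct.pack("<Q", len(track)))  # track_length
        for image_id, point2D_idx in track:
            buf.write(struct.pack("<ii", image_id, point2D_idx))
    buf.seek(0)
    return buf


def read_points3D(fid):
    """Same parsing steps as read_points3D_binary, on any binary file object."""
    (num_points,) = struct.unpack("<Q", fid.read(8))
    xyzs = np.empty((num_points, 3))
    rgbs = np.empty((num_points, 3))
    errors = np.empty((num_points, 1))
    for p_id in range(num_points):
        props = struct.unpack(POINT_FMT, fid.read(POINT_BYTES))
        xyzs[p_id] = props[1:4]
        rgbs[p_id] = props[4:7]
        errors[p_id] = props[7]
        (track_length,) = struct.unpack("<Q", fid.read(8))
        fid.read(8 * track_length)  # skip the (image_id, point2D_idx) int32 pairs
    return xyzs, rgbs, errors


buf = write_points3D_binary([
    (1, (0.5, -1.0, 2.0), (255, 0, 10), 0.25, [(3, 7), (4, 9)]),
    (2, (1.0, 1.0, 1.0), (1, 2, 3), 1.5, []),
])
xyzs, rgbs, errors = read_points3D(buf)
print(POINT_BYTES, xyzs.shape)  # 43 (2, 3)
```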
def read_intrinsics_text(path):
"""
Taken from https://github.com/colmap/colmap/blob/dev/scripts/python/read_write_model.py
"""
cameras = {}
with open(path, "r") as fid:
while True:
line = fid.readline()
if not line:
break
line = line.strip()
if len(line) > 0 and line[0] != "#":
elems = line.split()
camera_id = int(elems[0])
model = elems[1]
                assert model == "PINHOLE", "While the loader supports other types, the rest of the code assumes PINHOLE"
width = int(elems[2])
height = int(elems[3])
params = np.array(tuple(map(float, elems[4:])))
cameras[camera_id] = Camera(id=camera_id, model=model,
width=width, height=height,
params=params)
return cameras
def read_extrinsics_binary(path_to_model_file):
"""
see: src/base/reconstruction.cc
void Reconstruction::ReadImagesBinary(const std::string& path)
void Reconstruction::WriteImagesBinary(const std::string& path)
"""
images = {}
with open(path_to_model_file, "rb") as fid:
num_reg_images = read_next_bytes(fid, 8, "Q")[0]
for _ in range(num_reg_images):
binary_image_properties = read_next_bytes(
fid, num_bytes=64, format_char_sequence="idddddddi")
image_id = binary_image_properties[0]
qvec = np.array(binary_image_properties[1:5])
tvec = np.array(binary_image_properties[5:8])
camera_id = binary_image_properties[8]
image_name = ""
current_char = read_next_bytes(fid, 1, "c")[0]
while current_char != b"\x00": # look for the ASCII 0 entry
image_name += current_char.decode("utf-8")
current_char = read_next_bytes(fid, 1, "c")[0]
num_points2D = read_next_bytes(fid, num_bytes=8,
format_char_sequence="Q")[0]
x_y_id_s = read_next_bytes(fid, num_bytes=24*num_points2D,
format_char_sequence="ddq"*num_points2D)
xys = np.column_stack([tuple(map(float, x_y_id_s[0::3])),
tuple(map(float, x_y_id_s[1::3]))])
point3D_ids = np.array(tuple(map(int, x_y_id_s[2::3])))
images[image_id] = Image(
id=image_id, qvec=qvec, tvec=tvec,
camera_id=camera_id, name=image_name,
xys=xys, point3D_ids=point3D_ids)
return images
def read_intrinsics_binary(path_to_model_file):
"""
see: src/base/reconstruction.cc
void Reconstruction::WriteCamerasBinary(const std::string& path)
void Reconstruction::ReadCamerasBinary(const std::string& path)
"""
cameras = {}
with open(path_to_model_file, "rb") as fid:
num_cameras = read_next_bytes(fid, 8, "Q")[0]
for _ in range(num_cameras):
camera_properties = read_next_bytes(
fid, num_bytes=24, format_char_sequence="iiQQ")
camera_id = camera_properties[0]
model_id = camera_properties[1]
model_name = CAMERA_MODEL_IDS[camera_properties[1]].model_name
width = camera_properties[2]
height = camera_properties[3]
num_params = CAMERA_MODEL_IDS[model_id].num_params
params = read_next_bytes(fid, num_bytes=8*num_params,
format_char_sequence="d"*num_params)
cameras[camera_id] = Camera(id=camera_id,
model=model_name,
width=width,
height=height,
params=np.array(params))
assert len(cameras) == num_cameras
return cameras
def read_extrinsics_text(path):
"""
Taken from https://github.com/colmap/colmap/blob/dev/scripts/python/read_write_model.py
"""
images = {}
with open(path, "r") as fid:
while True:
line = fid.readline()
if not line:
break
line = line.strip()
if len(line) > 0 and line[0] != "#":
elems = line.split()
image_id = int(elems[0])
qvec = np.array(tuple(map(float, elems[1:5])))
tvec = np.array(tuple(map(float, elems[5:8])))
camera_id = int(elems[8])
image_name = elems[9]
elems = fid.readline().split()
xys = np.column_stack([tuple(map(float, elems[0::3])),
tuple(map(float, elems[1::3]))])
point3D_ids = np.array(tuple(map(int, elems[2::3])))
images[image_id] = Image(
id=image_id, qvec=qvec, tvec=tvec,
camera_id=camera_id, name=image_name,
xys=xys, point3D_ids=point3D_ids)
return images
def read_colmap_bin_array(path):
"""
Taken from https://github.com/colmap/colmap/blob/dev/scripts/python/read_dense.py
:param path: path to the colmap binary file.
    :return: ndarray of shape (height, width, channels) holding the stored float32 values; a single channel is squeezed away.
"""
with open(path, "rb") as fid:
width, height, channels = np.genfromtxt(fid, delimiter="&", max_rows=1,
usecols=(0, 1, 2), dtype=int)
fid.seek(0)
num_delimiter = 0
byte = fid.read(1)
while True:
if byte == b"&":
num_delimiter += 1
if num_delimiter >= 3:
break
byte = fid.read(1)
array = np.fromfile(fid, np.float32)
array = array.reshape((width, height, channels), order="F")
return np.transpose(array, (1, 0, 2)).squeeze()
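`read_colmap_bin_array` expects COLMAP's dense-map format: an ASCII header `width&height&channels&` followed by float32 values in column-major order over a `(width, height, channels)` array. A round-trip sketch with hypothetical helpers; the reader splits the header on the first three `&` bytes instead of calling `np.genfromtxt`:

```python
import os
import tempfile

import numpy as np


def write_dense_bin(path, array):
    """Hypothetical writer for COLMAP's dense depth/normal .bin format."""
    if array.ndim == 2:
        array = array[..., None]
    height, width, channels = array.shape
    with open(path, "wb") as fid:
        fid.write(f"{width}&{height}&{channels}&".encode("ascii"))
        # (H, W, C) -> (W, H, C), then write column-major little-endian float32
        data = np.transpose(array, (1, 0, 2)).astype("<f4")
        fid.write(data.tobytes(order="F"))


def read_dense_bin(path):
    """Hypothetical reader: split the header on the first three '&' bytes."""
    with open(path, "rb") as fid:
        raw = fid.read()
    width, height, channels, payload = raw.split(b"&", 3)
    array = np.frombuffer(payload, dtype="<f4")
    array = array.reshape((int(width), int(height), int(channels)), order="F")
    return np.transpose(array, (1, 0, 2)).squeeze()


depth = np.arange(12, dtype=np.float32).reshape(3, 4)  # height 3, width 4
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "depth.bin")
    write_dense_bin(path, depth)
    restored = read_dense_bin(path)
print(restored.shape, np.array_equal(restored, depth))  # (3, 4) True
```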
================================================
FILE: scene/dataset_readers.py
================================================
#
# Copyright (C) 2023, Inria
# GRAPHDECO research group, https://team.inria.fr/graphdeco
# All rights reserved.
#
# This software is free for non-commercial, research and evaluation use
# under the terms of the LICENSE.md file.
#
# For inquiries contact george.drettakis@inria.fr
#
import os
import sys
import cv2
import json
import numpy as np
import open3d as o3d
from PIL import Image, ImageFile
from pathlib import Path
from typing import NamedTuple
from plyfile import PlyData, PlyElement
ImageFile.LOAD_TRUNCATED_IMAGES = True
from scene.colmap_loader import read_extrinsics_text, read_intrinsics_text, qvec2rotmat, \
read_extrinsics_binary, read_intrinsics_binary, read_points3D_binary, read_points3D_text
from tools.graphics_utils import getWorld2View2, focal2fov, fov2focal
from tools.sh_utils import SH2RGB
from scene.gaussian_model import BasicPointCloud
from tools.math_utils import normalize_pts
from process_data.convert_data_to_json import bound_by_points
class CameraInfo(NamedTuple):
uid: int
R: np.array
T: np.array
FovY: np.array
FovX: np.array
image: np.array
image_path: str
image_name: str
width: int
height: int
depth: None
normal: None
mask: None
class SceneInfo(NamedTuple):
point_cloud: BasicPointCloud
train_cameras: list
test_cameras: list
nerf_normalization: dict
ply_path: str
trans: np.array
scale: np.array
first_name: str
def getNerfppNorm(cam_info):
def get_center_and_diag(cam_centers):
cam_centers = np.hstack(cam_centers)
avg_cam_center = np.mean(cam_centers, axis=1, keepdims=True)
center = avg_cam_center
dist = np.linalg.norm(cam_centers - center, axis=0, keepdims=True)
diagonal = np.max(dist)
return center.flatten(), diagonal
cam_centers = []
for cam in cam_info:
W2C = getWorld2View2(cam.R, cam.T)
C2W = np.linalg.inv(W2C)
cam_centers.append(C2W[:3, 3:4])
center, diagonal = get_center_and_diag(cam_centers)
radius = diagonal * 1.1
translate = -center
return {"translate": translate, "radius": radius}
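`getNerfppNorm` takes the mean camera center as the scene center and 1.1 times the largest center-to-mean distance (the variable named `diagonal`) as the radius. A NumPy sketch of the same arithmetic; `nerfpp_norm` is a hypothetical name, and it assumes `getWorld2View2(R, T)` without extra translation puts `R.T` in the rotation block, so each camera center is `-R @ T`:

```python
import numpy as np


def nerfpp_norm(Rs, Ts):
    """Sketch of getNerfppNorm's math on (R, T) pairs as stored in CameraInfo."""
    centers = np.stack([-R @ T for R, T in zip(Rs, Ts)])  # (N, 3) camera centers
    center = centers.mean(axis=0)
    max_dist = np.linalg.norm(centers - center, axis=1).max()
    return {"translate": -center, "radius": max_dist * 1.1}


Rs = [np.eye(3)] * 4
Ts = [np.array([-1.0, 0, 0]), np.array([1.0, 0, 0]),
      np.array([0, -1.0, 0]), np.array([0, 1.0, 0])]
norm = nerfpp_norm(Rs, Ts)
print(norm)  # translate ~ [0, 0, 0], radius ~ 1.1
```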
def readColmapCameras(cam_extrinsics, cam_intrinsics, images_folder, load_depth=False, load_normal=False, load_mask=False, normal_folder='normals', depth_folder='depths'):
if load_depth:
depths_folder = images_folder.replace('images', depth_folder)
if load_normal:
normals_folder = images_folder.replace('images', normal_folder)
if load_mask:
mask_folder = images_folder.replace('images', 'masks')
cam_infos = []
for idx, key in enumerate(cam_extrinsics):
sys.stdout.write('\r')
sys.stdout.write("Reading camera {}/{}".format(idx+1, len(cam_extrinsics)))
sys.stdout.flush()
extr = cam_extrinsics[key]
intr = cam_intrinsics[extr.camera_id]
height = intr.height
width = intr.width
uid = intr.id
R = np.transpose(qvec2rotmat(extr.qvec))
T = np.array(extr.tvec)
if intr.model=="SIMPLE_PINHOLE":
focal_length_x = intr.params[0]
FovY = focal2fov(focal_length_x, height)
FovX = focal2fov(focal_length_x, width)
elif intr.model=="PINHOLE":
focal_length_x = intr.params[0]
focal_length_y = intr.params[1]
FovY = focal2fov(focal_length_y, height)
FovX = focal2fov(focal_length_x, width)
else:
assert False, "Colmap camera model not handled: only undistorted datasets (PINHOLE or SIMPLE_PINHOLE cameras) supported!"
image_path = os.path.join(images_folder, os.path.basename(extr.name))
image_name = os.path.basename(image_path).split(".")[0]
image = Image.open(image_path)
depth = None
if load_depth:
depth_path = os.path.join(depths_folder, os.path.basename(extr.name).replace('jpg', 'npz').replace('png', 'npz'))
if os.path.exists(depth_path):
depth = np.load(depth_path)['arr_0']
else:
depth_path = os.path.join(depths_folder, os.path.basename(extr.name).replace('jpg', 'png'))
depth = cv2.imread(depth_path, cv2.IMREAD_UNCHANGED)
if depth.ndim == 2: depth = depth[..., None]
normal = None
if load_normal:
normal_path = os.path.join(normals_folder, os.path.basename(extr.name).replace('png', 'npz').replace('jpg', 'npz').replace('JPG', 'npz'))
normal = np.load(normal_path)['arr_0'] # -1, 1
mask = None
if load_mask:
mask_path = os.path.join(mask_folder, os.path.basename(extr.name).replace('jpg', 'png'))
mask_path = mask_path if os.path.exists(mask_path) else \
os.path.join(mask_folder, os.path.basename(extr.name)[1:])
mask = Image.open(mask_path)
cam_info = CameraInfo(uid=uid, R=R, T=T, FovY=FovY, FovX=FovX, image=image,
image_path=image_path, image_name=image_name, width=width, height=height, depth=depth, normal=normal, mask=mask)
cam_infos.append(cam_info)
sys.stdout.write('\n')
return cam_infos
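`readColmapCameras` converts COLMAP focal lengths (PINHOLE `params = [fx, fy, cx, cy]`, SIMPLE_PINHOLE `params = [f, cx, cy]`) into the fields of view stored in `CameraInfo`. A sketch assuming `focal2fov`/`fov2focal` in `tools/graphics_utils` use the standard pinhole relation fov = 2·atan(pixels / (2·focal)), as in the original 3DGS code; the definitions below are re-implementations for illustration. The principal point (`cx`, `cy`) is dropped, so this parameterization implicitly places it at the image center:

```python
import math


def focal2fov(focal, pixels):
    # angle subtended by `pixels` at focal length `focal` (both in pixels)
    return 2 * math.atan(pixels / (2 * focal))


def fov2focal(fov, pixels):
    return pixels / (2 * math.tan(fov / 2))


width, height = 1600, 1200
fx = fy = 800.0
FovX = focal2fov(fx, width)   # 2 * atan(1600 / 1600) = 90 degrees
FovY = focal2fov(fy, height)
print(math.degrees(FovX), math.degrees(FovY))
```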
def fetchPly(path):
plydata = PlyData.read(path)
vertices = plydata['vertex']
positions = np.vstack([vertices['x'], vertices['y'], vertices['z']]).T
colors = np.vstack([vertices['red'], vertices['green'], vertices['blue']]).T / 255.0
normals = np.vstack([vertices['nx'], vertices['ny'], vertices['nz']]).T
return BasicPointCloud(points=positions, colors=colors, normals=normals)
def storePly(path, xyz, rgb, normals=None):
# Define the dtype for the structured array
dtype = [('x', 'f4'), ('y', 'f4'), ('z', 'f4'),
('nx', 'f4'), ('ny', 'f4'), ('nz', 'f4'),
('red', 'u1'), ('green', 'u1'), ('blue', 'u1')]
normals = np.zeros_like(xyz) if normals is None else normals
elements = np.empty(xyz.shape[0], dtype=dtype)
attributes = np.concatenate((xyz, normals, rgb), axis=1)
elements[:] = list(map(tuple, attributes))
# Create the PlyData object and write to file
vertex_element = PlyElement.describe(elements, 'vertex')
ply_data = PlyData([vertex_element])
ply_data.write(path)
def get_inside_mask(pts, trans, scale):
pts = normalize_pts(pts, trans, scale)
inside = np.all(np.abs(pts) < 1.5, axis=-1)
return inside
def filter_point_cloud(trans, scale, xyz, rgb, nb_points=5, radius=0.1):
inside = get_inside_mask(xyz, trans, scale)
xyz_inside = xyz[inside]
rgb_inside = rgb[inside]
xyz_outside = xyz[~inside]
rgb_outside = rgb[~inside]
pcd_inside = o3d.geometry.PointCloud()
pcd_inside.points = o3d.utility.Vector3dVector(xyz_inside)
pcd_inside.colors = o3d.utility.Vector3dVector(rgb_inside)
pcd_inside_filter, ind = pcd_inside.remove_radius_outlier(nb_points, radius)
xyz_inside = np.asarray(pcd_inside_filter.points)
rgb_inside = np.asarray(pcd_inside_filter.colors)
xyz = np.concatenate((xyz_inside, xyz_outside), axis=0)
rgb = np.concatenate((rgb_inside, rgb_outside), axis=0)
return xyz, rgb
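`filter_point_cloud` denoises only the points inside the normalized bound (`get_inside_mask`) with Open3D's `remove_radius_outlier`, and passes everything outside through untouched. The radius test itself can be sketched without Open3D; `radius_outlier_mask` is a hypothetical brute-force helper, not the Open3D implementation, and Open3D's exact neighbor-counting convention may differ:

```python
import numpy as np


def radius_outlier_mask(xyz, nb_points=5, radius=0.1):
    """Keep points with at least nb_points other points within `radius`.

    Brute force with O(N^2) memory, so only suitable for small clouds.
    """
    d2 = ((xyz[:, None, :] - xyz[None, :, :]) ** 2).sum(-1)
    neighbors = (d2 <= radius ** 2).sum(1) - 1  # exclude the point itself
    return neighbors >= nb_points


rng = np.random.default_rng(0)
cluster = rng.normal(scale=0.01, size=(20, 3))  # dense blob near the origin
stray = np.array([[1.0, 1.0, 1.0]])             # isolated point
xyz = np.concatenate([cluster, stray])
keep = radius_outlier_mask(xyz, nb_points=5, radius=0.1)
print(keep.sum(), "of", len(xyz), "points kept")
```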
def readColmapSceneInfo(path, images, eval, llffhold=8, ratio=0, split=False, load_depth=False, load_normal=False, load_mask=False, normal_folder='normals', depth_folder='depths'):
colmap_dir = os.path.join(path, "sparse/0")
if not os.path.exists(colmap_dir):
colmap_dir = os.path.join(path, "sparse")
try:
cameras_extrinsic_file = os.path.join(colmap_dir, "images.bin")
cameras_intrinsic_file = os.path.join(colmap_dir, "cameras.bin")
cam_extrinsics = read_extrinsics_binary(cameras_extrinsic_file)
cam_intrinsics = read_intrinsics_binary(cameras_intrinsic_file)
    except Exception:
cameras_extrinsic_file = os.path.join(colmap_dir, "images.txt")
cameras_intrinsic_file = os.path.join(colmap_dir, "cameras.txt")
cam_extrinsics = read_extrinsics_text(cameras_extrinsic_file)
cam_intrinsics = read_intrinsics_text(cameras_intrinsic_file)
ply_path = os.path.join(colmap_dir, "points3D.ply")
bin_path = os.path.join(colmap_dir, "points3D.bin")
txt_path = os.path.join(colmap_dir, "points3D.txt")
reading_dir = "images" if images == None else images
cam_infos_unsorted = readColmapCameras(cam_extrinsics=cam_extrinsics, cam_intrinsics=cam_intrinsics, images_folder=os.path.join(path, reading_dir), load_depth=load_depth, load_normal=load_normal, load_mask=load_mask, normal_folder=normal_folder, depth_folder=depth_folder)
cam_infos = sorted(cam_infos_unsorted.copy(), key = lambda x : x.image_name)
meta_fname = f"{path}/meta.json"
if os.path.exists(meta_fname):
with open(meta_fname) as file:
meta = json.load(file)
trans = np.array(meta["trans"], dtype=np.float32)
scale = np.array(meta["scale"], dtype=np.float32)
    else:
        # No meta.json yet: derive the normalization (trans, scale) from the sparse
        # COLMAP points and cache it. The points are read unconditionally here,
        # otherwise `xyz` is undefined whenever points3D.ply already exists.
        print("No meta.json file found, computing trans/scale from the sparse points.")
        try:
            xyz, rgb, _ = read_points3D_binary(bin_path)
        except Exception:
            xyz, rgb, _ = read_points3D_text(txt_path)
        trans, scale, bounding_box = bound_by_points(xyz)
        meta = {
            'trans': trans.tolist(),
            'scale': scale.tolist()
        }
        with open(meta_fname, "w") as file:
            json.dump(meta, file, indent=4)
    if ratio > 0:
        len_train = int(len(cam_infos) * ratio)
        llffhold = len(cam_infos) // len_train
        # sorted() keeps cameras in image-name order; set iteration order is not guaranteed
        train_idx = sorted(set(int(i * llffhold) for i in range(len_train)))
        test_idx = sorted(set(range(len(cam_infos))) - set(train_idx))
        train_cam_infos = [cam_infos[i] for i in train_idx]
        test_cam_infos = [cam_infos[i] for i in test_idx]
elif eval:
if split and "test" in meta:
train_cam_infos = [c for c in cam_infos if c.image_name in meta["train"]]
test_cam_infos = [c for c in cam_infos if c.image_name in meta["test"]]
else:
train_cam_infos = [c for idx, c in enumerate(cam_infos) if idx % llffhold != 0]
test_cam_infos = [c for idx, c in enumerate(cam_infos) if idx % llffhold == 0]
else:
train_cam_infos = cam_infos
test_cam_infos = []
print(f"Train: {len(train_cam_infos)}, Test: {len(test_cam_infos)}")
first_name = test_cam_infos[0].image_name if eval else cam_infos[0].image_name
nerf_normalization = getNerfppNorm(train_cam_infos)
    if not os.path.exists(ply_path):
        print("Converting points3D.bin to .ply, will happen only the first time you open the scene.")
        try:
            xyz, rgb, _ = read_points3D_binary(bin_path)
        except Exception:
            xyz, rgb, _ = read_points3D_text(txt_path)
        xyz, rgb = filter_point_cloud(trans, scale, xyz, rgb)
        storePly(ply_path, xyz, rgb)
    try:
        pcd = fetchPly(ply_path)
    except Exception:
        pcd = None
scene_info = SceneInfo(point_cloud=pcd,
train_cameras=train_cam_infos,
test_cameras=test_cam_infos,
nerf_normalization=nerf_normalization,
ply_path=ply_path,
trans=trans,
scale=scale,
first_name=first_name)
return scene_info
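When `eval` is set and no `meta.json` split is used, `readColmapSceneInfo` holds out every `llffhold`-th camera (after sorting by image name) for testing; with `ratio > 0` it instead keeps `int(N * ratio)` evenly strided cameras for training and tests on the rest. A pure-Python sketch of both rules on image names; `llff_split` and `ratio_split` are hypothetical helpers:

```python
def llff_split(names, llffhold=8):
    """Every llffhold-th image (sorted by name) goes to test, the rest to train."""
    names = sorted(names)
    train = [n for i, n in enumerate(names) if i % llffhold != 0]
    test = [n for i, n in enumerate(names) if i % llffhold == 0]
    return train, test


def ratio_split(names, ratio):
    """Keep int(len * ratio) evenly strided images for training, the rest for test."""
    names = sorted(names)
    len_train = int(len(names) * ratio)
    hold = len(names) // len_train
    train_set = {i * hold for i in range(len_train)}
    train = [n for i, n in enumerate(names) if i in train_set]
    test = [n for i, n in enumerate(names) if i not in train_set]
    return train, test


names = [f"{i:04d}.png" for i in range(17)]
train, test = llff_split(names)
print(test)  # ['0000.png', '0008.png', '0016.png']
```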
def readCamerasFromTransforms(path, transformsfile, white_background, extension=".png"):
cam_infos = []
with open(os.path.join(path, transformsfile)) as json_file:
contents = json.load(json_file)
fovx = contents["camera_angle_x"]
frames = contents["frames"]
for idx, frame in enumerate(frames):
cam_name = os.path.join(path, frame["file_path"] + extension)
# NeRF 'transform_matrix' is a camera-to-world transform
c2w = np.array(frame["transform_matrix"])
# change from OpenGL/Blender camera axes (Y up, Z back) to COLMAP (Y down, Z forward)
c2w[:3, 1:3] *= -1
# get the world-to-camera transform and set R, T
w2c = np.linalg.inv(c2w)
R = np.transpose(w2c[:3,:3]) # R is stored transposed due to 'glm' in CUDA code
T = w2c[:3, 3]
            image_path = cam_name  # cam_name already includes `path`; joining again breaks relative paths
image_name = Path(cam_name).stem
image = Image.open(image_path)
im_data = np.array(image.convert("RGBA"))
bg = np.array([1,1,1]) if white_background else np.array([0, 0, 0])
norm_data = im_data / 255.0
arr = norm_data[:,:,:3] * norm_data[:, :, 3:4] + bg * (1 - norm_data[:, :, 3:4])
            image = Image.fromarray(np.array(arr*255.0, dtype=np.uint8), "RGB")  # uint8, not signed np.byte
fovy = focal2fov(fov2focal(fovx, image.size[0]), image.size[1])
FovY = fovy
FovX = fovx
            cam_infos.append(CameraInfo(uid=idx, R=R, T=T, FovY=FovY, FovX=FovX, image=image,
                            image_path=image_path, image_name=image_name, width=image.size[0], height=image.size[1],
                            depth=None, normal=None, mask=None))
return cam_infos
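`readCamerasFromTransforms` turns NeRF-synthetic `transform_matrix` entries (OpenGL/Blender camera axes: Y up, Z backward) into COLMAP-style `R`, `T` by negating the camera's Y and Z axes before inverting. A NumPy sketch of that conversion for one frame, checking that a point in front of the camera lands at positive camera depth; `blender_to_colmap_RT` is a hypothetical helper:

```python
import numpy as np


def blender_to_colmap_RT(c2w_gl):
    """Mirror the conversion in readCamerasFromTransforms for one frame."""
    c2w = np.array(c2w_gl, dtype=float)
    c2w[:3, 1:3] *= -1          # flip camera Y and Z axes: OpenGL -> COLMAP
    w2c = np.linalg.inv(c2w)
    R = w2c[:3, :3].T           # stored transposed, as the readers expect
    T = w2c[:3, 3]
    return R, T


# A Blender/OpenGL camera at (0, 0, 4) looking down -Z toward the origin
c2w_gl = np.eye(4)
c2w_gl[:3, 3] = [0.0, 0.0, 4.0]
R, T = blender_to_colmap_RT(c2w_gl)
origin_cam = R.T @ np.zeros(3) + T  # world origin in COLMAP camera coordinates
print(origin_cam)                   # [0, 0, 4]: in front of the camera (+Z forward)
```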
def readNerfSyntheticInfo(path, white_background, eval, extension=".png"):
print("Reading Training Transforms")
train_cam_infos = readCamerasFromTransforms(path, "transforms_train.json", white_background, extension)
print("Reading Test Transforms")
test_cam_infos = readCamerasFromTransforms(path, "transforms_test.json", white_background, extension)
if not eval:
train_cam_infos.extend(test_cam_infos)
test_cam_infos = []
nerf_normalization = getNerfppNorm(train_cam_infos)
ply_path = os.path.join(path, "points3d.ply")
if not os.path.exists(ply_path):
# Since this data set has no colmap data, we start with random points
num_pts = 100_000
print(f"Generating random point cloud ({num_pts})...")
# We create random points inside the bounds of the synthetic Blender scenes
xyz = np.random.random((num_pts, 3)) * 2.6 - 1.3
shs = np.random.random((num_pts, 3)) / 255.0
pcd = BasicPointCloud(points=xyz, colors=SH2RGB(shs), normals=np.zeros((num_pts, 3)))
storePly(ply_path, xyz, SH2RGB(shs) * 255)
try:
pcd = fetchPly(ply_path)
    except Exception:
pcd = None
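    # Blender scenes ship no meta.json: assume an identity normalization (zero trans, unit scale)
    first_name = test_cam_infos[0].image_name if eval else train_cam_infos[0].image_name
    scene_info = SceneInfo(point_cloud=pcd,
                           train_cameras=train_cam_infos,
                           test_cameras=test_cam_infos,
                           nerf_normalization=nerf_normalization,
                           ply_path=ply_path,
                           trans=np.zeros(3, dtype=np.float32),
                           scale=np.array(1.0, dtype=np.float32),
                           first_name=first_name)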
return scene_info
sceneLoadTypeCallbacks = {
"Colmap": readColmapSceneInfo,
"Blender" : readNerfSyntheticInfo
}
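`sceneLoadTypeCallbacks` maps a scene type to its reader. This section does not show how the `Scene` class picks the key; the sketch below assumes it checks which files exist, as the original 3DGS `Scene` does (a `sparse` folder for COLMAP, `transforms_train.json` for Blender). `pick_loader` is a hypothetical helper:

```python
import os
import tempfile


def pick_loader(source_path):
    """Hypothetical: choose a sceneLoadTypeCallbacks key from the files present."""
    if os.path.exists(os.path.join(source_path, "sparse")):
        return "Colmap"
    if os.path.exists(os.path.join(source_path, "transforms_train.json")):
        return "Blender"
    raise ValueError(f"Could not recognize scene type in {source_path}")


with tempfile.TemporaryDirectory() as colmap_dir, tempfile.TemporaryDirectory() as blender_dir:
    os.makedirs(os.path.join(colmap_dir, "sparse", "0"))
    open(os.path.join(blender_dir, "transforms_train.json"), "w").close()
    kinds = (pick_loader(colmap_dir), pick_loader(blender_dir))
print(kinds)  # ('Colmap', 'Blender')
```

`readColmapSceneInfo` itself accepts the model under either `sparse/0` or `sparse`.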
================================================
FILE: scene/gaussian_model.py
================================================
#
# Copyright (C) 2023, Inria
# GRAPHDECO research group, https://team.inria.fr/graphdeco
# All rights reserved.
#
# This software is free for non-commercial, research and evaluation use
# under the terms of the LICENSE.md file.
#
# For inquiries contact george.drettakis@inria.fr
#
import os
import torch
import numpy as np
from torch import nn
from copy import deepcopy
try:
from simple_knn._C import distCUDA2
except ModuleNotFoundError:
pass
from plyfile import PlyData, PlyElement
from io import BytesIO
from tqdm import trange
from tools.sh_utils import RGB2SH
from tools.system_utils import mkdir_p
from tools.graphics_utils import BasicPointCloud
from tools.math_utils import normalize_pts, get_inside_normalized
from tools.general_utils import strip_symmetric, build_scaling_rotation
from tools.general_utils import inverse_sigmoid, get_expon_lr_func, build_rotation
from tools.denoise_pcd import remove_radius_outlier
from scene.appearance_network import AppearanceNetwork
from tools.semantic_id import BACKGROUND
class GaussianModel:
def setup_functions(self):
def build_covariance_from_scaling_rotation(scaling, scaling_modifier, rotation):
L = build_scaling_rotation(scaling_modifier * scaling, rotation)
actual_covariance = L @ L.transpose(1, 2)
symm = strip_symmetric(actual_covariance)
return symm
self.scaling_activation = torch.exp
self.scaling_inverse_activation = torch.log
self.covariance_activation = build_covariance_from_scaling_rotation
self.opacity_activation = torch.sigmoid
self.inverse_opacity_activation = inverse_sigmoid
self.rotation_activation = torch.nn.functional.normalize
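`build_covariance_from_scaling_rotation` forms each Gaussian's covariance as Σ = L Lᵀ with L = R S, then `strip_symmetric` keeps only the unique entries. A per-Gaussian NumPy sketch, assuming (as in the original 3DGS utilities) that `build_rotation` reads quaternions as (w, x, y, z) and normalizes them, and that `strip_symmetric` keeps the six upper-triangular entries in row-major order; the helper names here are hypothetical:

```python
import numpy as np


def quat_to_rotmat(q):
    """(w, x, y, z) quaternion -> 3x3 rotation matrix; normalizes first."""
    w, x, y, z = q / np.linalg.norm(q)
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ])


def covariance_upper(scaling, quat, scaling_modifier=1.0):
    """Sigma = (R S)(R S)^T, returned as its 6 upper-triangular entries."""
    L = quat_to_rotmat(quat) @ np.diag(scaling_modifier * scaling)
    sigma = L @ L.T
    iu = np.triu_indices(3)  # (0,0), (0,1), (0,2), (1,1), (1,2), (2,2)
    return sigma[iu]


print(covariance_upper(np.array([1.0, 2.0, 3.0]), np.array([1.0, 0, 0, 0])))
# identity rotation -> [1, 0, 0, 4, 0, 9]
```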
def __init__(self, cfg):
self.active_sh_degree = 0
self.max_sh_degree = cfg.sh_degree
self._xyz = torch.empty(0)
self._features_dc = torch.empty(0)
self._features_rest = torch.empty(0)
self._scaling = torch.empty(0)
self._rotation = torch.empty(0)
self._opacity = torch.empty(0)
self.max_radii2D = torch.empty(0)
self.xyz_gradient_accum = torch.empty(0)
self.denom = torch.empty(0)
self.optimizer = None
self.percent_dense = 0
self.spatial_lr_scale = 0
self.setup_functions()
self.max_mem = cfg.max_mem
self.use_decoupled_appearance = cfg.use_decoupled_appearance
if self.use_decoupled_appearance:
# appearance network and appearance embedding
self.appearance_network = AppearanceNetwork(3+64, 3).cuda()
std = 1e-4
num_embedding = len(os.listdir(os.path.join(cfg.source_path, 'images')))
self._appearance_embeddings = nn.Parameter(torch.empty(num_embedding, 64).cuda())
self._appearance_embeddings.data.normal_(0, std)
self.enable_semantic = cfg.enable_semantic
self._objects_dc = torch.empty(0)
if self.enable_semantic:
self.ch_sem_feat = cfg.ch_sem_feat
self.num_cls = cfg.num_cls
self.classifier = torch.nn.Conv2d(self.ch_sem_feat, self.num_cls, kernel_size=1).cuda()
def capture(self):
return (
self.active_sh_degree,
self._xyz,
self._features_dc,
self._features_rest,
self._scaling,
self._rotation,
self._opacity,
self._objects_dc,
self.max_radii2D,
self.xyz_gradient_accum,
self.denom,
self.optimizer.state_dict(),
self.spatial_lr_scale,
)
def restore(self, model_args, training_args):
(self.active_sh_degree,
self._xyz,
self._features_dc,
self._features_rest,
self._scaling,
self._rotation,
self._opacity,
self._objects_dc,
self.max_radii2D,
xyz_gradient_accum,
denom,
opt_dict,
self.spatial_lr_scale,
) = model_args
self.training_setup(training_args)
self.xyz_gradient_accum = xyz_gradient_accum
self.denom = denom
self.optimizer.load_state_dict(opt_dict)
@property
def get_scaling(self):
scaling = self._scaling
return self.scaling_activation(scaling)
@property
def get_rotation(self):
return self.rotation_activation(self._rotation)
@property
def get_xyz(self):
return self._xyz
@property
def get_features(self):
features_dc = self._features_dc
features_rest = self._features_rest
return torch.cat((features_dc, features_rest), dim=1)
@property
def get_objects(self):
return self._objects_dc
def get_cls(self, idx=None):
assert self.enable_semantic, "Semantic feature is not enabled"
feats = self.get_objects.permute(0, 2, 1)[..., None]
if idx is not None: feats = feats[idx]
return self.classifier(feats).view(-1, self.num_cls).argmax(-1)
def logits_2_label(self, logits):
return torch.argmax(self.logits2prob(logits), dim=-1)
def logits2prob(self, logits):
return torch.nn.functional.softmax(logits, dim=-1)
@property
def get_opacity(self):
return self.opacity_activation(self._opacity)
def get_apperance_embedding(self, idx):
return self._appearance_embeddings[idx]
# @property
def get_normal(self, valid=None, idx=None, refine_sign=True, is_all=False):
'''
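Per-Gaussian normal = rotation-matrix column of the axis with the smallest scale.
If `valid` is None, the selection defaults to Gaussians inside the normalized bound
(all Gaussians when is_all=True) and the result is scattered into a zero-filled [N, 3] tensor.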
'''
normal = None
if valid is None:
if is_all:
valid = torch.ones(self.get_xyz.shape[0], device='cuda', dtype=torch.bool)
else:
valid = self.get_inside_gaus_normalized()[0]
normal = torch.zeros_like(self.get_xyz)
_rot = self.get_rotation[valid]
if idx is not None: _rot = _rot[idx]
rots = build_rotation(_rot)
scaling = self.get_scaling[valid]
if idx is not None: scaling = scaling[idx]
axis = torch.argmin(scaling, dim=-1)
normals = rots.gather(2, axis[:, None, None].expand(-1, 3, -1)).squeeze(-1)
if normal is not None:
normal[valid] = normals
normals = normal
return normals
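# Illustrative sketch, not part of VCR-GauS (helper name is hypothetical):
# get_normal() treats each Gaussian as a flattened disk. Its normal is the
# rotation-matrix column paired with the smallest scale, which is what
# rots.gather(2, argmin(scaling)) selects. This is a plain-Python version for
# one Gaussian. The sign of the returned axis is arbitrary: refine_sign is
# accepted by get_normal but is not used in the code shown.
def min_scale_axis_normal(R, scales):
    """R: 3x3 rotation given as a list of rows; scales: three per-axis scales."""
    axis = min(range(3), key=lambda i: scales[i])  # index of the thinnest axis
    return [R[row][axis] for row in range(3)]      # column `axis` of R


# An axis-aligned disk that is thinnest along y has normal (0, 1, 0).
eye3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(min_scale_axis_normal(eye3, [1.0, 0.1, 2.0]))  # [0.0, 1.0, 0.0]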
def get_covariance(self, scaling_modifier = 1):
return self.covariance_activation(self.get_scaling, scaling_modifier, self._rotation)
def oneupSHdegree(self):
if self.active_sh_degree < self.max_sh_degree:
self.active_sh_degree += 1
def create_from_pcd(self, pcd : BasicPointCloud, spatial_lr_scale : float):
self.spatial_lr_scale = spatial_lr_scale
fused_point_cloud = torch.tensor(np.asarray(pcd.points)).float().cuda()
fused_color = RGB2SH(torch.tensor(np.asarray(pcd.colors)).float().cuda())
features = torch.zeros((fused_color.shape[0], 3, (self.max_sh_degree + 1) ** 2)).float().cuda()
features[:, :3, 0 ] = fused_color
features[:, 3:, 1:] = 0.0
print("Number of points at initialisation : ", fused_point_cloud.shape[0])
dist2 = torch.clamp_min(distCUDA2(torch.from_numpy(np.asarray(pcd.points)).float().cuda()), 0.0000001)
scales = torch.log(torch.sqrt(dist2))[...,None].repeat(1, 3)
rots = torch.zeros((fused_point_cloud.shape[0], 4), device="cuda")
rots[:, 0] = 1
opacities = inverse_sigmoid(0.1 * torch.ones((fused_point_cloud.shape[0], 1), dtype=torch.float, device="cuda"))
self._xyz = nn.Parameter(fused_point_cloud.requires_grad_(True))
self._features_dc = nn.Parameter(features[:,:,0:1].transpose(1, 2).contiguous().requires_grad_(True))
self._features_rest = nn.Parameter(features[:,:,1:].transpose(1, 2).contiguous().requires_grad_(True))
self._scaling = nn.Parameter(scales.requires_grad_(True))
self._rotation = nn.Parameter(rots.requires_grad_(True))
self._opacity = nn.Parameter(opacities.requires_grad_(True))
self.max_radii2D = torch.zeros((self.get_xyz.shape[0]), device="cuda")
if self.enable_semantic:
# random init obj_id now
fused_objects = RGB2SH(torch.rand((fused_point_cloud.shape[0], self.ch_sem_feat), device="cuda"))
fused_objects = fused_objects[:,:,None]
self._objects_dc = nn.Parameter(fused_objects.transpose(1, 2).contiguous().requires_grad_(True))
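# Illustrative sketch in plain Python (helper names are hypothetical) of how
# create_from_pcd() initialises each Gaussian. Two assumptions cover code not
# shown in this file:
#   - RGB2SH(c) = (c - 0.5) / C0, where C0 = 0.28209479177387814 is the
#     degree-0 SH constant.
#   - distCUDA2 returns each point's mean squared distance to its 3 nearest
#     neighbours.
# This version finds neighbours by brute force in O(N^2) instead of using a
# KNN kernel.
import math

SH_C0 = 0.28209479177387814


def rgb_to_sh_dc(rgb):
    """Colour in [0, 1] -> degree-0 SH coefficient (assumed RGB2SH form)."""
    return [(c - 0.5) / SH_C0 for c in rgb]


def init_log_scales(points, k=3):
    """Isotropic log-scale per point: log(sqrt(mean sq. dist to k nearest))."""
    out = []
    for i, p in enumerate(points):
        d2 = sorted(sum((a - b) ** 2 for a, b in zip(p, q))
                    for j, q in enumerate(points) if j != i)
        mean_d2 = max(sum(d2[:k]) / k, 1e-7)  # same floor as clamp_min above
        out.append([math.log(math.sqrt(mean_d2))] * 3)
    return out


def prob_to_logit(x):
    """Inverse sigmoid; the initial opacity logit is prob_to_logit(0.1)."""
    return math.log(x / (1.0 - x))


pts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
print(init_log_scales(pts)[0][0])  # log(sqrt((1 + 4 + 9) / 3)), about 0.770
print(prob_to_logit(0.1))          # about -2.197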
def training_setup(self, training_args, neural_sdf_params=None):
self.percent_dense = training_args.percent_dense
self.large_percent_dense = None
if hasattr(training_args, 'densify_large'):
self.large_percent_dense = training_args.densify_large.percent_dense if \
getattr(training_args.densify_large, 'percent_dense', 0) > 0 else None
self.xyz_gradient_accum = torch.zeros((self.get_xyz.shape[0], 1), device="cuda")
self.denom = torch.zeros((self.get_xyz.shape[0], 1), device="cuda")
l = [
{'params': [self._xyz], 'lr': training_args.position_lr_init * self.spatial_lr_scale, "name": "xyz"},
{'params': [self._features_dc], 'lr': training_args.feature_lr, "name": "f_dc"},
{'params': [self._features_rest], 'lr': training_args.feature_lr / 20.0, "name": "f_rest"},
{'params': [self._opacity], 'lr': training_args.opacity_lr, "name": "opacity"},
{'params': [self._scaling], 'lr': training_args.scaling_lr, "name": "scaling"},
{'params': [self._rotation], 'lr': training_args.rotation_lr, "name": "rotation"},
]
if self.use_decoupled_appearance:
l.append({'params': [self._appearance_embeddings], 'lr': training_args.appearance_embeddings_lr, "name": "appearance_embeddings"})
l.append({'params': self.appearance_network.parameters(), 'lr': training_args.appearance_network_lr, "name": "appearance_network"})
if self.enable_semantic:
l.append({'params': [self._objects_dc], 'lr': training_args.feature_lr, "name": "obj_dc"})
l.append({'params': self.classifier.parameters(), 'lr': training_args.cls_lr, "name": "classifier"})
if neural_sdf_params is not None:
l.append({'params': neural_sdf_params.parameters(), 'lr': training_args.sdf_lr, "name": "neural_sdf"})
self.optimizer = torch.optim.Adam(l, lr=0.0, eps=1e-15)
self.xyz_scheduler_args = get_expon_lr_func(lr_init=training_args.position_lr_init*self.spatial_lr_scale,
lr_final=training_args.position_lr_final*self.spatial_lr_scale,
lr_delay_mult=training_args.position_lr_delay_mult,
max_steps=training_args.position_lr_max_steps)
def update_learning_rate(self, iteration):
''' Learning rate scheduling per step '''
for param_group in self.optimizer.param_groups:
if param_group["name"] == "xyz":
lr = self.xyz_scheduler_args(iteration)
param_group['lr'] = lr
return lr
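# Illustrative sketch (hypothetical helper): get_expon_lr_func is not defined in
# this section. This sketch assumes it interpolates log-linearly between lr_init
# and lr_final over max_steps, so only the "xyz" group's rate decays.
# The call in training_setup also passes lr_delay_mult; the warm-up effect of
# that parameter is omitted here.
import math


def log_linear_lr(step, lr_init, lr_final, max_steps):
    t = min(max(step / max_steps, 0.0), 1.0)  # progress clamped to [0, 1]
    return math.exp(math.log(lr_init) * (1.0 - t) + math.log(lr_final) * t)


# Illustrative values: halfway through, the rate is the geometric mean.
print(log_linear_lr(15_000, 1.6e-4, 1.6e-6, 30_000))  # approximately 1.6e-5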
def construct_list_of_attributes(self):
l = ['x', 'y', 'z', 'nx', 'ny', 'nz']
# One attribute per SH coefficient and colour channel: the DC terms first, then the higher-order terms
for i in range(self._features_dc.shape[1]*self._features_dc.shape[2]):
l.append('f_dc_{}'.format(i))
for i in range(self._features_rest.shape[1]*self._features_rest.shape[2]):
l.append('f_rest_{}'.format(i))
l.append('opacity')
for i in range(self._scaling.shape[1]):
l.append('scale_{}'.format(i))
for i in range(self._rotation.shape[1]):
l.append('rot_{}'.format(i))
if self.enable_semantic:
for i in range(self._objects_dc.shape[1]*self._objects_dc.shape[2]):
l.append('obj_dc_{}'.format(i))
return l
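# Illustrative sketch (hypothetical helper) of the PLY vertex layout that
# construct_list_of_attributes() produces. `deg` is the SH degree and `sem_ch`
# is ch_sem_feat (use 0 when enable_semantic is off). load_ply() relies on the
# same count when it asserts len(f_rest_*) == 3 * (deg + 1) ** 2 - 3.
def ply_attribute_names(deg, sem_ch=0):
    names = ['x', 'y', 'z', 'nx', 'ny', 'nz']
    names += ['f_dc_{}'.format(i) for i in range(3)]                 # 1 coeff x RGB
    names += ['f_rest_{}'.format(i) for i in range(3 * (deg + 1) ** 2 - 3)]
    names += ['opacity']
    names += ['scale_{}'.format(i) for i in range(3)]
    names += ['rot_{}'.format(i) for i in range(4)]                  # quaternion
    names += ['obj_dc_{}'.format(i) for i in range(sem_ch)]
    return names


print(len(ply_attribute_names(3)))  # 62 = 6 + 3 + 45 + 1 + 3 + 4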
def save_ply(self, path):
mkdir_p(os.path.dirname(path))
xyz = self._xyz.detach().cpu().numpy()
normals = np.zeros_like(xyz)
f_dc = self._features_dc.detach().transpose(1, 2).flatten(start_dim=1).contiguous().cpu().numpy()
f_rest = self._features_rest.detach().transpose(1, 2).flatten(start_dim=1).contiguous().cpu().numpy()
opacities = self._opacity.detach().cpu().numpy()
scale = self._scaling.detach().cpu().numpy()
rotation = self._rotation.detach().cpu().numpy()
if self.enable_semantic:
obj_dc = self._objects_dc.detach().transpose(1, 2).flatten(start_dim=1).contiguous().cpu().numpy()
dtype_full = [(attribute, 'f4') for attribute in self.construct_list_of_attributes()]
elements = np.empty(xyz.shape[0], dtype=dtype_full)
attributes = np.concatenate((xyz, normals, f_dc, f_rest, opacities, scale, rotation), axis=1)
if self.enable_semantic:
attributes = np.concatenate((attributes, obj_dc), axis=1)
elements[:] = list(map(tuple, attributes))
el = PlyElement.describe(elements, 'vertex')
PlyData([el]).write(path)
state_dict = {}
if self.use_decoupled_appearance:
state_dict["appearance_network"] = self.appearance_network.state_dict()
state_dict["appearance_embeddings"] = self._appearance_embeddings
if self.enable_semantic:
state_dict["classifier"] = self.classifier.state_dict()
if len(state_dict) > 0:
torch.save(state_dict, os.path.join(os.path.dirname(path), 'model.pth'))
@torch.no_grad()
def save_inside_ply(self, path, inside=None):
mkdir_p(os.path.dirname(path))
if inside is None:
inside = self.get_inside_gaus_normalized()[0]
xyz = self._xyz[inside].detach()
_normals = self.get_normal(inside, refine_sign=True).detach()
normals = _normals
inside = inside.cpu().numpy()
xyz = xyz.cpu().numpy()
normals = normals.cpu().numpy()
f_dc = self._features_dc[inside].detach().transpose(1, 2).flatten(start_dim=1).contiguous().cpu().numpy()
f_rest = self._features_rest[inside].detach().transpose(1, 2).flatten(start_dim=1).contiguous().cpu().numpy()
opacities = self._opacity[inside].detach().cpu().numpy()
scale = self._scaling[inside].detach().cpu().numpy()
rotation = self._rotation[inside].detach().cpu().numpy()
if self.enable_semantic:
obj_dc = self._objects_dc[inside].detach().transpose(1, 2).flatten(start_dim=1).contiguous().cpu().numpy()
dtype_full = [(attribute, 'f4') for attribute in self.construct_list_of_attributes()]
elements = np.empty(xyz.shape[0], dtype=dtype_full)
attributes = np.concatenate((xyz, normals, f_dc, f_rest, opacities, scale, rotation), axis=1)
if self.enable_semantic:
attributes = np.concatenate((attributes, obj_dc), axis=1)
elements[:] = list(map(tuple, attributes))
el = PlyElement.describe(elements, 'vertex')
PlyData([el]).write(path)
def save_visi_ply(self, path, visi):
inside = self.get_inside_gaus_normalized()[0]
inside = inside & visi
self.save_inside_ply(path, inside)
def reset_opacity(self):
opacities_new = inverse_sigmoid(torch.min(self.get_opacity, torch.ones_like(self.get_opacity)*0.01))
optimizable_tensors = self.replace_tensor_to_optimizer(opacities_new, "opacity")
self._opacity = optimizable_tensors["opacity"]
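# Illustrative sketch (hypothetical helper): reset_opacity() clamps each
# activated opacity to at most 0.01 and writes it back in logit space through
# inverse_sigmoid, so Gaussians that are already more transparent keep their
# value. This assumes opacity_activation is the sigmoid, as the
# inverse_sigmoid calls in this class suggest.
import math


def capped_opacity_logit(logit, cap=0.01):
    alpha = 1.0 / (1.0 + math.exp(-logit))   # sigmoid activation
    alpha = min(alpha, cap)                  # torch.min(get_opacity, 0.01)
    return math.log(alpha / (1.0 - alpha))   # inverse sigmoid


print(capped_opacity_logit(0.0))  # log(0.01 / 0.99), about -4.595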
def load_ply(self, path):
plydata = PlyData.read(path)
xyz = np.stack((np.asarray(plydata.elements[0]["x"]),
np.asarray(plydata.elements[0]["y"]),
np.asarray(plydata.elements[0]["z"])), axis=1)
opacities = np.asarray(plydata.elements[0]["opacity"])[..., np.newaxis]
features_dc = np.zeros((xyz.shape[0], 3, 1))
features_dc[:, 0, 0] = np.asarray(plydata.elements[0]["f_dc_0"])
features_dc[:, 1, 0] = np.asarray(plydata.elements[0]["f_dc_1"])
features_dc[:, 2, 0] = np.asarray(plydata.elements[0]["f_dc_2"])
extra_f_names = [p.name for p in plydata.elements[0].properties if p.name.startswith("f_rest_")]
extra_f_names = sorted(extra_f_names, key = lambda x: int(x.split('_')[-1]))
assert len(extra_f_names)==3*(self.max_sh_degree + 1) ** 2 - 3
features_extra = np.zeros((xyz.shape[0], len(extra_f_names)))
for idx, attr_name in enumerate(extra_f_names):
features_extra[:, idx] = np.asarray(plydata.elements[0][attr_name])
# Reshape (P,F*SH_coeffs) to (P, F, SH_coeffs except DC)
features_extra = features_extra.reshape((features_extra.shape[0], 3, (self.max_sh_degree + 1) ** 2 - 1))
scale_names = [p.name for p in plydata.elements[0].properties if p.name.startswith("scale_")]
scale_names = sorted(scale_names, key = lambda x: int(x.split('_')[-1]))
scales = np.zeros((xyz.shape[0], len(scale_names)))
for idx, attr_name in enumerate(scale_names):
scales[:, idx] = np.asarray(plydata.elements[0][attr_name])
rot_names = [p.name for p in plydata.elements[0].properties if p.name.startswith("rot")]
rot_names = sorted(rot_names, key = lambda x: int(x.split('_')[-1]))
rots = np.zeros((xyz.shape[0], len(rot_names)))
for idx, attr_name in enumerate(rot_names):
rots[:, idx] = np.asarray(plydata.elements[0][attr_name])
if self.enable_semantic:
objects_dc = np.zeros((xyz.shape[0], self.ch_sem_feat, 1))
for idx in range(self.ch_sem_feat):
objects_dc[:,idx,0] = np.asarray(plydata.elements[0]["obj_dc_"+str(idx)])
self._xyz = nn.Parameter(torch.tensor(xyz, dtype=torch.float, device="cuda").requires_grad_(True))
self._features_dc = nn.Parameter(torch.tensor(features_dc, dtype=torch.float, device="cuda").transpose(1, 2).contiguous().requires_grad_(True))
self._features_rest = nn.Parameter(torch.tensor(features_extra, dtype=torch.float, device="cuda").transpose(1, 2).contiguous().requires_grad_(True))
self._opacity = nn.Parameter(torch.tensor(opacities, dtype=torch.float, device="cuda").requires_grad_(True))
self._scaling = nn.Parameter(torch.tensor(scales, dtype=torch.float, device="cuda").requires_grad_(True))
self._rotation = nn.Parameter(torch.tensor(rots, dtype=torch.float, device="cuda").requires_grad_(True))
if self.enable_semantic:
self._objects_dc = nn.Parameter(torch.tensor(objects_dc, dtype=torch.float, device="cuda").transpose(1, 2).contiguous().requires_grad_(True))
self.active_sh_degree = self.max_sh_degree
ckpt_path = os.path.join(os.path.dirname(path), 'model.pth')
if os.path.exists(ckpt_path):
state_dict = torch.load(ckpt_path)
if self.enable_semantic:
self.classifier.load_state_dict(state_dict["classifier"])
if self.use_decoupled_appearance:
self.appearance_network.load_state_dict(state_dict["appearance_network"])
self._appearance_embeddings = nn.Parameter(state_dict["appearance_embeddings"].cuda())
def replace_tensor_to_optimizer(self, tensor, name):
optimizable_tensors = {}
for group in self.optimizer.param_groups:
if group["name"] in ["appearance_embeddings", "appearance_network", "classifier"]:
continue
if group["name"] == name:
stored_state = self.optimizer.state.get(group['params'][0], None)
stored_state["exp_avg"] = torch.zeros_like(tensor)
stored_state["exp_avg_sq"] = torch.zeros_like(tensor)
del self.optimizer.state[group['params'][0]]
group["params"][0] = nn.Parameter(tensor.requires_grad_(True))
self.optimizer.state[group['params'][0]] = stored_state
optimizable_tensors[group["name"]] = group["params"][0]
return optimizable_tensors
def _prune_optimizer(self, mask):
optimizable_tensors = {}
for group in self.optimizer.param_groups:
if group["name"] in ["appearance_embeddings", "appearance_network", "classifier", "neural_sdf"]:
continue
stored_state = self.optimizer.state.get(group['params'][0], None)
if stored_state is not None:
stored_state["exp_avg"] = stored_state["exp_avg"][mask]
stored_state["exp_avg_sq"] = stored_state["exp_avg_sq"][mask]
del self.optimizer.state[group['params'][0]]
group["params"][0] = nn.Parameter((group["params"][0][mask].requires_grad_(True)))
self.optimizer.state[group['params'][0]] = stored_state
optimizable_tensors[group["name"]] = group["params"][0]
else:
group["params"][0] = nn.Parameter(group["params"][0][mask].requires_grad_(True))
optimizable_tensors[group["name"]] = group["params"][0]
return optimizable_tensors
def prune_points(self, mask):
valid_points_mask = ~mask
optimizable_tensors = self._prune_optimizer(valid_points_mask)
self._xyz = optimizable_tensors["xyz"]
self._features_dc = optimizable_tensors["f_dc"]
self._features_rest = optimizable_tensors["f_rest"]
self._opacity = optimizable_tensors["opacity"]
self._scaling = optimizable_tensors["scaling"]
self._rotation = optimizable_tensors["rotation"]
if self.enable_semantic:
self._objects_dc = optimizable_tensors["obj_dc"]
self.xyz_gradient_accum = self.xyz_gradient_accum[valid_points_mask]
self.denom = self.denom[valid_points_mask]
self.max_radii2D = self.max_radii2D[valid_points_mask]
def cat_tensors_to_optimizer(self, tensors_dict):
optimizable_tensors = {}
for group in self.optimizer.param_groups:
if group["name"] in ["appearance_embeddings", "appearance_network", "classifier", "neural_sdf"]:
continue
assert len(group["params"]) == 1
extension_tensor = tensors_dict[group["name"]]
stored_state = self.optimizer.state.get(group['params'][0], None)
if stored_state is not None:
stored_state["exp_avg"] = torch.cat((stored_state["exp_avg"], torch.zeros_like(extension_tensor)), dim=0)
stored_state["exp_avg_sq"] = torch.cat((stored_state["exp_avg_sq"], torch.zeros_like(extension_tensor)), dim=0)
del self.optimizer.state[group['params'][0]]
group["params"][0] = nn.Parameter(torch.cat((group["params"][0], extension_tensor), dim=0).requires_grad_(True))
self.optimizer.state[group['params'][0]] = stored_state
optimizable_tensors[group["name"]] = group["params"][0]
else:
group["params"][0] = nn.Parameter(torch.cat((group["params"][0], extension_tensor), dim=0).requires_grad_(True))
optimizable_tensors[group["name"]] = group["params"][0]
return optimizable_tensors
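# Illustrative sketch: a standalone scalar model of torch.optim.Adam's default
# update (bias correction on, amsgrad off), not the torch implementation.
# cat_tensors_to_optimizer() gives appended rows zero first and second moments
# but keeps the rest of stored_state, including the per-tensor "step" counter.
# As a result, the first update of a new row depends on how long the tensor
# has already been trained.
import math


def adam_update(grad, step, lr=1e-3, b1=0.9, b2=0.999, eps=1e-15, m=0.0, v=0.0):
    """Signed size of one Adam step for a scalar whose moments start at (m, v)."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** step)          # bias-corrected first moment
    v_hat = v / (1 - b2 ** step)          # bias-corrected second moment
    return lr * m_hat / (math.sqrt(v_hat) + eps)


print(adam_update(0.5, step=1))     # about lr: a brand-new parameter moves ~lr
print(adam_update(0.5, step=1000))  # about 2.5 * lr: zeroed moments, old step count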
def densification_postfix(self, new_xyz, new_features_dc, new_features_rest, new_opacities, new_scaling, new_rotation, new_objects_dc=None, reset=True):
d = {"xyz": new_xyz,
"f_dc": new_features_dc,
"f_rest": new_features_rest,
"opacity": new_opacities,
"scaling" : new_scaling,
"rotation" : new_rotation}
if self.enable_semantic:
d["obj_dc"] = new_objects_dc
optimizable_tensors = self.cat_tensors_to_optimizer(d)
self._xyz = optimizable_tensors["xyz"]
self._features_dc = optimizable_tensors["f_dc"]
self._features_rest = optimizable_tensors["f_rest"]
self._opacity = optimizable_tensors["opacity"]
self._scaling = optimizable_tensors["scaling"]
self._rotation = optimizable_tensors["rotation"]
if self.enable_semantic:
self._objects_dc = optimizable_tensors["obj_dc"]
if reset:
self.xyz_gradient_accum = torch.zeros((self.get_xyz.shape[0], 1), device="cuda")
self.denom = torch.zeros((self.get_xyz.shape[0], 1), device="cuda")
self.max_radii2D = torch.zeros((self.get_xyz.shape[0]), device="cuda")
else:
self.xyz_gradient_accum = torch.cat((self.xyz_gradient_accum, torch.zeros((new_xyz.shape[0], 1), device="cuda")), dim=0)
self.denom = torch.cat((self.denom, torch.zeros((new_xyz.shape[0], 1), device="cuda")), dim=0)
self.max_radii2D = torch.cat((self.max_radii2D, torch.zeros((new_xyz.shape[0]), device="cuda")), dim=0)
def densify_and_split(self, grads, grad_threshold, scene_extent, visi=None, N=2):
n_init_points = self.get_xyz.shape[0]
# Extract points that satisfy the gradient condition
padded_grad = torch.zeros((n_init_points), device="cuda")
padded_grad[:grads.shape[0]] = grads.squeeze()
selected_pts_mask = torch.where(padded_grad >= grad_threshold, True, False)
selected_pts_mask = torch.logical_and(selected_pts_mask,
torch.max(self.get_scaling, dim=1).values > self.percent_dense*scene_extent)
if self.large_percent_dense is not None:
densify_pts_mask = torch.max(self.get_scaling, dim=1).values > self.large_percent_dense * scene_extent
inside, _ = self.get_inside_gaus_normalized()
densify_pts_mask = torch.logical_and(densify_pts_mask, inside)
if visi is not None:
padded_vis = torch.zeros((n_init_points), device="cuda").bool()
padded_vis[:visi.shape[0]] = visi
densify_pts_mask = torch.logical_and(densify_pts_mask, padded_vis)
selected_pts_mask = torch.logical_or(selected_pts_mask, densify_pts_mask)
stds = self.get_scaling[selected_pts_mask].repeat(N,1)
means =torch.zeros((stds.size(0), 3),device="cuda")
samples = torch.normal(mean=means, std=stds)
rots = build_rotation(self._rotation[selected_pts_mask]).repeat(N,1,1)
new_xyz = torch.bmm(rots, samples.unsqueeze(-1)).squeeze(-1) + self.get_xyz[selected_pts_mask].repeat(N, 1)
new_scaling = self.scaling_inverse_activation(self.get_scaling[selected_pts_mask].repeat(N,1) / (0.8*N))
new_rotation = self._rotation[selected_pts_mask].repeat(N,1)
new_features_dc = self._features_dc[selected_pts_mask].repeat(N,1,1)
new_features_rest = self._features_rest[selected_pts_mask].repeat(N,1,1)
new_opacity = self._opacity[selected_pts_mask].repeat(N,1)
new_objects_dc = self._objects_dc[selected_pts_mask].repeat(N,1,1) if self.enable_semantic else None
self.densification_postfix(new_xyz, new_features_dc, new_features_rest, new_opacity, new_scaling, new_rotation, new_objects_dc)
prune_filter = torch.cat((selected_pts_mask, torch.zeros(N * selected_pts_mask.sum(), device="cuda", dtype=bool)))
self.prune_points(prune_filter)
def get_dir_max_scaling(self, scaling, rots):
'''
rots: N, 3, 3
'''
axis = torch.argmax(scaling, dim=-1)
max_scaling = scaling[torch.arange(scaling.shape[0]), axis]
dirs = rots.gather(2, axis[:, None, None].expand(-1, 3, -1)).squeeze(-1)
return dirs, max_scaling, axis
def densify_and_split_along_maxscaling(self, grads, grad_threshold, scene_extent, visi=None, N=2, n_std=2):
n_init_points = self.get_xyz.shape[0]
# Extract points that satisfy the gradient condition
padded_grad = torch.zeros((n_init_points), device="cuda")
padded_grad[:grads.shape[0]] = grads.squeeze()
selected_pts_mask = torch.where(padded_grad >= grad_threshold, True, False)
selected_pts_mask = torch.logical_and(selected_pts_mask,
torch.max(self.get_scaling, dim=1).values > self.percent_dense*scene_extent)
if self.large_percent_dense is not None and (torch.cuda.memory_allocated(0) / 1024**3 < self.max_mem):
densify_pts_mask = torch.max(self.get_scaling, dim=1).values > self.large_percent_dense * scene_extent
inside, _ = self.get_inside_gaus_normalized()
densify_pts_mask = torch.logical_and(densify_pts_mask, inside)
if visi is not None:
padded_vis = torch.zeros((n_init_points), device="cuda").bool()
padded_vis[:visi.shape[0]] = visi
densify_pts_mask = torch.logical_and(densify_pts_mask, padded_vis)
selected_pts_mask = torch.logical_or(selected_pts_mask, densify_pts_mask)
scaling = self.get_scaling[selected_pts_mask]
rots = build_rotation(self._rotation[selected_pts_mask])
dirs, max_scaling, axis = self.get_dir_max_scaling(scaling, rots)
radii = (n_std * max_scaling / 3.)[..., None] # 3 std
new_xyz1 = self.get_xyz[selected_pts_mask] + dirs * radii
new_xyz2 = self.get_xyz[selected_pts_mask] - dirs * radii
new_xyz = torch.cat((new_xyz1, new_xyz2), dim=0)
new_scaling = scaling.detach().clone()
new_scaling[torch.arange(new_scaling.shape[0]), axis] = max_scaling / (0.8*N)
new_scaling = self.scaling_inverse_activation(new_scaling)
new_scaling = torch.cat((new_scaling, new_scaling), dim=0)
new_rotation = self._rotation[selected_pts_mask]
new_rotation = torch.cat((new_rotation, new_rotation), dim=0)
new_features_dc = self._features_dc[selected_pts_mask]
new_features_dc = torch.cat((new_features_dc, new_features_dc), dim=0)
new_features_rest = self._features_rest[selected_pts_mask]
new_features_rest = torch.cat((new_features_rest, new_features_rest), dim=0)
new_opacity = self._opacity[selected_pts_mask]
new_opacity = torch.cat((new_opacity, new_opacity), dim=0)
new_objects_dc = self._objects_dc[selected_pts_mask].repeat(N,1,1) if self.enable_semantic else None
self.densification_postfix(new_xyz, new_features_dc, new_features_rest, new_opacity, new_scaling, new_rotation, new_objects_dc)
prune_filter = torch.cat((selected_pts_mask, torch.zeros(N * selected_pts_mask.sum(), device="cuda", dtype=bool)))
self.prune_points(prune_filter)
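# Illustrative sketch (hypothetical helper) of
# densify_and_split_along_maxscaling(). The method replaces a selected Gaussian
# with two children placed symmetrically along its longest axis, at offsets of
# n_std * s_max / 3. Only that axis shrinks, to s_max / (0.8 * N). Rotation,
# SH features and opacity are copied unchanged. The two-sided placement is
# hard-coded while prune_filter uses N, so the method assumes N == 2.
def split_along_max_axis(center, R, scales, n_std=2, N=2):
    axis = max(range(3), key=lambda i: scales[i])
    direction = [R[row][axis] for row in range(3)]  # world-space longest axis
    offset = n_std * scales[axis] / 3.0
    child_a = [c + d * offset for c, d in zip(center, direction)]
    child_b = [c - d * offset for c, d in zip(center, direction)]
    child_scales = list(scales)
    child_scales[axis] = scales[axis] / (0.8 * N)
    return child_a, child_b, child_scales


eye_r = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(split_along_max_axis([0.0, 0.0, 0.0], eye_r, [3.0, 1.0, 1.0]))
# children at x = +2 and x = -2, child x-scale 1.875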
def densify_and_clone(self, grads, grad_threshold, scene_extent):
# Extract points that satisfy the gradient condition
selected_pts_mask = torch.where(torch.norm(grads, dim=-1) >= grad_threshold, True, False)
selected_pts_mask = torch.logical_and(selected_pts_mask,
torch.max(self.get_scaling, dim=1).values <= self.percent_dense*scene_extent)
new_xyz = self._xyz[selected_pts_mask]
new_features_dc = self._features_dc[selected_pts_mask]
new_features_rest = self._features_rest[selected_pts_mask]
new_opacities = self._opacity[selected_pts_mask]
new_scaling = self._scaling[selected_pts_mask]
new_rotation = self._rotation[selected_pts_mask]
new_objects_dc = self._objects_dc[selected_pts_mask] if self.enable_semantic else None
self.densification_postfix(new_xyz, new_features_dc, new_features_rest, new_opacities, new_scaling, new_rotation, new_objects_dc)
def densify_and_prune(self, max_grad, min_opacity, extent, max_screen_size, visi=None):
grads = self.xyz_gradient_accum / self.denom
grads[grads.isnan()] = 0.0
self.densify_and_clone(grads, max_grad, extent)
self.densify_and_split_along_maxscaling(grads, max_grad, extent, visi=visi)
prune_mask = (self.get_opacity < min_opacity).squeeze()
if max_screen_size:
big_points_vs = self.max_radii2D > max_screen_size
big_points_ws = self.get_scaling.max(dim=1).values > 0.1 * extent
prune_mask = torch.logical_or(torch.logical_or(prune_mask, big_points_vs), big_points_ws)
self.prune_points(prune_mask)
torch.cuda.empty_cache()
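# Illustrative sketch (hypothetical helper) of the per-Gaussian decisions in
# densify_and_prune(). add_densification_stats() accumulates the norm of each
# visible Gaussian's screen-space position gradient, together with a visit
# count. Their ratio (0 when never seen) drives densification:
#   - a high-gradient Gaussian is cloned if it is small relative to
#     percent_dense * extent, and split if it is larger;
#   - pruning then uses opacity and, optionally, size.
# Simplifications versus the real code: the large_percent_dense / visi path is
# ignored, and a split parent is not removed here.
def densify_prune_decision(grad_accum, denom, max_scale, opacity, radius2d,
                           max_grad, min_opacity, extent, percent_dense,
                           max_screen_size=None):
    """Return (action, prune) for one Gaussian; action in {'clone', 'split', 'keep'}."""
    grad = grad_accum / denom if denom > 0 else 0.0  # NaN (0/0) -> 0, as above
    if grad >= max_grad:
        action = 'clone' if max_scale <= percent_dense * extent else 'split'
    else:
        action = 'keep'
    prune = opacity < min_opacity
    if max_screen_size:
        prune = prune or radius2d > max_screen_size or max_scale > 0.1 * extent
    return action, prune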
def prune_gaussians(self, percent, import_score: torch.Tensor):
sorted_tensor, _ = torch.sort(import_score, dim=0)
index_nth_percentile = int(percent * (sorted_tensor.shape[0] - 1))
value_nth_percentile = sorted_tensor[index_nth_percentile]
prune_mask = (import_score <= value_nth_percentile).squeeze()
# TODO(Kevin) Emergent, change it back. This is just for testing
self.prune_points(prune_mask)
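# Illustrative sketch (hypothetical helper) of prune_gaussians(). The method
# sorts the importance scores, reads the value at index
# int(percent * (n - 1)), and prunes every Gaussian whose score is <= that
# value. Ties at the threshold are therefore all removed, so more than
# `percent` of the Gaussians can go.
def percentile_prune_mask(scores, percent):
    ranked = sorted(scores)
    threshold = ranked[int(percent * (len(ranked) - 1))]
    return [s <= threshold for s in scores]


print(percentile_prune_mask([5.0, 1.0, 4.0, 2.0, 3.0], 0.4))
# [False, True, False, True, False]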
def add_densification_stats(self, viewspace_point_tensor, update_filter):
self.xyz_gradient_accum[update_filter] += torch.norm(viewspace_point_tensor.grad[update_filter,:2], dim=-1, keepdim=True)
self.denom[update_filter] += 1
def get_inside_gaus_normalized(self):
inside, pts = get_inside_normalized(self.get_xyz, self.trans, self.scale)
return inside, pts
def normalize_pts(self, pts):
pts = normalize_pts(pts, self.trans, self.scale)
return pts
def filter_points(self, nb_points=5, radius=0.01, std_ratio=0.01):
inside, _ = self.get_inside_gaus_normalized()
xyz = self.get_xyz[inside]
filter_valid = remove_radius_outlier(xyz, nb_points, radius*self.extent)
inside[inside.clone()] = filter_valid
return inside
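# Illustrative sketch (hypothetical helpers): filter_points() runs an outlier
# filter on the Gaussians inside the normalized bound. It then writes the
# subset verdict back into the full-length mask with
# inside[inside.clone()] = ... ; the clone keeps the index mask from aliasing
# the tensor being written. remove_radius_outlier is not defined in this
# section. The brute-force stand-in below assumes its semantics: keep a point
# if at least nb_points other points lie within radius.
def radius_outlier_keep(points, nb_points, radius):
    r2 = radius * radius
    keep = []
    for i, p in enumerate(points):
        count = sum(1 for j, q in enumerate(points)
                    if j != i and sum((a - b) ** 2 for a, b in zip(p, q)) <= r2)
        keep.append(count >= nb_points)
    return keep


def scatter_submask(full_mask, sub_mask):
    """List equivalent of full[full.clone()] = sub for boolean masks."""
    out, it = list(full_mask), iter(sub_mask)
    for i, flag in enumerate(full_mask):
        if flag:
            out[i] = next(it)  # i-th True slot receives the next sub verdict
    return out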
def prune_outside(self):
inside, _ = self.get_inside_gaus_normalized()
self.prune_points(~inside)
def prune_outliers(self):
mask = torch.ones(self.get_xyz.shape[0], dtype=torch.bool, device="cuda")
valid = self.filter_points()
mask[valid] = False
self.prune_points(mask)
def prune_semantics(self, cls=BACKGROUND):
mask = torch.ones(self.get_xyz.shape[0], dtype=torch.bool, device="cuda")
mask[self.get_cls() != cls] = False
self.prune_points(mask)
if __name__ == '__main__':
model = GaussianModel(2)
m2 = deepcopy(model)
================================================
FILE: tools/__init__.py
================================================
================================================
FILE: tools/camera.py
================================================
'''
-----------------------------------------------------------------------------
Copyright (c) 2023, NVIDIA CORPORATION. All rights reserved.
NVIDIA CORPORATION and its licensors retain all intellectual property
and proprietary rights in and to this software, related documentation
and any modifications thereto. Any use, reproduction, disclosure or
distribution of this software and related documentation without an express
license agreement from NVIDIA CORPORATION is strictly prohibited.
-----------------------------------------------------------------------------
'''
import numpy as np
import torch
class Pose():
"""
A class of operations on camera poses (PyTorch tensors with shape [...,3,4]).
Each [3,4] camera pose takes the form of [R|t].
"""
def __call__(self, R=None, t=None):
# Construct a camera pose from the given R and/or t.
assert R is not None or t is not None
if R is None:
if not isinstance(t, torch.Tensor):
t = torch.tensor(t)
R = torch.eye(3, device=t.device).repeat(*t.shape[:-1], 1, 1)
elif t is None:
if not isinstance(R, torch.Tensor):
R = torch.tensor(R)
t = torch.zeros(R.shape[:-1], device=R.device)
else:
if not isinstance(R, torch.Tensor):
R = torch.tensor(R)
if not isinstance(t, torch.Tensor):
t = torch.tensor(t)
assert R.shape[:-1] == t.shape and R.shape[-2:] == (3, 3)
R = R.float()
t = t.float()
pose = torch.cat([R, t[..., None]], dim=-1)
assert pose.shape[-2:] == (3, 4)
return pose
def invert(self, pose, use_inverse=False):
# Invert a camera pose.
R, t = pose[..., :3], pose[..., 3:]
R_inv = R.inverse() if use_inverse else R.transpose(-1, -2)
t_inv = (-R_inv @ t)[..., 0]
pose_inv = self(R=R_inv, t=t_inv)
return pose_inv
def compose(self, pose_list):
# Compose a sequence of poses together.
# pose_new(x) = poseN o ... o pose2 o pose1(x)
pose_new = pose_list[0]
for pose in pose_list[1:]:
pose_new = self.compose_pair(pose_new, pose)
return pose_new
def compose_pair(self, pose_a, pose_b):
R_a, t_a = pose_a[..., :3], pose_a[..., 3:]
R_b, t_b = pose_b[..., :3], pose_b[..., 3:]
R_new = R_b @ R_a
t_new = (R_b @ t_a + t_b)[..., 0]
pose_new = self(R=R_new, t=t_new)
return pose_new
def scale_center(self, pose, scale):
"""Scale the camera center from the origin.
0 = R@c+t --> c = -R^T@t (camera center in world coordinates)
0 = R@(sc)+t' --> t' = -R@(sc) = -R@(-R^T@st) = st
"""
R, t = pose[..., :3], pose[..., 3:]
pose_new = torch.cat([R, t * scale], dim=-1)
return pose_new
def interpolate(self, pose_a, pose_b, alpha):
"""Interpolate between two poses with Slerp.
Args:
pose_a (tensor [...,3,4]): Pose at time t=0.
pose_b (tensor [...,3,4]): Pose at time t=1.
alpha (tensor [...,1]): Interpolation parameter.
Returns:
pose (tensor [...,3,4]): Pose at time t.
"""
R_a, t_a = pose_a[..., :3], pose_a[..., 3:]
R_b, t_b = pose_b[..., :3], pose_b[..., 3:]
q_a = quaternion.R_to_q(R_a) # [...,4]
q_b = quaternion.R_to_q(R_b) # [...,4]
q_intp = quaternion.interpolate(q_a, q_b, alpha) # [...,4]
R_intp = quaternion.q_to_R(q_intp) # [...,3,3]
t_intp = (1 - alpha[..., None]) * t_a + alpha[..., None] * t_b # [...,3,1]
pose_intp = torch.cat([R_intp, t_intp], dim=-1) # [...,3,4]
return pose_intp
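# Illustrative sketch in plain Python (hypothetical helpers) mirroring
# Pose.invert and Pose.compose_pair for a single 3x4 [R|t]:
#   - the inverse is [R^T | -R^T t];
#   - applying pose_a and then pose_b gives [R_b R_a | R_b t_a + t_b].
# Composing a pose with its inverse therefore returns the identity.
def invert_pose(P):
    R = [row[:3] for row in P]
    t = [row[3] for row in P]
    Rt = [[R[j][i] for j in range(3)] for i in range(3)]            # R^T
    t_inv = [-sum(Rt[i][k] * t[k] for k in range(3)) for i in range(3)]
    return [Rt[i] + [t_inv[i]] for i in range(3)]


def compose_pose_pair(Pa, Pb):
    Ra, ta = [row[:3] for row in Pa], [row[3] for row in Pa]
    Rb, tb = [row[:3] for row in Pb], [row[3] for row in Pb]
    R = [[sum(Rb[i][k] * Ra[k][j] for k in range(3)) for j in range(3)] for i in range(3)]
    t = [sum(Rb[i][k] * ta[k] for k in range(3)) + tb[i] for i in range(3)]
    return [R[i] + [t[i]] for i in range(3)]


P_ex = [[0, -1, 0, 1], [1, 0, 0, 2], [0, 0, 1, 3]]  # 90 deg about z, t = (1, 2, 3)
print(compose_pose_pair(P_ex, invert_pose(P_ex)))  # [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]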
class Lie():
"""
Lie algebra for SO(3) and SE(3) operations in PyTorch.
"""
def so3_to_SO3(self, w): # [..., 3]
wx = self.skew_symmetric(w)
theta = w.norm(dim=-1)[..., None, None]
eye = torch.eye(3, device=w.device, dtype=torch.float32)
A = self.taylor_A(theta)
B = self.taylor_B(theta)
R = eye + A * wx + B * wx @ wx
return R
def SO3_to_so3(self, R, eps=1e-7): # [..., 3, 3]
trace = R[..., 0, 0] + R[..., 1, 1] + R[..., 2, 2]
theta = ((trace - 1) / 2).clamp(-1 + eps, 1 - eps).acos_()[
..., None, None] % np.pi # ln(R) will explode if theta==pi
lnR = 1 / (2 * self.taylor_A(theta) + 1e-8) * (R - R.transpose(-2, -1)) # FIXME: wei-chiu finds it weird
w0, w1, w2 = lnR[..., 2, 1], lnR[..., 0, 2], lnR[..., 1, 0]
w = torch.stack([w0, w1, w2], dim=-1)
return w
def se3_to_SE3(self, wu): # [...,3]
w, u = wu.split([3, 3], dim=-1)
wx = self.skew_symmetric(w)
theta = w.norm(dim=-1)[..., None, None]
eye = torch.eye(3, device=w.device, dtype=torch.float32)
A = self.taylor_A(theta)
B = self.taylor_B(theta)
C = self.taylor_C(theta)
R = eye + A * wx + B * wx @ wx
V = eye + B * wx + C * wx @ wx
Rt = torch.cat([R, (V @ u[..., None])], dim=-1)
return Rt
def SE3_to_se3(self, Rt, eps=1e-8): # [...,3,4]
R, t = Rt.split([3, 1], dim=-1)
w = self.SO3_to_so3(R)
wx = self.skew_symmetric(w)
theta = w.norm(dim=-1)[..., None, None]
eye = torch.eye(3, device=w.device, dtype=torch.float32)
A = self.taylor_A(theta)
B = self.taylor_B(theta)
invV = eye - 0.5 * wx + (1 - A / (2 * B)) / (theta ** 2 + eps) * wx @ wx
u = (invV @ t)[..., 0]
wu = torch.cat([w, u], dim=-1)
return wu
def skew_symmetric(self, w):
w0, w1, w2 = w.unbind(dim=-1)
zero = torch.zeros_like(w0)
wx = torch.stack([torch.stack([zero, -w2, w1], dim=-1),
torch.stack([w2, zero, -w0], dim=-1),
torch.stack([-w1, w0, zero], dim=-1)], dim=-2)
return wx
def taylor_A(self, x, nth=10):
# Taylor expansion of sin(x)/x.
ans = torch.zeros_like(x)
denom = 1.
for i in range(nth + 1):
if i > 0:
denom *= (2 * i) * (2 * i + 1)
ans = ans + (-1) ** i * x ** (2 * i) / denom
return ans
def taylor_B(self, x, nth=10):
# Taylor expansion of (1-cos(x))/x**2.
ans = torch.zeros_like(x)
denom = 1.
for i in range(nth + 1):
denom *= (2 * i + 1) * (2 * i + 2)
ans = ans + (-1) ** i * x ** (2 * i) / denom
return ans
def taylor_C(self, x, nth=10):
# Taylor expansion of (x-sin(x))/x**3.
ans = torch.zeros_like(x)
denom = 1.
for i in range(nth + 1):
denom *= (2 * i + 2) * (2 * i + 3)
ans = ans + (-1) ** i * x ** (2 * i) / denom
return ans
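# Illustrative sketch in plain Python (hypothetical helper names).
# Lie.so3_to_SO3 is the Rodrigues formula R = I + A [w]x + B [w]x^2, where
# A = sin(theta)/theta and B = (1 - cos(theta))/theta^2. taylor_A and taylor_B
# evaluate A and B as truncated series, so they stay finite at theta = 0. This
# version uses the same series, with math.factorial in place of the running
# denominator.
import math


def series_sin_over_x(x, nth=10):             # sum (-1)^i x^(2i) / (2i+1)!
    return sum((-1) ** i * x ** (2 * i) / math.factorial(2 * i + 1) for i in range(nth + 1))


def series_one_minus_cos_over_x2(x, nth=10):  # sum (-1)^i x^(2i) / (2i+2)!
    return sum((-1) ** i * x ** (2 * i) / math.factorial(2 * i + 2) for i in range(nth + 1))


def so3_exp(w):
    """Axis-angle vector w (3 floats) -> 3x3 rotation matrix as nested lists."""
    theta = math.sqrt(sum(c * c for c in w))
    wx = [[0.0, -w[2], w[1]], [w[2], 0.0, -w[0]], [-w[1], w[0], 0.0]]  # skew matrix
    wx2 = [[sum(wx[i][k] * wx[k][j] for k in range(3)) for j in range(3)] for i in range(3)]
    A, B = series_sin_over_x(theta), series_one_minus_cos_over_x2(theta)
    return [[(1.0 if i == j else 0.0) + A * wx[i][j] + B * wx2[i][j]
             for j in range(3)] for i in range(3)]


print(so3_exp([0.0, 0.0, math.pi / 2]))  # about [[0, -1, 0], [1, 0, 0], [0, 0, 1]]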
class Quaternion():
def q_to_R(self, q): # [...,4]
# https://en.wikipedia.org/wiki/Rotation_matrix#Quaternion
qa, qb, qc, qd = q.unbind(dim=-1)
R = torch.stack(
[torch.stack([1 - 2 * (qc ** 2 + qd ** 2), 2 * (qb * qc - qa * qd), 2 * (qa * qc + qb * qd)], dim=-1),
torch.stack([2 * (qb * qc + qa * qd), 1 - 2 * (qb ** 2 + qd ** 2), 2 * (qc * qd - qa * qb)], dim=-1),
torch.stack([2 * (qb * qd - qa * qc), 2 * (qa * qb + qc * qd), 1 - 2 * (qb ** 2 + qc ** 2)], dim=-1)],
dim=-2)
return R
def R_to_q(self, R, eps=1e-6): # [...,3,3]
# https://en.wikipedia.org/wiki/Rotation_matrix#Quaternion
row0, row1, row2 = R.unbind(dim=-2)
R00, R01, R02 = row0.unbind(dim=-1)
R10, R11, R12 = row1.unbind(dim=-1)
R20, R21, R22 = row2.unbind(dim=-1)
t = R[..., 0, 0] + R[..., 1, 1] + R[..., 2, 2]
r = (1 + t + eps).sqrt()
qa = 0.5 * r
qb = (R21 - R12).sign() * 0.5 * (1 + R00 - R11 - R22 + eps).sqrt()
qc = (R02 - R20).sign() * 0.5 * (1 - R00 + R11 - R22 + eps).sqrt()
qd = (R10 - R01).sign() * 0.5 * (1 - R00 - R11 + R22 + eps).sqrt()
q = torch.stack([qa, qb, qc, qd], dim=-1)
return q
def invert(self, q): # [...,4]
qa, qb, qc, qd = q.unbind(dim=-1)
norm = q.norm(dim=-1, keepdim=True)
q_inv = torch.stack([qa, -qb, -qc, -qd], dim=-1) / norm ** 2
return q_inv
def product(self, q1, q2): # [...,4]
q1a, q1b, q1c, q1d = q1.unbind(dim=-1)
q2a, q2b, q2c, q2d = q2.unbind(dim=-1)
hamil_prod = torch.stack([q1a * q2a - q1b * q2b - q1c * q2c - q1d * q2d,
q1a * q2b + q1b * q2a + q1c * q2d - q1d * q2c,
q1a * q2c - q1b * q2d + q1c * q2a + q1d * q2b,
q1a * q2d + q1b * q2c - q1c * q2b + q1d * q2a], dim=-1)
return hamil_prod
def interpolate(self, q1, q2, alpha): # [...,4],[...,4],[...,1]
# https://en.wikipedia.org/wiki/Slerp
cos_angle = (q1 * q2).sum(dim=-1, keepdim=True) # [...,1]
flip = cos_angle < 0
q1 = q1 * (~flip) - q1 * flip # [...,4]
theta = cos_angle.abs().acos() # [...,1]
slerp = (((1 - alpha) * theta).sin() * q1 + (alpha * theta).sin() * q2) / theta.sin() # [...,4]
return slerp
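`q_to_R` and `product` above use the [w, x, y, z] = (qa, qb, qc, qd) convention with the Hamilton product. One quick way to sanity-check that pairing is the composition rule R(q1 ⊗ q2) = R(q1) R(q2) for unit quaternions. The sketch below is a NumPy re-implementation of the two formulas. The names `q_to_R` and `hamilton` are hypothetical stand-ins for the class methods.

```python
import numpy as np


def q_to_R(q):
    # Unit quaternion [a, b, c, d] = a + bi + cj + dk -> 3x3 rotation matrix (same formula as Quaternion.q_to_R).
    a, b, c, d = q
    return np.array([
        [1 - 2 * (c * c + d * d), 2 * (b * c - a * d), 2 * (a * c + b * d)],
        [2 * (b * c + a * d), 1 - 2 * (b * b + d * d), 2 * (c * d - a * b)],
        [2 * (b * d - a * c), 2 * (a * b + c * d), 1 - 2 * (b * b + c * c)],
    ])


def hamilton(q1, q2):
    # Hamilton product q1 * q2 (same formula as Quaternion.product).
    a1, b1, c1, d1 = q1
    a2, b2, c2, d2 = q2
    return np.array([
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    ])


rng = np.random.default_rng(0)
q1 = rng.normal(size=4)
q1 /= np.linalg.norm(q1)
q2 = rng.normal(size=4)
q2 /= np.linalg.norm(q2)
R1, R2 = q_to_R(q1), q_to_R(q2)
# 90 degrees about z: q = [cos(45 deg), 0, 0, sin(45 deg)]
R_z90 = q_to_R(np.array([np.cos(np.pi / 4), 0.0, 0.0, np.sin(np.pi / 4)]))
```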
pose = Pose()
lie = Lie()
quaternion = Quaternion()
def to_hom(X):
# Get homogeneous coordinates of the input.
X_hom = torch.cat([X, torch.ones_like(X[..., :1])], dim=-1)
return X_hom
# Basic operations of transforming 3D points between world/camera/image coordinates.
def world2cam(X, pose): # [B,N,3]
X_hom = to_hom(X)
return X_hom @ pose.transpose(-1, -2)
def cam2img(X, cam_intr):
return X @ cam_intr.transpose(-1, -2)
def img2cam(X, cam_intr):
return X @ cam_intr.inverse().transpose(-1, -2)
def cam2world(X, pose):
X_hom = to_hom(X)
pose_inv = Pose().invert(pose)
return X_hom @ pose_inv.transpose(-1, -2)
def angle_to_rotation_matrix(a, axis):
# Get the rotation matrix from Euler angle around specific axis.
roll = dict(X=1, Y=2, Z=0)[axis]
if isinstance(a, float):
a = torch.tensor(a)
zero = torch.zeros_like(a)
eye = torch.ones_like(a)
M = torch.stack([torch.stack([a.cos(), -a.sin(), zero], dim=-1),
torch.stack([a.sin(), a.cos(), zero], dim=-1),
torch.stack([zero, zero, eye], dim=-1)], dim=-2)
M = M.roll((roll, roll), dims=(-2, -1))
return M
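`angle_to_rotation_matrix` builds the rotation about Z once and then rolls rows and columns together: a cyclic shift of 1 moves the fixed axis to X, a shift of 2 moves it to Y. A scalar NumPy sketch of the same trick follows. `angle_to_rotation_matrix_np` is a hypothetical name.

```python
import numpy as np


def angle_to_rotation_matrix_np(a, axis):
    # Rotation about Z for angle a (radians).
    shift = {"X": 1, "Y": 2, "Z": 0}[axis]
    c, s = np.cos(a), np.sin(a)
    M = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    # Cyclically shift rows and columns by the same amount so the fixed (identity) axis moves to X or Y.
    return np.roll(M, shift=(shift, shift), axis=(0, 1))


half_pi = np.pi / 2
```

With a 90 degree angle, each result should act as the right-handed rotation about its axis (y to z for X, z to x for Y, x to y for Z).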
def get_center_and_ray(pose, intr, image_size):
"""
Args:
pose (tensor [3,4]/[B,3,4]): Camera pose.
intr (tensor [3,3]/[B,3,3]): Camera intrinsics.
image_size (list of int): Image size.
Returns:
center_3D (tensor [HW,3]/[B,HW,3]): Center of the camera.
ray (tensor [HW,3]/[B,HW,3]): Ray of the camera with depth=1 (note: not unit ray).
"""
H, W = image_size
# Given the intrinsic/extrinsic matrices, get the camera center and ray directions.
with torch.no_grad():
# Compute image coordinate grid.
y_range = torch.arange(H, dtype=torch.float32, device=pose.device).add_(0.5)
x_range = torch.arange(W, dtype=torch.float32, device=pose.device).add_(0.5)
Y, X = torch.meshgrid(y_range, x_range, indexing="ij") # [H,W]
xy_grid = torch.stack([X, Y], dim=-1).view(-1, 2) # [HW,2]
# Compute center and ray.
if len(pose.shape) == 3:
batch_size = len(pose)
xy_grid = xy_grid.repeat(batch_size, 1, 1) # [B,HW,2]
grid_3D = img2cam(to_hom(xy_grid), intr) # [HW,3]/[B,HW,3]
center_3D = torch.zeros_like(grid_3D) # [HW,3]/[B,HW,3]
# Transform from camera to world coordinates.
grid_3D = cam2world(grid_3D, pose) # [HW,3]/[B,HW,3]
center_3D = cam2world(center_3D, pose) # [HW,3]/[B,HW,3]
ray = grid_3D - center_3D # [B,HW,3]
return center_3D, ray
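For a single camera, `get_center_and_ray` reduces to two formulas: given a world-to-camera pose [R|t] (the convention `world2cam` applies), the center is -Rᵀt and the depth-1 ray through pixel center (u+0.5, v+0.5) is Rᵀ K⁻¹ [u+0.5, v+0.5, 1]. The NumPy sketch below assumes exactly that convention. `center_and_rays` is a hypothetical name.

```python
import numpy as np


def center_and_rays(R, t, K, H, W):
    # Assumes a world-to-camera pose: x_cam = R @ x_world + t.
    ys, xs = np.meshgrid(np.arange(H) + 0.5, np.arange(W) + 0.5, indexing="ij")  # pixel centers, [H,W]
    pix = np.stack([xs, ys, np.ones_like(xs)], axis=-1).reshape(-1, 3)  # homogeneous pixels, [HW,3]
    dirs_cam = pix @ np.linalg.inv(K).T  # camera-space points at depth z = 1
    center = -R.T @ t  # camera center in world coordinates
    rays = dirs_cam @ R  # row-wise R.T @ d: rotate directions into the world frame
    return np.broadcast_to(center, rays.shape), rays


K = np.array([[2.0, 0.0, 1.0], [0.0, 2.0, 1.0], [0.0, 0.0, 1.0]])
center, rays = center_and_rays(np.eye(3), np.zeros(3), K, 2, 2)

# Round trip with a rotated, translated camera: a point at depth 3 along each ray reprojects to its pixel.
c, s = np.cos(0.4), np.sin(0.4)
R = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
t = np.array([0.1, 0.2, 0.3])
center_r, rays_r = center_and_rays(R, t, K, 2, 2)
cam = (center_r + 3 * rays_r) @ R.T + t
proj = cam @ K.T
pix_back = proj[:, :2] / proj[:, 2:]
```

As in the original docstring, the rays have camera-space depth 1 rather than unit length.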
def get_3D_points_from_dist(center, ray_unit, dist, multi=True):
# Two possible use cases: (1) center + ray_unit * dist, or (2) center + ray * depth
if multi:
center, ray_unit = center[..., None, :], ray_unit[..., None, :] # [...,1,3]
# x = c+dv
points_3D = center + ray_unit * dist # [...,3]/[...,N,3]
return points_3D
def convert_NDC(center, ray, intr, near=1):
# Shift camera center (ray origins) to near plane (z=1).
# (Unlike conventional NDC, we assume the cameras are facing towards the +z direction.)
center = center + (near - center[..., 2:]) / ray[..., 2:] * ray
# Projection.
cx, cy, cz = center.unbind(dim=-1) # [...,R]
rx, ry, rz = ray.unbind(dim=-1) # [...,R]
scale_x = intr[..., 0, 0] / intr[..., 0, 2] # [...]
scale_y = intr[..., 1, 1] / intr[..., 1, 2] # [...]
cnx = scale_x[..., None] * (cx / cz)
cny = scale_y[..., None] * (cy / cz)
cnz = 1 - 2 * near / cz
rnx = scale_x[..., None] * (rx / rz - cx / cz)
rny = scale_y[..., None] * (ry / rz - cy / cz)
rnz = 2 * near / cz
center_ndc = torch.stack([cnx, cny, cnz], dim=-1) # [...,R,3]
ray_ndc = torch.stack([rnx, rny, rnz], dim=-1) # [...,R,3]
return center_ndc, ray_ndc
def convert_NDC2(center, ray, intr):
# Similar to convert_NDC() but shift the ray origins to its own image plane instead of the global near plane.
# Also this version is much more interpretable.
scale_x = intr[..., 0, 0] / intr[..., 0, 2] # [...]
scale_y = intr[..., 1, 1] / intr[..., 1, 2] # [...]
# Get the metric image plane (i.e. new "center"): (sx*cx/cz, sy*cy/cz, 1-2/cz).
center = center + ray # This is the key difference.
cx, cy, cz = center.unbind(dim=-1) # [...,R]
image_plane = torch.stack([scale_x[..., None] * cx / cz,
scale_y[..., None] * cy / cz,
1 - 2 / cz], dim=-1)
# Get the infinity plane: (sx*rx/rz, sy*ry/rz, 1).
rx, ry, rz = ray.unbind(dim=-1) # [...,R]
inf_plane = torch.stack([scale_x[..., None] * rx / rz,
scale_y[..., None] * ry / rz,
torch.ones_like(rz)], dim=-1)
# The NDC ray is the difference between the two planes, assuming t \in [0,1].
ndc_ray = inf_plane - image_plane
return image_plane, ndc_ray
def rotation_distance(R1, R2, eps=1e-7):
# http://www.boris-belousov.net/2016/12/01/quat-dist/
R_diff = R1 @ R2.transpose(-2, -1)
trace = R_diff[..., 0, 0] + R_diff[..., 1, 1] + R_diff[..., 2, 2]
angle = ((trace - 1) / 2).clamp(-1 + eps, 1 - eps).acos_() # numerical stability near -1/+1
return angle
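`rotation_distance` is the geodesic angle on SO(3), arccos((tr(R1 R2ᵀ) - 1) / 2). The NumPy sketch below mirrors it and checks it on two rotations about the same axis, where the answer is the difference of the angles. `rot_z` and `rotation_distance_np` are hypothetical names.

```python
import numpy as np


def rot_z(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def rotation_distance_np(R1, R2, eps=1e-7):
    R_diff = R1 @ R2.T
    cos_angle = (np.trace(R_diff) - 1) / 2
    # Clamp away from +/-1 as the torch version does, keeping arccos (and its gradient) finite.
    return np.arccos(np.clip(cos_angle, -1 + eps, 1 - eps))


d = rotation_distance_np(rot_z(0.3), rot_z(1.0))
d_same = rotation_distance_np(np.eye(3), np.eye(3))
```

A side effect of the clamp is that identical rotations report about sqrt(2 * eps), roughly 4.5e-4 rad, instead of exactly 0.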
def get_oscil_novel_view_poses(N=60, angle=0.05, dist=5):
# Create circular viewpoints (small oscillations).
theta = torch.arange(N) / N * 2 * np.pi
R_x = angle_to_rotation_matrix((theta.sin() * angle).asin(), "X")
R_y = angle_to_rotation_matrix((theta.cos() * angle).asin(), "Y")
pose_rot = pose(R=R_y @ R_x)
pose_shift = pose(t=[0, 0, dist])
pose_oscil = pose.compose([pose.invert(pose_shift), pose_rot, pose_shift])
return pose_oscil
def cross_product_matrix(x):
"""Matrix form of cross product opertaion.
param x: [3,] tensor.
return: [3, 3] tensor representing the matrix form of cross product.
"""
return torch.tensor(
[[0, -x[2], x[1]],
[x[2], 0, -x[0]],
[-x[1], x[0], 0, ]]
)
def essential_matrix(poses):
"""Compute Essential Matrix from a relative pose.
param poses: [views, 3, 4] tensor representing relative poses.
return: [views, 3, 3] tensor representing Essential Matrix.
"""
r = poses[..., 0:3]
t = poses[..., 3]
tx = torch.stack([cross_product_matrix(tt) for tt in t], dim=0)
return tx @ r
def fundamental_matrix(poses, intr1, intr2):
"""Compute Fundamental Matrix from a relative pose and intrinsics.
param poses: [views, 3, 4] tensor representing relative poses.
intr1: [3, 3] tensor. Camera intrinsic of reference image.
intr2: [views, 3, 3] tensor. Camera Intrinsic of target image.
return: [views, 3, 3] tensor representing Fundamental Matrix.
"""
return intr2.inverse().transpose(-1, -2) @ essential_matrix(poses) @ intr1.inverse()
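The two functions above compute E = [t]ₓR and F = K2⁻ᵀ E K1⁻¹. A quick check is the epipolar constraint p2ᵀ F p1 = 0 for matching pixels. The NumPy sketch below assumes the relative pose maps reference-camera coordinates to target-camera coordinates (X2 = R X1 + t). The pose, intrinsics and the point are made-up values, and `skew` and `fundamental` are hypothetical names.

```python
import numpy as np


def skew(t):
    # [t]_x such that skew(t) @ v == np.cross(t, v)
    return np.array([[0.0, -t[2], t[1]], [t[2], 0.0, -t[0]], [-t[1], t[0], 0.0]])


def fundamental(R, t, K1, K2):
    E = skew(t) @ R  # essential matrix
    return np.linalg.inv(K2).T @ E @ np.linalg.inv(K1)


c, s = np.cos(0.2), np.sin(0.2)
R = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
t = np.array([0.5, -0.1, 0.05])
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])

X1 = np.array([0.3, -0.2, 4.0])  # 3-D point in the reference camera frame
X2 = R @ X1 + t  # same point in the target camera frame
p1 = K @ X1
p1 /= p1[2]
p2 = K @ X2
p2 /= p2[2]
residual = p2 @ fundamental(R, t, K, K) @ p1
```

The residual vanishes because X2 · (t × R X1) = X2 · (t × (X2 - t)) = X2 · (t × X2) = 0.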
def get_ray_depth_plane_intersection(center, ray, depths):
"""Compute the intersection of a ray with a depth plane.
Args:
center (tensor [B,HW,3]): Camera center of the target pose.
ray (tensor [B,HW,3]): Ray direction of the target pose.
depths (tensor [L]): The depth values from the source view (e.g. for MPI planes).
Returns:
intsc_points (tensor [B,HW,L,3]): Intersecting 3D points with the MPI.
"""
# Each 3D point x along the ray v from center c can be written as x = c+t*v.
# Plane equation: n@x = d, where normal n = (0,0,1), d = depth.
# --> t = (d-n@c)/(n@v).
# --> x = c+t*v = c+(d-n@c)/(n@v)*v.
center, ray = center[:, :, None], ray[:, :, None] # [B,HW,1,3], [B,HW,1,3]
depths = depths[None, None, :, None] # [1,1,L,1]
intsc_points = center + (depths - center[..., 2:]) / ray[..., 2:] * ray # [B,HW,L,3]
return intsc_points
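The intersection formula in the comments, x = c + (d - c_z) / v_z * v, can be checked on an unbatched NumPy version: every output point should lie on its plane z = d. `ray_plane_z` is a hypothetical name, and it drops the leading batch dimension B.

```python
import numpy as np


def ray_plane_z(center, ray, depths):
    # center, ray: [N,3]; depths: [L] -> intersections with the planes z = depth, [N,L,3].
    c = center[:, None, :]
    v = ray[:, None, :]
    d = np.asarray(depths, dtype=float)[None, :, None]
    t = (d - c[..., 2:]) / v[..., 2:]  # undefined when the ray is parallel to the planes (v_z = 0)
    return c + t * v


center = np.zeros((2, 3))
ray = np.array([[0.0, 0.0, 1.0], [1.0, 0.0, 2.0]])
pts = ray_plane_z(center, ray, [1.0, 2.0, 4.0])
```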
def unit_view_vector_to_rotation_matrix(v, axes="ZYZ"):
"""
Args:
v (tensor [...,3]): Unit vectors on the view sphere.
axes: rotation axis order.
Returns:
rotation_matrix (tensor [...,3,3]): rotation matrix R @ v + [0, 0, 1] = 0.
"""
alpha = torch.arctan2(v[..., 1], v[..., 0]) # [...]
beta = np.pi - v[..., 2].arccos() # [...]
euler_angles = torch.stack([torch.ones_like(alpha) * np.pi / 2, -beta, alpha], dim=-1) # [...,3]
rot2 = angle_to_rotation_matrix(euler_angles[..., 2], axes[2]) # [...,3,3]
rot1 = angle_to_rotation_matrix(euler_angles[..., 1], axes[1]) # [...,3,3]
rot0 = angle_to_rotation_matrix(euler_angles[..., 0], axes[0]) # [...,3,3]
rot = rot2 @ rot1 @ rot0 # [...,3,3]
return rot.transpose(-2, -1)
def sample_on_spherical_cap(anchor, N, max_angle):
"""Sample n points on the view hemisphere within the angle to x.
Args:
anchor (tensor [...,3]): Reference 3-D unit vector on the view hemisphere.
N (int): Number of sampled points.
max_angle (float): Sampled points should have max angle to x.
Returns:
sampled_points (tensor [...,N,3]): Sampled points on the spherical caps.
"""
batch_shape = anchor.shape[:-1]
# First, sample uniformly on a unit 2D disk.
radius = torch.rand(*batch_shape, N, device=anchor.device) # [...,N]
theta = torch.rand(*batch_shape, N, device=anchor.device) * 2 * np.pi # [...,N]
x = radius.sqrt() * theta.cos() # [...,N]
y = radius.sqrt() * theta.sin() # [...,N]
# Reparametrize to a unit spherical cap with height h.
# http://marc-b-reynolds.github.io/distribution/2016/11/28/Uniform.html
h = 1 - np.cos(max_angle) # spherical cap height
k = h * radius # [...,N]
s = (h * (2 - k)).sqrt() # [...,N]
points = torch.stack([s * x, s * y, 1 - k], dim=-1) # [...,N,3]
# Transform to center around the anchor.
ref_z = torch.tensor([0., 0., 1.], device=anchor.device)
v = -anchor.cross(ref_z) # [...,3]
ss_v = lie.skew_symmetric(v) # [...,3,3]
R = torch.eye(3, device=anchor.device) + ss_v + ss_v @ ss_v / (1 + anchor @ ref_z)[..., None, None] # [...,3,3]
points = points @ R.transpose(-2, -1) # [...,N,3]
return points
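The cap sampler works in two steps. First it samples around +z, with z = 1 - k uniform in [cos(max_angle), 1]. Then it rotates +z onto the anchor with the Rodrigues form R = I + [v]ₓ + [v]ₓ² / (1 + z·anchor), where v = z × anchor. The single-anchor NumPy sketch below follows the same steps, so both invariants can be checked: unit norm, and angle to the anchor at most max_angle. `skew` and `sample_cap` are hypothetical names.

```python
import numpy as np


def skew(v):
    return np.array([[0.0, -v[2], v[1]], [v[2], 0.0, -v[0]], [-v[1], v[0], 0.0]])


def sample_cap(anchor, n, max_angle, rng):
    h = 1 - np.cos(max_angle)  # cap height
    u = rng.random(n)  # squared radius on the unit disk (called `radius` above)
    theta = rng.random(n) * 2 * np.pi
    x, y = np.sqrt(u) * np.cos(theta), np.sqrt(u) * np.sin(theta)
    k = h * u  # 1 - z, uniform in [0, h)
    s = np.sqrt(h * (2 - k))  # makes s^2 (x^2 + y^2) + (1 - k)^2 == 1
    pts = np.stack([s * x, s * y, 1 - k], axis=-1)
    # Rotate +z onto the anchor; this form is undefined when anchor == -z.
    z = np.array([0.0, 0.0, 1.0])
    S = skew(np.cross(z, anchor))
    R = np.eye(3) + S + S @ S / (1 + anchor @ z)
    return pts @ R.T


anchor = np.array([1.0, 1.0, 1.0]) / np.sqrt(3)
pts = sample_cap(anchor, 1000, 0.3, np.random.default_rng(0))
```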
def sample_on_spherical_cap_northern(anchor, N, max_angle, away_from=None, max_reject_count=None):
"""Sample n points only the northern view hemisphere within the angle to x."""
def find_invalid_points(points):
southern = points[..., 2] < 0 # [...,N]
if away_from is not None:
cosine_ab = (away_from * anchor).sum(dim=-1, keepdim=True) # [...,1]
cosine_ac = (away_from[..., None, :] * points).sum(dim=-1) # [...,N]
not_outwards = cosine_ab < cosine_ac # [...,N]
invalid = southern | not_outwards
else:
invalid = southern
return invalid
assert (anchor[..., 2] > 0).all()
assert anchor.norm(dim=-1).allclose(torch.ones_like(anchor[..., 0]))
points = sample_on_spherical_cap(anchor, N, max_angle) # [...,N,3]
invalid = find_invalid_points(points)
count = 0
while invalid.any():
# Reject and resample.
points_resample = sample_on_spherical_cap(anchor, N, max_angle)
points[invalid] = points_resample[invalid]
invalid = find_invalid_points(points)
count += 1
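if max_reject_count and count > max_reject_count:
points = anchor.repeat(N, 1) # fall back to the anchor itself
break # stop rejecting; without this the loop can keep resampling forever
return points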
================================================
FILE: tools/camera_utils.py
================================================
#
# Copyright (C) 2023, Inria
# GRAPHDECO research group, https://team.inria.fr/graphdeco
# All rights reserved.
#
# This software is free for non-commercial, research and evaluation use
# under the terms of the LICENSE.md file.
#
# For inquiries contact george.drettakis@inria.fr
#
import math
import torch
import numpy as np
from tqdm import tqdm
from scipy.spatial.transform import Rotation as R
try:
from scene.cameras import Camera
except ImportError:
pass
from tools.general_utils import PILtoTorch, NumpytoTorch
from tools.graphics_utils import fov2focal
from tools.math_utils import inv_normalize_pts
from scene.cameras import SampleCam
WARNED = False
def loadCam(args, id, cam_info, resolution_scale):
orig_w, orig_h = cam_info.image.size
if args.resolution in [1, 2, 4, 8]:
resolution = round(orig_w/(resolution_scale * args.resolution)), round(orig_h/(resolution_scale * args.resolution))
else: # should be a type that converts to float
if args.resolution == -1:
if orig_w > 1600:
global WARNED
if not WARNED:
print("[ INFO ] Encountered quite large input images (>1.6K pixels width), rescaling to 1.6K.\n "
"If this is not desired, please explicitly specify '--resolution/-r' as 1")
WARNED = True
global_down = orig_w / 1600
else:
global_down = 1
else:
global_down = orig_w / args.resolution
scale = float(global_down) * float(resolution_scale)
resolution = (int(orig_w / scale), int(orig_h / scale))
resized_image_rgb = PILtoTorch(cam_info.image, resolution) / 255.
gt_image = resized_image_rgb[:3, ...]
loaded_mask = None
if resized_image_rgb.shape[0] == 4:
loaded_mask = resized_image_rgb[3:4, ...]
depth = None
if cam_info.depth is not None:
size = list(resolution)[::-1]
depth = NumpytoTorch(cam_info.depth, size)
normal = None
if cam_info.normal is not None:
size = list(resolution)[::-1]
normal = NumpytoTorch(cam_info.normal, size).permute(1, 2, 0) # H, W, 3
mask = None
if cam_info.mask is not None:
mask = PILtoTorch(cam_info.mask, resolution).squeeze(0)
if mask.dim() == 3: mask = mask[0]
return Camera(colmap_id=cam_info.uid, R=cam_info.R, T=cam_info.T,
FoVx=cam_info.FovX, FoVy=cam_info.FovY,
image=gt_image, gt_alpha_mask=loaded_mask,
image_name=cam_info.image_name, uid=id, data_device=args.data_device, depth=depth, normal=normal, mask=mask)
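The resolution logic at the top of `loadCam` has three cases. Values 1, 2, 4 and 8 act as downsampling factors. `-1` caps the width at 1600 pixels. Any other value is taken as a target width. The sketch below isolates that arithmetic so it can be checked on concrete sizes. `target_resolution` is a hypothetical helper that only reproduces the branch above and omits the one-time warning.

```python
def target_resolution(orig_w, orig_h, resolution, resolution_scale=1.0):
    if resolution in [1, 2, 4, 8]:
        # Treated as a downsampling factor.
        return (round(orig_w / (resolution_scale * resolution)),
                round(orig_h / (resolution_scale * resolution)))
    if resolution == -1:
        # Automatic mode: only images wider than 1600 px are shrunk, to 1600 px wide.
        global_down = orig_w / 1600 if orig_w > 1600 else 1
    else:
        # Any other value is interpreted as the desired width in pixels.
        global_down = orig_w / resolution
    scale = float(global_down) * float(resolution_scale)
    return int(orig_w / scale), int(orig_h / scale)
```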
def cameraList_from_camInfos(cam_infos, resolution_scale, args):
camera_list = []
for id, c in tqdm(enumerate(cam_infos), total=len(cam_infos), desc="Processing data", leave=False):
camera_list.append(loadCam(args, id, c, resolution_scale))
return camera_list
def camera_to_JSON(id, camera):
Rt = np.zeros((4, 4))
Rt[:3, :3] = camera.R.transpose()
Rt[:3, 3] = camera.T
Rt[3, 3] = 1.0
W2C = np.linalg.inv(Rt)
pos = W2C[:3, 3]
rot = W2C[:3, :3]
serializable_array_2d = [x.tolist() for x in rot]
camera_entry = {
'id' : id,
'img_name' : camera.image_name,
'width' : camera.width,
'height' : camera.height,
'position': pos.tolist(),
'rotation': serializable_array_2d,
'fy' : fov2focal(camera.FovY, camera.height),
'fx' : fov2focal(camera.FovX, camera.width)
}
return camera_entry
def find_up_axis(R):
'''
R: world to bounding box coordinate system
'''
up_vector = torch.tensor([0, -1, 0], dtype=torch.float32, device=R.device) # world colmap
up_vector = R @ up_vector # bounding box coordinate system
up_axis = torch.argmax(torch.abs(up_vector))
up_sign = torch.sign(up_vector[up_axis])
return up_axis, up_sign
def find_axis(R, axis_name='up'):
'''
colmap coordinate system
R: world to bounding box coordinate system
'''
if axis_name == 'up':
axis_w=[0, -1, 0]
elif axis_name == 'front':
axis_w=[0, 0, 1]
elif axis_name == 'right':
axis_w=[1, 0, 0]
else:
raise ValueError(f'axis_name: "{axis_name}" should be one of [up, front, right]')
axis_w = torch.tensor(axis_w, dtype=torch.float32, device=R.device) # world colmap
axis_c = R @ axis_w # bounding box coordinate system
axis = torch.argmax(torch.abs(axis_c))
sign = torch.sign(axis_c[axis])
return axis, sign
def dot(x, y):
if isinstance(x, np.ndarray):
return np.sum(x * y, -1, keepdims=True)
else:
return torch.sum(x * y, -1, keepdim=True)
def length(x, eps=1e-20):
if isinstance(x, np.ndarray):
return np.sqrt(np.maximum(np.sum(x * x, axis=-1, keepdims=True), eps))
else:
return torch.sqrt(torch.clamp(dot(x, x), min=eps))
def safe_normalize(x, eps=1e-20):
return x / length(x, eps)
def look_at_np(campos, target, opengl=True):
# campos: [N, 3], camera/eye position
# target: [N, 3], object to look at
# return: [N, 3, 3], rotation matrix
if not opengl:
# camera forward aligns with -z colmap
forward_vector = safe_normalize(target - campos)
up_vector = np.array([0, 1, 0], dtype=np.float32)
right_vector = safe_normalize(np.cross(forward_vector, up_vector)) # z x up
up_vector = safe_normalize(np.cross(right_vector, forward_vector))
else:
# camera forward aligns with +z
forward_vector = safe_normalize(campos - target)
up_vector = np.array([0, 1, 0], dtype=np.float32)
right_vector = safe_normalize(np.cross(up_vector, forward_vector)) # up x z
up_vector = safe_normalize(np.cross(forward_vector, right_vector))
R = np.stack([right_vector, up_vector, forward_vector], axis=1) # axis=1: for [N,3] inputs the three vectors become the rows (w2c)
return R
def look_at(campos, target, opengl=True):
# campos: [N, 3], camera/eye position
# target: [N, 3], object to look at
# return: [N, 3, 3], rotation matrix
up_vector = torch.tensor([0, 1, 0], dtype=torch.float32, device=campos.device)
if campos.dim() == 2: up_vector = up_vector[None, :]
if not opengl:
# camera forward aligns with -z colmap
forward_vector = safe_normalize(target - campos)
right_vector = safe_normalize(torch.cross(forward_vector, up_vector, dim=-1)) # z x up
up_vector = safe_normalize(torch.cross(right_vector, forward_vector, dim=-1))
else:
# camera forward aligns with +z
forward_vector = safe_normalize(campos - target)
right_vector = safe_normalize(torch.cross(up_vector, forward_vector, dim=-1)) # up x z
up_vector = safe_normalize(torch.cross(forward_vector, right_vector, dim=-1))
R = torch.stack([right_vector, up_vector, forward_vector], dim=1) # dim=1: for [N,3] inputs the three vectors become the rows (w2c)
return R
# elevation & azimuth to pose (cam2world) matrix
def orbit_camera(elevation, azimuth, radius=1, is_degree=True, target=None, opengl=True):
# radius: scalar
# elevation: scalar, in (-90, 90), from +y to -y is (-90, 90)
# azimuth: scalar, in (-180, 180), from +z to +x is (0, 90)
# return: [4, 4], camera pose matrix
if is_degree:
elevation = np.deg2rad(elevation)
azimuth = np.deg2rad(azimuth)
x = radius * np.cos(elevation) * np.sin(azimuth)
y = - radius * np.sin(elevation)
z = radius * np.cos(elevation) * np.cos(azimuth)
if target is None:
target = np.zeros([3], dtype=np.float32)
campos = np.array([x, y, z]) + target # [3]
T = np.eye(4, dtype=np.float32)
T[:3, :3] = look_at_np(campos, target, opengl) # for a single [3] campos, stacking along axis=1 puts the axes in columns, i.e. a c2w rotation
T[:3, 3] = campos
return T
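For a single camera with `opengl=True`, `orbit_camera` places the camera on a sphere around the target, points the camera's +z axis away from the target, and stores the right/up/forward axes as columns, which gives a camera-to-world pose. The NumPy sketch below reproduces that case so the pose can be checked. `orbit_pose` is a hypothetical name, and it is degenerate at elevation ±90 degrees, where forward is parallel to the +y up vector.

```python
import numpy as np


def normalize(v):
    return v / np.linalg.norm(v)


def orbit_pose(elevation_deg, azimuth_deg, radius=1.0, target=None):
    target = np.zeros(3) if target is None else np.asarray(target, dtype=float)
    el, az = np.deg2rad(elevation_deg), np.deg2rad(azimuth_deg)
    campos = target + radius * np.array([np.cos(el) * np.sin(az), -np.sin(el), np.cos(el) * np.cos(az)])
    forward = normalize(campos - target)  # OpenGL: the camera looks along its -z axis
    right = normalize(np.cross([0.0, 1.0, 0.0], forward))
    up = normalize(np.cross(forward, right))
    T = np.eye(4)
    T[:3, :3] = np.stack([right, up, forward], axis=1)  # axes as columns -> camera-to-world rotation
    T[:3, 3] = campos
    return T


T0 = orbit_pose(0, 0, 2)
T = orbit_pose(30, 45, 3)
R = T[:3, :3]
```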
def cubic_camera(n, trans, scale, target=None, opengl=False):
xyz = np.random.rand(n, 3) * 2 - 1
for i in range(3): xyz[i::3, i] = xyz[i::3, i] / np.abs(xyz[i::3, i]) # Unit cube
if target is None: target = np.zeros([1, 3], dtype=np.float32)
xyz = inv_normalize_pts(xyz, trans, scale)
target = inv_normalize_pts(target, trans, scale)
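T = np.zeros((n, 4, 4))
# look_at() expects torch tensors and takes no up vector; look_at_np() always uses +y as up and, for [N,3] inputs, returns w2c rotations (rows)
R_w2c = look_at_np(xyz, target, opengl)
T[:, :3, :3] = R_w2c
T[:, :3, 3] = -(R_w2c @ xyz[..., None]).squeeze(-1) # w2c translation, as in up_camera()
T[:, 3, 3] = 1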
return T
def check_tensor(x):
if isinstance(x, np.ndarray):
return torch.from_numpy(x).to(torch.float32)
else: return x
def up_camera(n, trans, scale, target=None, opengl=False): # colmap
trans = check_tensor(trans)
scale = check_tensor(scale)
device = trans.device
up_axis, up_sign = find_up_axis(trans[:3, :3])
v_axis = [i for i in [0, 1, 2] if i != up_axis]
xyz = torch.rand(n, 3).to(device) * 2 - 1
xyz[:, up_axis] = up_sign # up
if target is None:
target = torch.zeros([1, 3], dtype=torch.float32, device=device)
target[:, up_axis] = 1 * -up_sign # 5
else:
target = check_tensor(target)
xyz = inv_normalize_pts(xyz, trans, scale)
target = inv_normalize_pts(target, trans, scale)
T = torch.zeros((xyz.shape[0], 4, 4), device=device) # w2c
R = look_at(xyz, target, opengl) # w2c
T[:, :3, :3] = R
T[:, :3, 3] = - (R @ xyz[..., None]).squeeze(-1) # w2c
T[:, 3, 3] = 1
return T
def around_camera(n, trans, scale, height=None, target=None, opengl=False):
trans = check_tensor(trans)
scale = check_tensor(scale)
device = trans.device
grid_points = torch.Tensor([
[-1, -1, -1],
[1, 1, 1],
]).to(device)
up_axis, up_sign = find_up_axis(trans[:3, :3])
v_axis = [i for i in [0, 1, 2] if i != up_axis]
xyz = torch.rand(n, 3).to(device) * 2 - 1
for i in v_axis: xyz[i-1::2, i] = xyz[i-1::2, i] / torch.abs(xyz[i-1::2, i])
if target is None:
target = torch.zeros([1, 3], dtype=torch.float32, device=device)
else:
target = check_tensor(target)
xyz = inv_normalize_pts(xyz, trans, scale)
target = inv_normalize_pts(target, trans, scale)
grid_points = inv_normalize_pts(grid_points, trans, scale)
if height is None: height = target[0, 1]
xyz[:, 1] = height
T = torch.zeros((xyz.shape[0], 4, 4), device=device) # w2c
R = look_at(xyz, target, opengl) # w2c
T[:, :3, :3] = R
T[:, :3, 3] = - (R @ xyz[..., None]).squeeze(-1) # w2c
T[:, 3, 3] = 1
return T
def bb_camera(n, trans, scale, height=None, target=None, opengl=False, up=True, around=True, look_mode='target', sample_mode='grid', boundary=0.9, bidirect=False): # colmap 0.8
trans = check_tensor(trans)
scale = check_tensor(scale)
device = trans.device
if scale.ndim == 0: scale = torch.ones(3, dtype=torch.float32, device=device) * scale
rot = trans[:3, :3] if trans.ndim == 2 else torch.eye(3, device=device)
up_axis, up_sign = find_axis(rot, axis_name='up')
if sample_mode == 'grid' or (up and around):
right_axis, right_sign = find_axis(rot, axis_name='right')
front_axis, front_sign = find_axis(rot, axis_name='front')
v_axis = [i for i in [0, 1, 2] if i != up_axis]
up_n = around_n = n
if up and around:
h = scale[up_axis]
l = scale[right_axis]
w = scale[front_axis]
around_area = 2 * (l * h + h * w)
up_area = l * w
total_area = around_area + up_area
up_n = int((n * up_area / total_area) * 1)
xyz = []
if target is None:
if look_mode == 'target':
target = torch.zeros([1, 3], dtype=torch.float32, device=device)
target[:, up_axis] = 1 * -up_sign # 5
else:
target = []
else:
target = check_tensor(target)
if up:
if sample_mode == 'random':
xyz_up = torch.rand(up_n, 3).to(device) * 2 - 1
elif sample_mode == 'grid':
xyz_up = up_grid_posi(up_n, scale, right_axis, up_axis, front_axis).to(device)
around_n = n - up_n
xyz_up[:, up_axis] = up_sign
xyz.append(xyz_up)
if look_mode == 'direction':
tgt_up = xyz_up.clone()
tgt_up[:, up_axis] *= -1
target.append(tgt_up)
if around:
if sample_mode == 'random':
xyz_around = torch.rand(around_n, 3).to(device) * 2 - 1
elif sample_mode == 'grid':
if not bidirect:
xyz_around = around_grid_posi(around_n, scale, right_axis, up_axis, front_axis, up_sign=up_sign).to(device)
else:
n1 = around_n // 2
xyz1 = around_grid_posi(n1, scale, right_axis, up_axis, front_axis, sign=1, up_sign=up_sign).to(device)
n2 = around_n - xyz1.shape[0]
xyz2 = around_grid_posi(n2, scale, right_axis, up_axis, front_axis, sign=-1, up_sign=up_sign).to(device)
xyz_around = torch.cat([xyz1, xyz2], 0)
n_trg = xyz_up.shape[0] + xyz_around.shape[0] if up else xyz_around.shape[0]
target = target.repeat(n_trg, 1)
target[-xyz2.shape[0]:, up_axis] *= -1
xyz_around[:, up_axis] = xyz_around[:, up_axis] * boundary + (1 - boundary) * up_sign
xyz.append(xyz_around)
if look_mode == 'direction':
trg_around = xyz_around.clone()
for i in v_axis: trg_around[i-1::2, i] *= -1
target.append(trg_around)
xyz = torch.cat(xyz, 0)
if look_mode == 'direction':
target = torch.cat(target, 0)
xyz = inv_normalize_pts(xyz, trans, scale)
target = inv_normalize_pts(target, trans, scale)
T = torch.zeros((xyz.shape[0], 4, 4), device=device) # w2c
R = look_at(xyz, target, opengl) # w2c
T[:, :3, :3] = R
T[:, :3, 3] = - (R @ xyz[..., None]).squeeze(-1) # w2c
T[:, 3, 3] = 1
return T
def around_grid_posi(num_points, scale, right_axis, up_axis, front_axis, sign=1, up_sign=1):
device = scale.device
indexing = 'xy'
h = scale[up_axis]
l = scale[right_axis]
w = scale[front_axis]
total_area = 2 * (l * h + h * w)
ratio = (num_points / total_area).sqrt()
h_points = torch.round(h * ratio).int()
l_points = torch.round(l * ratio).int()
w_points = torch.round(w * ratio).int()
total_points = []
h_coord = torch.arange(start=-1, end=1, step=2 / h_points, device=device) * up_sign
step = 2 / l_points
st = -1 if sign == 1 else -1 + step
l_coord = torch.arange(start=st, end=1, step=step, device=device) # * sign
grid_l, grid_h = torch.meshgrid([l_coord, h_coord], indexing=indexing)
lh = torch.stack([grid_l.flatten(), grid_h.flatten()], dim=1)
points = torch.ones([lh.shape[0], 3], dtype=torch.float32, device=device) * 1
points[:, [right_axis, up_axis]] = lh
total_points.append(points)
# back
step = - 2 / l_points
st = 1 if sign == 1 else 1 + step
l_coord = torch.arange(start=st, end=-1, step=step, device=device) # * sign
grid_l, grid_h = torch.meshgrid([l_coord, h_coord], indexing=indexing)
lh = torch.stack([grid_l.flatten(), grid_h.flatten()], dim=1)
points = torch.ones([lh.shape[0], 3], dtype=torch.float32, device=device) * -1
points[:, [right_axis, up_axis]] = lh
total_points.append(points)
# right
step = - 2 / w_points
st = 1 if sign == 1 else 1 + step
w_coord = torch.arange(start=st, end=-1, step=step, device=device)
grid_h, grid_w = torch.meshgrid([h_coord, w_coord], indexing=indexing)
hw = torch.stack([grid_h.flatten(), grid_w.flatten()], dim=1)
points = torch.ones([hw.shape[0], 3], dtype=torch.float32, device=device) * 1
points[:, [up_axis, front_axis]] = hw
total_points.append(points)
# left
step = 2 / w_points
st = -1 if sign == 1 else -1 + step
w_coord = torch.arange(start=st, end=1, step=step, device=device)
grid_h, grid_w = torch.meshgrid([h_coord, w_coord], indexing=indexing)
hw = torch.stack([grid_h.flatten(), grid_w.flatten()], dim=1)
points = torch.ones([hw.shape[0], 3], dtype=torch.float32, device=device) * -1
points[:, [up_axis, front_axis]] = hw
total_points.append(points)
points = torch.cat(total_points, 0)
return points
def up_grid_posi(num_points, scale, right_axis, up_axis, front_axis):
h = scale[up_axis]
l = scale[right_axis]
w = scale[front_axis]
total_area = l * w
ratio = math.sqrt(num_points / total_area)
l_points = torch.round(l * ratio).int()
w_points = torch.round(w * ratio).int()
# up
l_coord = torch.linspace(start=-1, end=1, steps=l_points) # * 0.9
w_coord = torch.linspace(start=-1, end=1, steps=w_points) # * 0.9
grid_l, grid_w = torch.meshgrid([l_coord, w_coord], indexing='xy')
lw = torch.stack([grid_l.flatten(), grid_w.flatten()], dim=1)
points = torch.ones([lw.shape[0], 3], dtype=torch.float32) * 1
points[:, [right_axis, front_axis]] = lw
return points
def grid_camera(trans, scale, opengl=False, target=None):
trans = check_tensor(trans)
scale = check_tensor(scale)
device = trans.device
xyz = torch.tensor(
[
[-1, -1, -1],
[1, 1, 1],
[-1, 1, 1],
[1, -1, -1],
[-1, 1, -1],
[1, -1, 1],
[1, 1, -1],
[-1, -1, 1],
], dtype=torch.float32, device=device
)
if target is None:
target = torch.zeros([1, 3], dtype=torch.float32, device=device)
else:
target = check_tensor(target)
xyz = inv_normalize_pts(xyz, trans, scale)
target = inv_normalize_pts(target, trans, scale)
T = torch.zeros((xyz.shape[0], 4, 4), device=device) # w2c
R = look_at(xyz, target, opengl) # w2c
T[:, :3, :3] = R
T[:, :3, 3] = - (R @ xyz[..., None]).squeeze(-1) # w2c
T[:, 3, 3] = 1
return T
def sample_cameras(model, n, up=False, around=True, look_mode='target', sample_mode='grid', bidirect=True):
cam_height = None
w2cs = bb_camera(n, model.trans, model.scale, cam_height, up=up, around=around, \
look_mode=look_mode, sample_mode=sample_mode, bidirect=bidirect)
# traincam = self.scene.getTrainCameras()[0]
# FoVx = traincam.FoVx # 1.3990553440909452
# FoVy = traincam.FoVy # 0.8764846384037163
# width = traincam.image_width # 1500
# height = traincam.image_height # 835
FoVx = FoVy = 2.5 # 3.14 / 2
width = height = 1500
cams = []
for i in range(w2cs.shape[0]):
w2c = w2cs[i]
cam = SampleCam(w2c, width, height, FoVx, FoVy)
cams.append(cam)
return cams
class OrbitCamera:
def __init__(self, W, H, r=2, fovy=60, near=0.01, far=100):
self.W = W
self.H = H
self.radius = r # camera distance from center
self.fovy = np.deg2rad(fovy) # deg 2 rad
self.near = near
self.far = far
self.center = np.array([0, 0, 0], dtype=np.float32) # look at this point
self.rot = R.from_matrix(np.eye(3))
self.up = np.array([0, 1, 0], dtype=np.float32) # need to be normalized!
@property
def fovx(self):
return 2 * np.arctan(np.tan(self.fovy / 2) * self.W / self.H)
@property
def campos(self):
return self.pose[:3, 3]
# pose (c2w)
@property
def pose(self):
# first move camera to radius
res = np.eye(4, dtype=np.float32)
res[2, 3] = self.radius # opengl convention...
# rotate
rot = np.eye(4, dtype=np.float32)
rot[:3, :3] = self.rot.as_matrix()
res = rot @ res
# translate
res[:3, 3] -= self.center
return res
# view (w2c)
@property
def view(self):
return np.linalg.inv(self.pose)
# projection (perspective)
@property
def perspective(self):
y = np.tan(self.fovy / 2)
aspect = self.W / self.H
return np.array(
[
[1 / (y * aspect), 0, 0, 0],
[0, -1 / y, 0, 0],
[
0,
0,
-(self.far + self.near) / (self.far - self.near),
-(2 * self.far * self.near) / (self.far - self.near),
],
[0, 0, -1, 0],
],
dtype=np.float32,
)
# intrinsics
@property
def intrinsics(self):
focal = self.H / (2 * np.tan(self.fovy / 2))
return np.array([focal, focal, self.W // 2, self.H // 2], dtype=np.float32)
@property
def mvp(self):
return self.perspective @ np.linalg.inv(self.pose) # [4, 4]
def orbit(self, dx, dy):
# rotate along camera up/side axis!
side = self.rot.as_matrix()[:3, 0]
rotvec_x = self.up * np.radians(-0.05 * dx)
rotvec_y = side * np.radians(-0.05 * dy)
self.rot = R.from_rotvec(rotvec_x) * R.from_rotvec(rotvec_y) * self.rot
def scale(self, delta):
self.radius *= 1.1 ** (-delta)
def pan(self, dx, dy, dz=0):
# pan in camera coordinate system (careful on the sensitivity!)
self.center += 0.0005 * self.rot.as_matrix()[:3, :3] @ np.array([-dx, -dy, dz])
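`OrbitCamera.perspective` is the OpenGL-style projection matrix with the y row negated, so image y grows downward. Two properties are easy to check: camera-space depths -near and -far map to NDC z = -1 and +1, and a point on the top edge of the vertical field of view lands at NDC y = -1. The sketch below is a standalone NumPy copy of the matrix. `perspective` and `to_ndc` are hypothetical free functions.

```python
import numpy as np


def perspective(fovy_rad, aspect, near, far):
    # Same layout as OrbitCamera.perspective.
    y = np.tan(fovy_rad / 2)
    return np.array([
        [1 / (y * aspect), 0, 0, 0],
        [0, -1 / y, 0, 0],
        [0, 0, -(far + near) / (far - near), -(2 * far * near) / (far - near)],
        [0, 0, -1, 0],
    ])


def to_ndc(P, p_cam):
    # Camera-space point (camera looks down -z) -> normalized device coordinates.
    clip = P @ np.append(p_cam, 1.0)
    return clip[:3] / clip[3]


P = perspective(np.deg2rad(60), 1.5, 0.01, 100.0)
top_edge = to_ndc(P, [0.0, np.tan(np.deg2rad(30)) * 5, -5.0])
```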
================================================
FILE: tools/crop_mesh.py
================================================
import os
import argparse
import numpy as np
import trimesh
def align_gt_with_cam(pts, trans):
trans_inv = np.linalg.inv(trans)
pts_aligned = pts @ trans_inv[:3, :3].transpose(-1, -2) + trans_inv[:3, -1]
return pts_aligned
def filter_largest_cc(mesh):
components = mesh.split(only_watertight=False)
areas = np.array([c.area for c in components], dtype=float)
if len(areas) > 0 and mesh.vertices.shape[0] > 0:
new_mesh = components[areas.argmax()]
else:
new_mesh = trimesh.Trimesh()
return new_mesh
def main(args):
assert os.path.exists(args.ply_path), f"PLY file {args.ply_path} does not exist."
gt_trans = np.loadtxt(args.align_path)
mesh_rec = trimesh.load(args.ply_path, process=False)
mesh_gt = trimesh.load(args.gt_path, process=False)
mesh_gt.vertices = align_gt_with_cam(mesh_gt.vertices, gt_trans)
to_align, _ = trimesh.bounds.oriented_bounds(mesh_gt)
mesh_gt.vertices = (to_align[:3, :3] @ mesh_gt.vertices.T + to_align[:3, 3:]).T
mesh_rec.vertices = (to_align[:3, :3] @ mesh_rec.vertices.T + to_align[:3, 3:]).T
min_points = mesh_gt.vertices.min(axis=0)
max_points = mesh_gt.vertices.max(axis=0)
mask_min = (mesh_rec.vertices - min_points[None]) > 0
mask_max = (mesh_rec.vertices - max_points[None]) < 0
mask = np.concatenate((mask_min, mask_max), axis=1).all(axis=1)
face_mask = mask[mesh_rec.faces].all(axis=1)
mesh_rec.update_vertices(mask)
mesh_rec.update_faces(face_mask)
mesh_rec.vertices = (to_align[:3, :3].T @ mesh_rec.vertices.T - to_align[:3, :3].T @ to_align[:3, 3:]).T
mesh_gt.vertices = (to_align[:3, :3].T @ mesh_gt.vertices.T - to_align[:3, :3].T @ to_align[:3, 3:]).T
# save the cropped reconstruction to args.out_path
mesh_rec.export(args.out_path)
# randomly subsample 5M GT points (np.random.choice samples with replacement by default)
idx = np.random.choice(np.arange(len(mesh_gt.vertices)), 5000000)
mesh_gt.vertices = mesh_gt.vertices[idx]
mesh_gt.colors = mesh_gt.colors[idx]
mesh_gt.export(args.gt_path.replace('.ply', '_trans.ply'))
return
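`main()` crops the reconstruction to the GT bounding box in two steps. It keeps the vertices strictly inside the box on every axis, then keeps only the faces whose three vertices all survive. In the script, trimesh's `update_vertices`/`update_faces` take care of the bookkeeping. The hand-rolled NumPy sketch below makes the face re-indexing explicit. `crop_to_aabb` is a hypothetical name, and the toy mesh is made up.

```python
import numpy as np


def crop_to_aabb(vertices, faces, lo, hi):
    # Strict inequalities, matching (v - min) > 0 and (v - max) < 0 in main().
    keep_v = ((vertices > lo) & (vertices < hi)).all(axis=1)
    keep_f = keep_v[faces].all(axis=1)  # drop any face that touches a removed vertex
    new_index = -np.ones(len(vertices), dtype=np.int64)
    new_index[keep_v] = np.arange(keep_v.sum())  # old vertex id -> new vertex id
    return vertices[keep_v], new_index[faces[keep_f]]


vertices = np.array([[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0], [5, 5, 5]], dtype=float)
faces = np.array([[0, 1, 2], [0, 2, 3], [2, 3, 4]])
v_crop, f_crop = crop_to_aabb(vertices, faces, np.full(3, -0.5), np.full(3, 1.5))
```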
if __name__ == '__main__':
parser = argparse.ArgumentParser()
parser.add_argument(
"--gt_path",
type=str,
default='/your/path//Barn_GT.ply',
help="path to a dataset/scene directory containing X.json, X.ply, ...",
)
parser.add_argument(
"--align_path",
type=str,
default='/your/path//Barn_trans.txt',
help="path to a dataset/scene directory containing X.json, X.ply, ...",
)
parser.add_argument(
"--ply_path",
type=str,
default='/your/path//Barn_lowres.ply',
help="path to reconstruction ply file",
)
parser.add_argument(
"--scene",
type=str,
default='Barn',
help="path to reconstruction ply file",
)
parser.add_argument(
"--out_path",
type=str,
default='/your/path//Barn_lowres_crop.ply',
help="path of the output (cropped) reconstruction .ply file",
)
args = parser.parse_args()
main(args)
================================================
FILE: tools/denoise_pcd.py
================================================
from pytorch3d.ops import ball_query, knn_points
def remove_radius_outlier(xyz, nb_points=5, radius=0.1):
if xyz.dim() == 2: xyz = xyz[None]
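# ball_query pads nn_idx with -1 when fewer than K points lie within radius; K counts the query point itself
nn_dists, nn_idx, nn = ball_query(xyz, xyz, K=nb_points+1, radius=radius)
# keep points with at least nb_points other neighbors within radius
valid = ~(nn_idx[0]==-1).any(-1)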
return valid
def remove_statistical_outlier(xyz, nb_points=20, std_ratio=20.):
if xyz.dim() == 2: xyz = xyz[None]
# knn_points returns squared distances; K includes the query point itself (distance 0)
nn_dists, nn_idx, nn = knn_points(xyz, xyz, K=nb_points, return_sorted=False)
# Per-point distances to its neighbors
distances = nn_dists.squeeze(0).sqrt() # Shape: (N, nb_points)
# Per-point mean distance to its neighbors
mean_distances = distances.mean(dim=-1) # Shape: (N,)
# Global threshold from the statistics of the per-point mean distances
threshold = mean_distances.mean() + std_ratio * mean_distances.std()
# Keep points whose mean neighbor distance does not exceed the threshold
valid = mean_distances <= threshold
return valid
if __name__ == '__main__':
import torch
import time
gpu = 0
device = torch.device('cuda:{:d}'.format(gpu) if torch.cuda.is_available() else 'cpu')
t1 = time.time()
xyz = torch.rand(int(1e7), 3).to(device)
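remove_statistical_outlier(xyz)
if torch.cuda.is_available(): torch.cuda.synchronize() # wait for GPU kernels to finish before timing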
print('time:', time.time()-t1, 's')
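The standard radius and statistical outlier-removal recipes (as in Open3D's `remove_radius_outlier` and `remove_statistical_outlier`) can be sketched in pure Python for small point sets. This brute-force version is only illustrative: it is O(N²), the function names are hypothetical, and unlike the pytorch3d calls above it excludes each point from its own neighbor set. The statistical rule compares each point's mean k-nearest-neighbor distance against the global mean plus `std_ratio` global standard deviations of those per-point means.

```python
import math
import statistics


def radius_outlier_mask(points, nb_points=2, radius=0.5):
    """True for points that have at least nb_points other points within radius."""
    return [
        sum(1 for j, q in enumerate(points) if j != i and math.dist(p, q) <= radius) >= nb_points
        for i, p in enumerate(points)
    ]


def statistical_outlier_mask(points, nb_neighbors=3, std_ratio=1.0):
    """True for points whose mean k-NN distance is within global mean + std_ratio * global std."""
    mean_knn = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        mean_knn.append(sum(dists[:nb_neighbors]) / nb_neighbors)
    threshold = statistics.fmean(mean_knn) + std_ratio * statistics.pstdev(mean_knn)
    return [d <= threshold for d in mean_knn]
```

With a tight cluster of six points and one point at (10, 10, 10), both masks keep the cluster and reject the isolated point. A large `std_ratio` makes the statistical filter very permissive, so only extreme outliers are removed.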
================================================
FILE: tools/depth2mesh.py
================================================
import os
import sys
import math
import torch
import argparse
import numpy as np
import open3d as o3d
import open3d.core as o3c
sys.path.append(os.getcwd())
from configs.config import Config
from gaussian_renderer import render
from scene import Scene, GaussianModel
from tools.semantic_id import BACKGROUND
from tools.graphics_utils import depth2point
from tools.general_utils import set_random_seed
from tools.math_utils import get_inside_normalized
from tools.mesh_utils import GaussianExtractor, post_process_mesh
@torch.no_grad()
def tsdf_fusion(args, cfg, model, cameras, dirs, bg, outdir, mesh_name='fused_mesh.ply', max_depth=5.0):
o3d_device = o3d.core.Device("CUDA:0")
vbg = o3d.t.geometry.VoxelBlockGrid(
attr_names=('tsdf', 'weight', 'color'),
attr_dtypes=(o3c.float32, o3c.float32, o3c.float32),
attr_channels=((1), (1), (3)),
voxel_size=args.voxel_size,
block_resolution=16,
block_count=60000,
device=o3d_device)
with torch.no_grad():
for _, view in enumerate(cameras):
render_pkg = render(view, model, cfg, bg, dirs=dirs)
if args.depth_mode == 'mean':
depth = render_pkg["depth"]
elif args.depth_mode == 'median':
depth = render_pkg["median_depth"]
rgb = render_pkg["render"]
alpha = render_pkg["alpha"]
if view.gt_alpha_mask is not None:
depth[(view.gt_alpha_mask < 0.5)] = 0
depth[(alpha < args.alpha_thres)] = 0
rendered_pcd_world = depth2point(depth[0], view.intr, view.world_view_transform.transpose(0, 1))[1]
inside = get_inside_normalized(rendered_pcd_world.view(-1, 3), model.trans, model.scale)[0]
depth.view(-1)[~inside] = 0
if 'render_sem' in render_pkg:
semantic = render_pkg["render_sem"]
prob = model.logits2prob(semantic)
mask = (prob[..., BACKGROUND] > args.prob_thres)[None]
depth[mask] = 0
intrinsic=o3d.camera.PinholeCameraIntrinsic(width=view.image_width,
height=view.image_height,
cx = view.image_width/2,
cy = view.image_height/2,
fx = view.image_width / (2 * math.tan(view.FoVx / 2.)),
fy = view.image_height / (2 * math.tan(view.FoVy / 2.)))
extrinsic = np.asarray((view.world_view_transform.T).cpu().numpy())
rgb = rgb.clamp(0, 1)
o3d_color = o3d.t.geometry.Image(np.asarray(rgb.permute(1,2,0).cpu().numpy(), order="C"))
o3d_depth = o3d.t.geometry.Image(np.asarray(depth.permute(1,2,0).cpu().numpy(), order="C"))
o3d_color = o3d_color.to(o3d_device)
o3d_depth = o3d_depth.to(o3d_device)
intrinsic = o3d.core.Tensor(intrinsic.intrinsic_matrix, o3d.core.Dtype.Float64)#.to(o3d_device)
extrinsic = o3d.core.Tensor(extrinsic, o3d.core.Dtype.Float64)#.to(o3d_device)
frustum_block_coords = vbg.compute_unique_block_coordinates(
o3d_depth, intrinsic, extrinsic, 1.0, max_depth)
vbg.integrate(frustum_block_coords, o3d_depth, o3d_color, intrinsic,
intrinsic, extrinsic, 1.0, max_depth)
mesh = vbg.extract_triangle_mesh().to_legacy()
# write mesh
o3d.io.write_triangle_mesh(os.path.join(outdir, mesh_name), mesh)
# Clean Mesh
if args.clean:
import pymeshlab
ms = pymeshlab.MeshSet()
ms.load_new_mesh(os.path.join(outdir, mesh_name))
ms.meshing_remove_unreferenced_vertices()
ms.meshing_remove_duplicate_faces()
ms.meshing_remove_null_faces()
ms.meshing_remove_connected_component_by_face_number(mincomponentsize=20000)
ms.save_current_mesh(os.path.join(outdir, mesh_name))
with open(os.path.join(outdir, 'voxel_size.txt'), 'w') as f:
f.write(f'voxel_size: {args.voxel_size}')
def tsdf_cpu(args, cfg, model, cameras, dirs, bg, outdir, mesh_name='fused_mesh.ply', max_depth=5.0):
gaussExtractor = GaussianExtractor(model, render, cfg, bg_color=bg, dirs=dirs, prob_thres=args.prob_thres, alpha_thres=args.alpha_thres)
gaussExtractor.gaussians.active_sh_degree = 0
gaussExtractor.reconstruction(cameras)
# extract the mesh and save
if args.unbounded:
mesh = gaussExtractor.extract_mesh_unbounded(resolution=args.mesh_res)
else:
mesh = gaussExtractor.extract_mesh_bounded(voxel_size=args.voxel_size, sdf_trunc=5*args.voxel_size, depth_trunc=max_depth)
o3d.io.write_triangle_mesh(os.path.join(outdir, mesh_name), mesh)
print("mesh saved at {}".format(os.path.join(outdir, mesh_name)))
# post-process the mesh and save, saving the largest N clusters
mesh_post = post_process_mesh(mesh, cluster_to_keep=args.num_cluster)
o3d.io.write_triangle_mesh(os.path.join(outdir, mesh_name), mesh_post)
return
def main(args):
cfg = Config(args.cfg_path)
cfg.model.data_device = 'cpu'
cfg.model.load_normal = False
cfg.model.load_mask = False
args.voxel_size = cfg.model.mesh.voxel_size if args.voxel_size == 0 else args.voxel_size
set_random_seed(cfg.seed)
model = GaussianModel(cfg.model)
scene = Scene(cfg.model, model, load_iteration=-1, shuffle=False)
model.trans = torch.from_numpy(scene.trans).cuda()
model.scale = torch.from_numpy(scene.scale).cuda() * 1.1
model.extent = scene.cameras_extent
cameras = scene.getTrainCameras().copy()[::args.split]
model.training_setup(cfg.optim)
model.max_radii2D = torch.zeros((model.get_xyz.shape[0]), device="cuda")
model.scale = torch.from_numpy(scene.scale).cuda()
model.prune_outliers()
bg_color = [1, 1, 1] if cfg.model.white_background else [0, 0, 0]
background = torch.tensor(bg_color, dtype=torch.float32, device="cuda")
print(f'Fusing into {args.mesh_name} with voxel size {args.voxel_size}...')
if args.method == 'tsdf':
dirs = scene.dirs
max_depth = args.max_depth
tsdf_fusion(args, cfg, model, cameras, dirs, background, cfg.logdir, args.mesh_name, max_depth)
elif args.method == 'tsdf_cpu':
dirs = scene.dirs
max_depth = args.max_depth
tsdf_cpu(args, cfg, model, cameras, dirs, background, cfg.logdir, args.mesh_name, max_depth)
else:
raise NotImplementedError(f'Unsupported meshing method: {args.method}')
return
if __name__ == '__main__':
parser = argparse.ArgumentParser()
parser.add_argument('--input', type=str, default='Barn')
parser.add_argument('--outdir', type=str, default=None)
parser.add_argument('--mesh_name', type=str, default='vcr_gaus.ply')
parser.add_argument('--scene', type=str, default='Barn')
parser.add_argument('--data_path', type=str, default='Barn')
parser.add_argument('--method', type=str, default='tsdf', choices=['tsdf', 'point2mesh', 'tsdf_cpu'])
parser.add_argument('--depth_mode', type=str, default='mean', choices=['mean', 'median'])
parser.add_argument('--rec_method', type=str, default='poisson', choices=['nksr', 'poisson'])
parser.add_argument('--split', type=int, default=3)
parser.add_argument('--resolution', type=float, default=1.0)
parser.add_argument('--detail_level', type=float, default=1.0)
parser.add_argument('--voxel_size', type=float, default=5e-3)
parser.add_argument('--sdf_trunc', type=float, default=0.08)
parser.add_argument('--alpha_thres', type=float, default=0.5)
parser.add_argument('--prob_thres', type=float, default=0.15)
parser.add_argument('--mise_iter', type=int, default=1)
parser.add_argument('--depth', type=int, default=9)
parser.add_argument('--max_depth', type=float, default=6.0)
parser.add_argument('--est_normal', action='store_true')
parser.add_argument('--cfg_path', type=str, default='configs/config_base.yaml')
parser.add_argument('--clean', action='store_true', help='perform a clean operation')
parser.add_argument("--unbounded", action="store_true", help='Mesh: using unbounded mode for meshing')
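parser.add_argument("--num_cluster", default=1000, type=int, help='Mesh: number of connected clusters to export')
parser.add_argument("--mesh_res", default=1024, type=int, help='Mesh: resolution for unbounded mesh extraction')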
args = parser.parse_args()
main(args)
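Both `tsdf_fusion` and `tsdf_cpu` rely on truncated signed distance (TSDF) fusion: each voxel keeps a running weighted average of the truncated signed distance to the observed depth, and the surface is extracted at the zero crossing. The sketch below reduces this to a single camera ray in pure Python to show the arithmetic. It is an illustration under simplifying assumptions (one ray, unit weight per observation, linear interpolation of the zero crossing), not Open3D's `VoxelBlockGrid` implementation, and `fuse_ray` / `zero_crossing` are hypothetical names.

```python
def fuse_ray(voxel_depths, observed_depths, sdf_trunc):
    """Fuse depth observations along one ray into per-voxel TSDF values (None = unobserved)."""
    tsdf = [0.0] * len(voxel_depths)
    weight = [0.0] * len(voxel_depths)
    for d in observed_depths:
        for i, z in enumerate(voxel_depths):
            sdf = d - z  # positive in front of the surface, negative behind it
            if sdf < -sdf_trunc:
                continue  # too far behind the observed surface: leave the voxel untouched
            value = min(1.0, sdf / sdf_trunc)  # truncated to [-1, 1]
            # running weighted average with unit weight per observation
            tsdf[i] = (tsdf[i] * weight[i] + value) / (weight[i] + 1.0)
            weight[i] += 1.0
    return [t if w > 0 else None for t, w in zip(tsdf, weight)]


def zero_crossing(voxel_depths, tsdf):
    """Depth of the first positive-to-non-positive change, linearly interpolated."""
    for i in range(len(tsdf) - 1):
        a, b = tsdf[i], tsdf[i + 1]
        if a is not None and b is not None and a > 0 >= b:
            z0, z1 = voxel_depths[i], voxel_depths[i + 1]
            return z0 + (z1 - z0) * a / (a - b)
    return None
```

Fusing three noisy observations (2.0, 2.02, 1.98) on a 0.05-spaced grid with `sdf_trunc=0.25` places the surface at their mean depth of 2.0; voxels far behind every observation stay unobserved. In `tsdf_cpu` the truncation is `5*args.voxel_size` and depths beyond `max_depth` are ignored.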
================================================
FILE: tools/distributed.py
================================================
'''
-----------------------------------------------------------------------------
Copyright (c) 2023, NVIDIA CORPORATION. All rights reserved.
NVIDIA CORPORATION and its licensors retain all intellectual property
and proprietary rights in and to this software, related documentation
and any modifications thereto. Any use, reproduction, disclosure or
distribution of this software and related documentation without an express
license agreement from NVIDIA CORPORATION is strictly prohibited.
-----------------------------------------------------------------------------
'''
import functools
import ctypes
import torch
import torch.distributed as dist
from contextlib import contextmanager
def init_dist(local_rank, backend='nccl', **kwargs):
r"""Initialize distributed training"""
if dist.is_available():
if dist.is_initialized():
return torch.cuda.current_device()
torch.cuda.set_device(local_rank)
dist.init_process_group(backend=backend, init_method='env://', **kwargs)
# Increase the L2 fetch granularity for faster speed.
_libcudart = ctypes.CDLL('libcudart.so')
# Set device limit on the current device
# cudaLimitMaxL2FetchGranularity = 0x05
pValue = ctypes.cast((ctypes.c_int * 1)(), ctypes.POINTER(ctypes.c_int))
_libcudart.cudaDeviceSetLimit(ctypes.c_int(0x05), ctypes.c_int(128))
_libcudart.cudaDeviceGetLimit(pValue, ctypes.c_int(0x05))
def get_rank():
r"""Get the rank of the current process (0 when not running distributed)."""
rank = 0
if dist.is_available():
if dist.is_initialized():
rank = dist.get_rank()
return rank
def get_world_size():
r"""Get the world size, i.e. the number of processes in this job (1 when not running distributed)."""
world_size = 1
if dist.is_available():
if dist.is_initialized():
world_size = dist.get_world_size()
return world_size
def broadcast_object_list(message, src=0):
r"""Broadcast object list from the master to the others"""
# Send logdir from master to all workers.
if dist.is_available():
if dist.is_initialized():
torch.distributed.broadcast_object_list(message, src=src)
return message
def master_only(func):
r"""Apply this function only to the master GPU."""
@functools.wraps(func)
def wrapper(*args, **kwargs):
r"""Simple function wrapper for the master function"""
if get_rank() == 0:
return func(*args, **kwargs)
else:
return None
return wrapper
def is_master():
r"""Check whether the current process is the master (rank 0)."""
return get_rank() == 0
def is_dist():
return dist.is_initialized()
def barrier():
if is_dist():
dist.barrier()
@contextmanager
def master_first():
if not is_master():
barrier()
yield
if dist.is_initialized() and is_master():
barrier()
def is_local_master():
return torch.cuda.current_device() == 0
@master_only
def master_only_print(*args):
r"""master-only print"""
print(*args)
def dist_reduce_tensor(tensor, rank=0, reduce='mean'):
r"""Reduce the tensor onto rank `rank` (0 by default); only that rank holds the reduced result."""
world_size = get_world_size()
if world_size < 2:
return tensor
with torch.no_grad():
dist.reduce(tensor, dst=rank)
if get_rank() == rank:
if reduce == 'mean':
tensor /= world_size
elif reduce == 'sum':
pass
else:
raise NotImplementedError
return tensor
def dist_all_reduce_tensor(tensor, reduce='mean'):
r"""All-reduce the tensor so that every rank holds the reduced result."""
world_size = get_world_size()
if world_size < 2:
return tensor
with torch.no_grad():
dist.all_reduce(tensor)
if reduce == 'mean':
tensor /= world_size
elif reduce == 'sum':
pass
else:
raise NotImplementedError
return tensor
def dist_all_gather_tensor(tensor):
r"""Gather the tensor from every rank; every rank receives the full list."""
world_size = get_world_size()
if world_size < 2:
return [tensor]
tensor_list = [
torch.ones_like(tensor) for _ in range(dist.get_world_size())]
with torch.no_grad():
dist.all_gather(tensor_list, tensor)
return tensor_list
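When `torch.distributed` is not initialized, the helpers above fall back to rank 0, world size 1, and reductions that return the input unchanged. The pure-Python sketch below simulates what `dist_all_reduce_tensor` and `master_only` compute across several ranks without launching processes. `simulate_all_reduce` and the injectable `get_rank` argument of `make_master_only` are hypothetical, introduced only to make the behavior testable.

```python
import functools


def simulate_all_reduce(per_rank_values, reduce="mean"):
    """Element-wise sum across ranks (like dist.all_reduce), optionally divided by world size.

    Every simulated rank receives the same result, mirroring all-reduce semantics.
    """
    world_size = len(per_rank_values)
    summed = [sum(column) for column in zip(*per_rank_values)]
    if reduce == "mean":
        summed = [value / world_size for value in summed]
    elif reduce != "sum":
        raise NotImplementedError(reduce)
    return [list(summed) for _ in range(world_size)]


def make_master_only(get_rank):
    """Build a master_only-style decorator around a caller-supplied rank getter."""
    def master_only(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if get_rank() == 0:
                return func(*args, **kwargs)
            return None  # non-master ranks skip the call
        return wrapper
    return master_only
```

With three ranks holding [1, 2], [3, 4] and [5, 6], a mean all-reduce leaves [3.0, 4.0] on every rank, while a `master_only` function returns its value on rank 0 and `None` elsewhere.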
================================================
FILE: tools/general_utils.py
================================================
#
# Copyright (C) 2023, Inria
# GRAPHDECO research group, https://team.inria.fr/graphdeco
# All rights reserved.
#
# This software is free for non-commercial, research and evaluation use
# under the terms of the LICENSE.md file.
#
# For inquiries contact george.drettakis@inria.fr
#
import torch
import sys
from datetime import datetime
import numpy as np
import random
import torchvision.transforms.functional as torchvision_F
from PIL import ImageFile
ImageFile.LOAD_TRUNCATED_IMAGES = True
def inverse_sigmoid(x):
return torch.log(x/(1-x))
def PILtoTorch(pil_image, resolution):
resized_image_PIL = pil_image.resize(resolution)
resized_image = torch.from_numpy(np.array(resized_image_PIL))
if len(resized_image.shape) == 3:
return resized_image.permute(2, 0, 1)
else:
return resized_image.unsqueeze(dim=-1).permute(2, 0, 1)
def NumpytoTorch(image, resolution):
image = torch.from_numpy(image)
if image.ndim == 4: image = image.squeeze(0)
if image.shape[-1] == 3 or image.shape[-1] == 1:
image = image.permute(2, 0, 1)
_, orig_h, orig_w = image.shape
if resolution == [orig_h, orig_w]:
resized_image = image
else:
resized_image = torchvision_F.resize(image, resolution, antialias=True)
return resized_image
def get_expon_lr_func(
lr_init, lr_final, lr_delay_steps=0, lr_delay_mult=1.0, max_steps=1000000
):
"""
Copied from Plenoxels
Continuous learning rate decay function. Adapted from JaxNeRF
The returned rate is lr_init when step=0 and lr_final when step=max_steps, and
is log-linearly interpolated elsewhere (equivalent to exponential decay).
If lr_delay_steps>0 then the learning rate will be scaled by some smooth
function of lr_delay_mult, such that the initial learning rate is
lr_init*lr_delay_mult at the beginning of optimization but will be eased back
to the normal learning rate when steps>lr_delay_steps.
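:param lr_init: float, the learning rate at step 0.
:param lr_final: float, the learning rate at step max_steps.
:param lr_delay_steps: int, the number of initial steps over which the delay multiplier is eased out.
:param lr_delay_mult: float, the multiplier applied to the learning rate at step 0 when lr_delay_steps > 0.
:param max_steps: int, the number of steps during optimization.
:return: a function that maps a step to its learning rate.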
"""
def helper(step):
if step < 0 or (lr_init == 0.0 and lr_final == 0.0):
# Disable this parameter
return 0.0
if lr_delay_steps > 0:
# A kind of reverse cosine decay.
delay_rate = lr_delay_mult + (1 - lr_delay_mult) * np.sin(
0.5 * np.pi * np.clip(step / lr_delay_steps, 0, 1)
)
else:
delay_rate = 1.0
t = np.clip(step / max_steps, 0, 1)
log_lerp = np.exp(np.log(lr_init) * (1 - t) + np.log(lr_final) * t)
return delay_rate * log_lerp
return helper
def strip_lowerdiag(L):
uncertainty = torch.zeros((L.shape[0], 6), dtype=torch.float, device="cuda")
uncertainty[:, 0] = L[:, 0, 0]
uncertainty[:, 1] = L[:, 0, 1]
uncertainty[:, 2] = L[:, 0, 2]
uncertainty[:, 3] = L[:, 1, 1]
uncertainty[:, 4] = L[:, 1, 2]
uncertainty[:, 5] = L[:, 2, 2]
return uncertainty
def strip_symmetric(sym):
return strip_lowerdiag(sym)
def build_rotation(r):
norm = torch.sqrt(r[:,0]*r[:,0] + r[:,1]*r[:,1] + r[:,2]*r[:,2] + r[:,3]*r[:,3])
q = r / norm[:, None]
R = torch.zeros((q.size(0), 3, 3), device='cuda')
r = q[:, 0]
x = q[:, 1]
y = q[:, 2]
z = q[:, 3]
R[:, 0, 0] = 1 - 2 * (y*y + z*z)
R[:, 0, 1] = 2 * (x*y - r*z)
R[:, 0, 2] = 2 * (x*z + r*y)
R[:, 1, 0] = 2 * (x*y + r*z)
R[:, 1, 1] = 1 - 2 * (x*x + z*z)
R[:, 1, 2] = 2 * (y*z - r*x)
R[:, 2, 0] = 2 * (x*z - r*y)
R[:, 2, 1] = 2 * (y*z + r*x)
R[:, 2, 2] = 1 - 2 * (x*x + y*y)
return R
def build_scaling_rotation(s, r):
L = torch.zeros((s.shape[0], 3, 3), dtype=torch.float, device="cuda")
R = build_rotation(r)
L[:,0,0] = s[:,0]
L[:,1,1] = s[:,1]
L[:,2,2] = s[:,2]
L = R @ L
return L
def safe_state(silent):
old_f = sys.stdout
class F:
def __init__(self, silent):
self.silent = silent
def write(self, x):
if not self.silent:
if x.endswith("\n"):
old_f.write(x.replace("\n", " [{}]\n".format(str(datetime.now().strftime("%d/%m %H:%M:%S")))))
else:
old_f.write(x)
def flush(self):
old_f.flush()
sys.stdout = F(silent)
def set_random_seed(seed):
r"""Set random seeds for random, numpy, torch.manual_seed and torch.cuda.manual_seed, then make cuda:0 the current device.
torch.cuda.manual_seed_all is not necessary (it is included in torch.manual_seed).
Args:
seed (int): Random seed.
"""
print(f"Using random seed {seed}")
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed) # sets seed on the current CPU & all GPUs
torch.cuda.manual_seed(seed) # sets seed on current GPU
# torch.cuda.manual_seed_all(seed) # included in torch.manual_seed
torch.cuda.set_device(torch.device("cuda:0"))
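`build_rotation`, `build_scaling_rotation` and `strip_symmetric` are typically combined as in 3D Gaussian Splatting: L = R S, Sigma = L L^T, and only the six unique entries of the symmetric Sigma are stored. The pure-Python sketch below mirrors those formulas for a single Gaussian, with the quaternion in (r, x, y, z) order as in `build_rotation`. `covariance_from_quaternion` is a hypothetical name, and the 3DGS-style composition is an assumption about how the repository uses these helpers.

```python
import math


def quaternion_to_rotation(q):
    """Same formula as build_rotation: q = (r, x, y, z), normalized first."""
    norm = math.sqrt(sum(c * c for c in q))
    r, x, y, z = (c / norm for c in q)
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - r * z), 2 * (x * z + r * y)],
        [2 * (x * y + r * z), 1 - 2 * (x * x + z * z), 2 * (y * z - r * x)],
        [2 * (x * z - r * y), 2 * (y * z + r * x), 1 - 2 * (x * x + y * y)],
    ]


def covariance_from_quaternion(q, scales):
    """Sigma = (R S)(R S)^T, returned as the 6 entries kept by strip_symmetric."""
    R = quaternion_to_rotation(q)
    # L = R @ diag(scales): scale column j of R by scales[j]
    L = [[R[i][j] * scales[j] for j in range(3)] for i in range(3)]
    cov = [[sum(L[i][k] * L[j][k] for k in range(3)) for j in range(3)] for i in range(3)]
    return [cov[0][0], cov[0][1], cov[0][2], cov[1][1], cov[1][2], cov[2][2]]
```

With the identity quaternion and scales (1, 2, 3) the covariance is diag(1, 4, 9); a 90 degree rotation about z swaps the first two variances to diag(4, 1, 9).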
================================================
FILE: tools/graphics_utils.py
================================================
#
# Copyright (C) 2023, Inria
# GRAPHDECO research group, https://team.inria.fr/graphdeco
# All rights reserved.
#
# This software is free for non-commercial, research and evaluation use
# under the terms of the LICENSE.md file.
#
# For inquiries contact george.drettakis@inria.fr
#
import torch
import math
import numpy as np
from typing import NamedTuple
class BasicPointCloud(NamedTuple):
points : np.array
colors : np.array
normals : np.array
def geom_transform_points(points, transf_matrix):
P, _ = points.shape
ones = torch.ones(P, 1, dtype=points.dtype, device=points.device)
points_hom = torch.cat([points, ones], dim=1)
points_out = torch.matmul(points_hom, transf_matrix.unsqueeze(0))
denom = points_out[..., 3:] + 0.0000001
return (points_out[..., :3] / denom).squeeze(dim=0)
def getWorld2View(R, t):
Rt = np.zeros((4, 4))
Rt[:3, :3] = R.transpose()
Rt[:3, 3] = t
Rt[3, 3] = 1.0
return np.float32(Rt)
def getWorld2View2(R, t, translate=np.array([.0, .0, .0]), scale=1.0):
Rt = np.zeros((4, 4)) # w2c
Rt[:3, :3] = R.transpose() # w2c
Rt[:3, 3] = t # w2c
Rt[3, 3] = 1.0
C2W = np.linalg.inv(Rt) # c2w
cam_center = C2W[:3, 3]
cam_center = (cam_center + translate) * scale
C2W[:3, 3] = cam_center
Rt = np.linalg.inv(C2W) # w2c
return np.float32(Rt)
def getView2World(R, t):
'''
R: w2c
t: w2c
'''
Rt = np.zeros((4, 4))
Rt[:3, :3] = R.transpose() # c2w
Rt[:3, 3] = -R.transpose() @ t # c2w
Rt[3, 3] = 1.0
return Rt
def getProjectionMatrix(znear, zfar, fovX, fovY):
'''
Perspective projection matrix from the fields of view; view-space depth in [znear, zfar] maps to NDC depth in [0, 1].
'''
tanHalfFovY = math.tan((fovY / 2))
tanHalfFovX = math.tan((fovX / 2))
top = tanHalfFovY * znear
bottom = -top
right = tanHalfFovX * znear
left = -right
P = torch.zeros(4, 4)
z_sign = 1.0
P[0, 0] = 2.0 * znear / (right - left)
P[1, 1] = 2.0 * znear / (top - bottom)
P[0, 2] = (right + left) / (right - left)
P[1, 2] = (top + bottom) / (top - bottom)
P[3, 2] = z_sign
P[2, 2] = z_sign * zfar / (zfar - znear)
P[2, 3] = -(zfar * znear) / (zfar - znear)
return P
def getIntrinsic(fovX, fovY, h, w):
focal_length_y = fov2focal(fovY, h)
focal_length_x = fov2focal(fovX, w)
intrinsic = torch.eye(3, dtype=torch.float32)
intrinsic[0, 0] = focal_length_x # fx
intrinsic[1, 1] = focal_length_y # fy
intrinsic[0, 2] = w / 2
intrinsic[1, 2] = h / 2
return intrinsic
def fov2focal(fov, pixels):
return pixels / (2 * math.tan(fov / 2))
def focal2fov(focal, pixels):
return 2*math.atan(pixels/(2*focal))
def ndc_2_cam(ndc_xyz, intrinsic, W, H):
inv_scale = torch.tensor([[W - 1, H - 1]], device=ndc_xyz.device)
cam_z = ndc_xyz[..., 2:3]
cam_xy = ndc_xyz[..., :2] * inv_scale * cam_z
cam_xyz = torch.cat([cam_xy, cam_z], dim=-1)
cam_xyz = cam_xyz @ torch.inverse(intrinsic[0, ...].t())
return cam_xyz
def depth2point_cam(sampled_depth, ref_intrinsic):
B, N, C, H, W = sampled_depth.shape
valid_z = sampled_depth
valid_x = torch.arange(W, dtype=torch.float32, device=sampled_depth.device).add_(0.5) / (W - 1)
valid_y = torch.arange(H, dtype=torch.float32, device=sampled_depth.device).add_(0.5) / (H - 1)
valid_y, valid_x = torch.meshgrid(valid_y, valid_x, indexing='ij')
# B,N,C,H,W
valid_x = valid_x[None, None, None, ...].expand(B, N, C, -1, -1)
valid_y = valid_y[None, None, None, ...].expand(B, N, C, -1, -1)
ndc_xyz = torch.stack([valid_x, valid_y, valid_z], dim=-1).view(B, N, C, H, W, 3) # 1, 1, 5, 512, 640, 3
cam_xyz = ndc_2_cam(ndc_xyz, ref_intrinsic, W, H) # 1, 1, 5, 512, 640, 3
return ndc_xyz, cam_xyz
def depth2point(depth_image, intrinsic_matrix, extrinsic_matrix):
_, xyz_cam = depth2point_cam(depth_image[None,None,None,...], intrinsic_matrix[None,...])
xyz_cam = xyz_cam.reshape(-1,3)
xyz_world = torch.cat([xyz_cam, torch.ones_like(xyz_cam[...,0:1])], axis=-1) @ torch.inverse(extrinsic_matrix).transpose(0,1)
xyz_world = xyz_world[...,:3]
return xyz_cam.reshape(*depth_image.shape, 3), xyz_world.reshape(*depth_image.shape, 3)
@torch.no_grad()
def get_all_px_dir(intrinsics, height, width):
"""
Calculate the view direction for all pixels/rays in the image.
This is used for intersection calculation between ray and voxel textures.
"""
a, ray_dir = depth2point_cam(torch.ones(1, 1, 1, height, width).cuda(), intrinsics[None])
a, ray_dir = a.squeeze(), ray_dir.squeeze()
ray_dir = torch.nn.functional.normalize(ray_dir, dim=-1)
ray_dir = ray_dir.permute(2, 0, 1) # 3, H, W
return ray_dir
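A few relations used by the functions above can be checked numerically in pure Python: `fov2focal` and `focal2fov` are inverses; the depth rows of `getProjectionMatrix` send view-space depth `znear` to NDC depth 0 and `zfar` to 1 (its `P[3, 2] = 1` copies view-space z into w); and `depth2point_cam` back-projects pixel (u, v) through its center (u + 0.5, v + 0.5). `ndc_depth` and `backproject_pixel` are hypothetical helpers written for this illustration.

```python
import math


def fov2focal(fov, pixels):
    return pixels / (2 * math.tan(fov / 2))


def focal2fov(focal, pixels):
    return 2 * math.atan(pixels / (2 * focal))


def ndc_depth(z, znear, zfar):
    """NDC depth from the third and fourth rows of getProjectionMatrix at view depth z."""
    clip_z = zfar / (zfar - znear) * z - (zfar * znear) / (zfar - znear)
    clip_w = z  # P[3, 2] = 1 copies view-space z into w
    return clip_z / clip_w


def backproject_pixel(u, v, depth, fx, fy, cx, cy):
    """Camera-space point for pixel (u, v) sampled at its center, as in depth2point_cam."""
    x = (u + 0.5 - cx) / fx * depth
    y = (v + 0.5 - cy) / fy * depth
    return (x, y, depth)
```

For a 640x480 image with fx = fy = 500 and the principal point at the image center (as in `getIntrinsic`), pixel (320, 240) at depth 2 back-projects to roughly (0.002, 0.002, 2), half a pixel off the optical axis.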
================================================
FILE: tools/image_utils.py
================================================
#
# Copyright (C) 2023, Inria
# GRAPHDECO research group, https://team.inria.fr/graphdeco
# All rights reserved.
#
# This software is free for non-commercial, research and evaluation use
# under the terms of the LICENSE.md file.
#
# For inquiries contact george.drettakis@inria.fr
#
import torch
def mse(img1, img2):
return (((img1 - img2)) ** 2).view(img1.shape[0], -1).mean(1, keepdim=True)
def psnr(img1, img2):
mse = (((img1 - img2)) ** 2).view(img1.shape[0], -1).mean(1, keepdim=True)
return 20 * torch.log10(1.0 / torch.sqrt(mse))
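`psnr` assumes pixel values in [0, 1] (peak signal 1.0), so 20 log10(1 / sqrt(MSE)) equals -10 log10(MSE). The pure-Python equivalent below works on a single flattened image and is shown only as a worked example; `psnr_flat` and its `peak` parameter are hypothetical. Returning infinity for identical images matches what the torch version produces, since 1 / sqrt(0) evaluates to `inf` there.

```python
import math


def psnr_flat(img1, img2, peak=1.0):
    """PSNR in dB between two equally sized flat sequences of pixel values."""
    mse = sum((a - b) ** 2 for a, b in zip(img1, img2)) / len(img1)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(peak ** 2 / mse)
```

A uniform error of 0.1 gives MSE 0.01 and therefore 20 dB.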
================================================
FILE: tools/loss_utils.py
================================================
#
# Copyright (C) 2023, Inria
# GRAPHDECO research group, https://team.inria.fr/graphdeco
# All rights reserved.
#
# This software is free for non-commercial, research and evaluation use
# under the terms of the LICENSE.md file.
#
# For inquiries contact george.drettakis@inria.fr
#
"""
[1] Feature Preserving Point Set Surfaces based on Non-Linear Kernel Regression
Cengiz Oztireli, Gaël Guennebaud, Markus Gross
[2] Consolidation of Unorganized Point Clouds for Surface Reconstruction
Hui Huang, Dan Li, Hao Zhang, Uri Ascher, Daniel Cohen-Or
[3] Differentiable Surface Splatting for Point-based Geometry Processing
Wang Yifan, Felice Serena, Shihao Wu, Cengiz Oeztireli, Olga Sorkine-Hornung
[4] 3D Gaussian Splatting for Real-Time Radiance Field Rendering
Bernhard Kerbl, Georgios Kopanas, Thomas Leimkühler, George Drettakis
"""
from typing import Optional
from math import exp
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable
def entropy_loss(opacity):
loss = (- opacity * torch.log(opacity + 1e-6) - \
(1 - opacity) * torch.log(1 - opacity + 1e-6)).mean()
return loss
def l1_loss(network_output, gt):
return torch.abs((network_output - gt)).mean()
def log_l1_loss(network_output, gt):
loss = torch.log(1 + torch.abs((network_output - gt))).mean()
return loss
def l2_loss(network_output, gt):
return ((network_output - gt) ** 2).mean()
def gaussian(window_size, sigma):
gauss = torch.Tensor([exp(-(x - window_size // 2) ** 2 / float(2 * sigma ** 2)) for x in range(window_size)])
return gauss / gauss.sum()
def create_window(window_size, channel):
_1D_window = gaussian(window_size, 1.5).unsqueeze(1)
_2D_window = _1D_window.mm(_1D_window.t()).float().unsqueeze(0).unsqueeze(0)
window = Variable(_2D_window.expand(channel, 1, window_size, window_size).contiguous())
return window
def ssim(img1, img2, window_size=11, size_average=True):
channel = img1.size(-3)
window = create_window(window_size, channel)
if img1.is_cuda:
window = window.cuda(img1.get_device())
window = window.type_as(img1)
return _ssim(img1, img2, window, window_size, channel, size_average)
def _ssim(img1, img2, window, window_size, channel, size_average=True):
mu1 = F.conv2d(img1, window, padding=window_size // 2, groups=channel)
mu2 = F.conv2d(img2, window, padding=window_size // 2, groups=channel)
mu1_sq = mu1.pow(2)
mu2_sq = mu2.pow(2)
mu1_mu2 = mu1 * mu2
sigma1_sq = F.conv2d(img1 * img1, window, padding=window_size // 2, groups=channel) - mu1_sq
sigma2_sq = F.conv2d(img2 * img2, window, padding=window_size // 2, groups=channel) - mu2_sq
sigma12 = F.conv2d(img1 * img2, window, padding=window_size // 2, groups=channel) - mu1_mu2
C1 = 0.01 ** 2
C2 = 0.03 ** 2
ssim_map = ((2 * mu1_mu2 + C1) * (2 * sigma12 + C2)) / ((mu1_sq + mu2_sq + C1) * (sigma1_sq + sigma2_sq + C2))
if size_average:
return ssim_map.mean()
else:
return ssim_map.mean(1).mean(1).mean(1)
def eikonal_loss(gradients):
gradient_error = (gradients.norm(dim=-1) - 1.0) ** 2 # [B,R,N]
gradient_error = gradient_error.nan_to_num(nan=0.0, posinf=0.0, neginf=0.0) # [B,R,N]
return gradient_error.mean()
def curvature_loss(hessian):
laplacian = hessian.sum(dim=-1).abs() # [B,R,N]
laplacian = laplacian.nan_to_num(nan=0.0, posinf=0.0, neginf=0.0) # [B,R,N]
return laplacian.mean()
def compute_normal_loss(normal_pred, normal_gt, weight=None):
if weight is not None:
weight = weight.view(-1, 1)
else:
weight = 1.0
normal_pred = normal_pred.view(-1, 3)
normal_gt = normal_gt.view(-1, 3)
cos = (1.0 - torch.sum(normal_pred * normal_gt * weight, dim=-1).abs()).mean()
return cos
def monosdf_normal_loss(normal_pred: torch.Tensor, normal_gt: torch.Tensor, weight: Optional[torch.Tensor] = None):
"""normal consistency loss as monosdf
Args:
normal_pred (torch.Tensor): volume rendered normal
normal_gt (torch.Tensor): monocular normal
"""
if weight is None: weight = 1.0
l1 = (weight * torch.abs(normal_pred - normal_gt).sum(dim=-1)).mean()
cos = (weight * (1.0 - torch.sum(normal_pred * normal_gt, dim=-1))).mean()
return l1 + cos
def cos_weight(render_normal, gt_normal, exp_t=1.0):
cos = torch.sum(render_normal * gt_normal, dim=-1)
if exp_t > 0:
cos = torch.exp((cos - 1) / exp_t)
else:
cos = torch.ones_like(cos)
return cos.detach()
# copy from MiDaS
def compute_scale_and_shift(prediction, target, mask):
# system matrix: A = [[a_00, a_01], [a_10, a_11]]
a_00 = torch.sum(mask * prediction * prediction, (1, 2))
a_01 = torch.sum(mask * prediction, (1, 2))
a_11 = torch.sum(mask, (1, 2))
# right hand side: b = [b_0, b_1]
b_0 = torch.sum(mask * prediction * target, (1, 2))
b_1 = torch.sum(mask * target, (1, 2))
# solution: x = A^-1 . b = [[a_11, -a_01], [-a_10, a_00]] / (a_00 * a_11 - a_01 * a_10) . b
x_0 = torch.zeros_like(b_0)
x_1 = torch.zeros_like(b_1)
det = a_00 * a_11 - a_01 * a_01
valid = det.nonzero()
x_0[valid] = (a_11[valid] * b_0[valid] - a_01[valid] * b_1[valid]) / det[valid]
x_1[valid] = (-a_01[valid] * b_0[valid] + a_00[valid] * b_1[valid]) / det[valid]
return x_0, x_1
def reduction_batch_based(image_loss, M):
# average of all valid pixels of the batch
# avoid division by 0 (if sum(M) = sum(sum(mask)) = 0: sum(image_loss) = 0)
divisor = torch.sum(M)
if divisor == 0:
return 0
else:
return torch.sum(image_loss) / divisor
def reduction_image_based(image_loss, M):
# mean of average of valid pixels of an image
# avoid division by 0 (if M = sum(mask) = 0: image_loss = 0)
valid = M.nonzero()
image_loss[valid] = image_loss[valid] / M[valid]
return torch.mean(image_loss)
def mse_loss(prediction, target, mask, reduction=reduction_batch_based):
M = torch.sum(mask, (1, 2))
res = prediction - target
image_loss = torch.sum(mask * res * res, (1, 2))
return reduction(image_loss, 2 * M)
def gradient_loss(prediction, target, mask, reduction=reduction_batch_based):
M = torch.sum(mask, (1, 2))
diff = prediction - target
diff = torch.mul(mask, diff)
grad_x = torch.abs(diff[:, :, 1:] - diff[:, :, :-1])
mask_x = torch.mul(mask[:, :, 1:], mask[:, :, :-1])
grad_x = torch.mul(mask_x, grad_x)
grad_y = torch.abs(diff[:, 1:, :] - diff[:, :-1, :])
mask_y = torch.mul(mask[:, 1:, :], mask[:, :-1, :])
grad_y = torch.mul(mask_y, grad_y)
image_loss = torch.sum(grad_x, (1, 2)) + torch.sum(grad_y, (1, 2))
return reduction(image_loss, M)
class MSELoss(nn.Module):
def __init__(self, reduction='batch-based'):
super().__init__()
if reduction == 'batch-based':
self.__reduction = reduction_batch_based
else:
self.__reduction = reduction_image_based
def forward(self, prediction, target, mask):
return mse_loss(prediction, target, mask, reduction=self.__reduction)
class GradientLoss(nn.Module):
def __init__(self, scales=4, reduction='batch-based'):
super().__init__()
if reduction == 'batch-based':
self.__reduction = reduction_batch_based
else:
self.__reduction = reduction_image_based
self.__scales = scales
def forward(self, prediction, target, mask):
total = 0
for scale in range(self.__scales):
step = pow(2, scale)
total += gradient_loss(prediction[:, ::step, ::step], target[:, ::step, ::step],
mask[:, ::step, ::step], reduction=self.__reduction)
return total
class ScaleAndShiftInvariantLoss(nn.Module):
def __init__(self, alpha=0.5, scales=1, reduction='batch-based'):
super().__init__()
self.__data_loss = MSELoss(reduction=reduction)
self.__regularization_loss = GradientLoss(scales=scales, reduction=reduction)
self.__alpha = alpha
self.__prediction_ssi = None
def forward(self, prediction, target, mask=None):
target = target * 50 + 0.5
if mask is None: mask = torch.ones_like(target)
scale, shift = compute_scale_and_shift(prediction, target, mask)
self.__prediction_ssi = scale.view(-1, 1, 1) * prediction + shift.view(-1, 1, 1)
total = self.__data_loss(self.__prediction_ssi, target, mask)
if self.__alpha > 0:
total += self.__alpha * self.__regularization_loss(self.__prediction_ssi, target, mask)
return total
def __get_prediction_ssi(self):
return self.__prediction_ssi
prediction_ssi = property(__get_prediction_ssi)
# end copy
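def normal2curv(normal, mask = None):
# normal: (H, W, 3); mask: (H, W, 1) boolean, all pixels valid when not given
if mask is None: mask = torch.ones_like(normal[..., :1], dtype=torch.bool)
n = normal
m = mask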
n = torch.nn.functional.pad(n[None], [0, 0, 1, 1, 1, 1], mode='replicate')
m = torch.nn.functional.pad(m[None].to(torch.float32), [0, 0, 1, 1, 1, 1], mode='replicate').to(torch.bool)
n_c = (n[:, 1:-1, 1:-1, :] ) * m[:, 1:-1, 1:-1, :]
n_u = (n[:, :-2, 1:-1, :] - n_c) * m[:, :-2, 1:-1, :]
n_l = (n[:, 1:-1, :-2, :] - n_c) * m[:, 1:-1, :-2, :]
n_b = (n[:, 2: , 1:-1, :] - n_c) * m[:, 2: , 1:-1, :]
n_r = (n[:, 1:-1, 2: , :] - n_c) * m[:, 1:-1, 2: , :]
curv = (n_u + n_l + n_b + n_r)[0]
curv = curv * mask
curv = curv.norm(1, -1, True)
return curv
def L1_loss_appearance(image, gt_image, gaussians, view_idx, return_transformed_image=False):
appearance_embedding = gaussians.get_apperance_embedding(view_idx)
# center crop the image
origH, origW = image.shape[1:]
H = origH // 32 * 32
W = origW // 32 * 32
left = origW // 2 - W // 2
top = origH // 2 - H // 2
crop_image = image[:, top:top+H, left:left+W]
crop_gt_image = gt_image[:, top:top+H, left:left+W]
# down sample the image
crop_image_down = torch.nn.functional.interpolate(crop_image[None], size=(H//32, W//32), mode="bilinear", align_corners=True)[0]
crop_image_down = torch.cat([crop_image_down, appearance_embedding[None].repeat(H//32, W//32, 1).permute(2, 0, 1)], dim=0)[None]
mapping_image = gaussians.appearance_network(crop_image_down)
transformed_image = mapping_image * crop_image
if not return_transformed_image:
return l1_loss(transformed_image, crop_gt_image)
else:
transformed_image = torch.nn.functional.interpolate(transformed_image, size=(origH, origW), mode="bilinear", align_corners=True)[0]
return transformed_image
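`compute_scale_and_shift` (copied from MiDaS) solves, per image, the 2x2 normal equations of the masked least-squares problem min over (s, t) of sum m (s p + t - y)^2, falling back to (0, 0) when the system is singular. The pure-Python sketch below does the same for one flattened image so the closed form can be checked; `scale_and_shift` is a hypothetical name.

```python
def scale_and_shift(prediction, target, mask):
    """Least-squares (s, t) minimizing sum(m * (s * p + t - y) ** 2); (0, 0) if degenerate."""
    a_00 = sum(m * p * p for p, m in zip(prediction, mask))
    a_01 = sum(m * p for p, m in zip(prediction, mask))
    a_11 = sum(mask)
    b_0 = sum(m * p * y for p, y, m in zip(prediction, target, mask))
    b_1 = sum(m * y for y, m in zip(target, mask))
    det = a_00 * a_11 - a_01 * a_01
    if det == 0:
        return 0.0, 0.0  # same fallback as the zero-initialized x_0, x_1
    s = (a_11 * b_0 - a_01 * b_1) / det
    t = (-a_01 * b_0 + a_00 * b_1) / det
    return s, t
```

When the target is an exact affine function of the prediction (here y = 2p + 3), the recovered coefficients are (2, 3), and a masked-out outlier does not change them; a constant prediction makes the system singular.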
================================================
FILE: tools/math_utils.py
================================================
import torch
def eps_sqrt(squared, eps=1e-17):
"""
Prepare the input for sqrt: take its absolute value and clamp it to be at least eps.
"""
return torch.clamp(squared.abs(), eps)
def ndc_to_pix(p, resolution):
"""
Reverse of pytorch3d pix_to_ndc function
Args:
p (float tensor): (..., 3)
resolution (scalar): image resolution (for now, supports only aspect ratio = 1)
Returns:
pix (float tensor): (..., 2)
"""
pix = resolution - ((p[..., :2] + 1.0) * resolution - 1.0) / 2
return pix
def decompose_to_R_and_t(transform_mat, row_major=True):
""" Decompose (..., 4, 4) transform matrices into R (..., 3, 3) and t (..., 3). """
assert(transform_mat.shape[-2:] == (4, 4)), \
"Expecting batches of 4x4 matrices"
# R and t are returned in the row-vector convention (x' = x @ R + t)
if not row_major:
transform_mat = transform_mat.transpose(-2, -1)
R = transform_mat[..., :3, :3]
t = transform_mat[..., -1, :3]
return R, t
def to_homogen(x, dim=-1):
""" append one to the specified dimension """
if dim < 0:
dim = x.ndim + dim
shp = x.shape
new_shp = shp[:dim] + (1, ) + shp[dim + 1:]
x_homogen = x.new_ones(new_shp)
x_homogen = torch.cat([x, x_homogen], dim=dim)
return x_homogen
def normalize_pts(pts, trans, scale):
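'''
trans: either a (3,) offset subtracted from the points, or a (4, 4) transform
whose rotation and translation are applied to the points before dividing by scale
'''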
if trans.ndim == 1:
pts = (pts - trans) / scale
else:
pts = ((trans[:3, :3] @ pts.T + trans[:3, 3:]).T) / scale
return pts
def inv_normalize_pts(pts, trans, scale):
if trans.ndim == 1:
pts = pts * scale + trans
else:
pts = (pts * scale[None] - trans[:3, 3:].T) @ trans[:3, :3]
return pts
def get_inside_normalized(xyz, trans, scale):
pts = normalize_pts(xyz, trans, scale)
with torch.no_grad():
inside = torch.all(torch.abs(pts) < 1, dim=-1)
return inside, pts
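In the (4, 4) case, `inv_normalize_pts` undoes `normalize_pts` by relying on the fact that a rotation's inverse is its transpose:

- the forward map is x = (R p + t) / s;
- so p = Rᵀ(s x − t), which in row-vector form is (s x − tᵀ) R.

The NumPy sketch below checks the round trip. It assumes `trans` is rigid (a pure rotation plus a translation) and that `scale` is a scalar. If `trans[:3, :3]` also contained scaling or shear, this inverse would no longer hold.

```python
import numpy as np


def normalize_pts(pts, trans, scale):
    # Rotate and translate into the normalized frame, then divide by scale.
    return (trans[:3, :3] @ pts.T + trans[:3, 3:]).T / scale


def inv_normalize_pts(pts, trans, scale):
    # Undo the scale, remove the translation, then apply R^T (row form: x @ R).
    return (pts * scale - trans[:3, 3:].T) @ trans[:3, :3]
```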
================================================
FILE: tools/mcube_utils.py
================================================
#
# Copyright (C) 2024, ShanghaiTech
# SVIP research group, https://github.com/svip-lab
# All rights reserved.
#
# This software is free for non-commercial, research and evaluation use
# under the terms of the LICENSE.md file.
#
# For inquiries contact huangbb@shanghaitech.edu.cn
#
import numpy as np
import torch
import trimesh
from skimage import measure
# modified from here https://github.com/autonomousvision/sdfstudio/blob/370902a10dbef08cb3fe4391bd3ed1e227b5c165/nerfstudio/utils/marching_cubes.py#L201
def marching_cubes_with_contraction(
sdf,
resolution=512,
bounding_box_min=(-1.0, -1.0, -1.0),
bounding_box_max=(1.0, 1.0, 1.0),
return_mesh=False,
level=0,
simplify_mesh=True,
inv_contraction=None,
max_range=32.0,
):
assert resolution % 512 == 0, "resolution must be a multiple of 512"
resN = resolution
cropN = 512
N = resN // cropN
grid_min = bounding_box_min
grid_max = bounding_box_max
xs = np.linspace(grid_min[0], grid_max[0], N + 1)
ys = np.linspace(grid_min[1], grid_max[1], N + 1)
zs = np.linspace(grid_min[2], grid_max[2], N + 1)
meshes = []
for i in range(N):
for j in range(N):
for k in range(N):
print(i, j, k)
x_min, x_max = xs[i], xs[i + 1]
y_min, y_max = ys[j], ys[j + 1]
z_min, z_max = zs[k], zs[k + 1]
x = np.linspace(x_min, x_max, cropN)
y = np.linspace(y_min, y_max, cropN)
z = np.linspace(z_min, z_max, cropN)
xx, yy, zz = np.meshgrid(x, y, z, indexing="ij")
points = torch.tensor(np.vstack([xx.ravel(), yy.ravel(), zz.ravel()]).T, dtype=torch.float).cuda()
@torch.no_grad()
def evaluate(points):
z = []
for _, pnts in enumerate(torch.split(points, 256**3, dim=0)):
z.append(sdf(pnts))
z = torch.cat(z, axis=0)
return z
# points is already a flat (cropN**3, 3) array in 'ij' order, matching the reshape of the SDF values below
pts_sdf = evaluate(points.contiguous())
z = pts_sdf.detach().cpu().numpy()
if not (np.min(z) > level or np.max(z) < level):
z = z.astype(np.float32)
verts, faces, normals, _ = measure.marching_cubes(
volume=z.reshape(cropN, cropN, cropN),
level=level,
spacing=(
(x_max - x_min) / (cropN - 1),
(y_max - y_min) / (cropN - 1),
(z_max - z_min) / (cropN - 1),
),
)
verts = verts + np.array([x_min, y_min, z_min])
meshcrop = trimesh.Trimesh(vertices=verts, faces=faces, vertex_normals=normals)
meshes.append(meshcrop)
print("finished one block")
combined = trimesh.util.concatenate(meshes)
combined.merge_vertices(digits_vertex=6)
# inverse contraction and clipping the points range
if inv_contraction is not None:
combined.vertices = inv_contraction(torch.from_numpy(combined.vertices).float().cuda()).cpu().numpy()
combined.vertices = np.clip(combined.vertices, -max_range, max_range)
return combined
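`marching_cubes_with_contraction` splits the grid into `resolution // 512` blocks per axis and samples 512 points in each block. Those samples include both ends of the block, so neighbouring blocks sample the same boundary plane. Their vertices therefore coincide there, and `merge_vertices` can weld them.

Below is a pure-Python sketch of the per-axis tiling. `axis_blocks` is a hypothetical helper that mirrors the `np.linspace` edges used above.

```python
def axis_blocks(lo: float, hi: float, resolution: int, crop_n: int = 512):
    """Per-axis block bounds and voxel spacing for the tiled marching cubes."""
    if resolution % crop_n != 0:
        raise ValueError("resolution must be a multiple of crop_n")
    n = resolution // crop_n
    edges = [lo + (hi - lo) * i / n for i in range(n + 1)]
    bounds = list(zip(edges[:-1], edges[1:]))
    # crop_n samples span each block including both ends, hence crop_n - 1 gaps.
    spacing = (edges[1] - edges[0]) / (crop_n - 1)
    return bounds, spacing
```

With `resolution=1024` there are 2 blocks per axis, so 8 blocks in total are visited by the triple loop.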
================================================
FILE: tools/mesh_utils.py
================================================
import torch
import numpy as np
import os
import math
from tqdm import tqdm
from functools import partial
import open3d as o3d
from tools.render_utils import save_img_f32, save_img_u8
from tools.semantic_id import BACKGROUND
from tools.graphics_utils import depth2point
from tools.math_utils import get_inside_normalized
def post_process_mesh(mesh, cluster_to_keep=1000):
"""
Post-process a mesh to filter out floaters and disconnected parts
"""
import copy
print("post processing the mesh to have {} clusters".format(cluster_to_keep))
mesh_0 = copy.deepcopy(mesh)
with o3d.utility.VerbosityContextManager(o3d.utility.VerbosityLevel.Debug) as cm:
triangle_clusters, cluster_n_triangles, cluster_area = (mesh_0.cluster_connected_triangles())
triangle_clusters = np.asarray(triangle_clusters)
cluster_n_triangles = np.asarray(cluster_n_triangles)
cluster_area = np.asarray(cluster_area)
n_cluster = np.sort(cluster_n_triangles.copy())[-min(cluster_to_keep, len(cluster_n_triangles))]
n_cluster = max(n_cluster, 50) # filter out clusters with fewer than 50 triangles
triangles_to_remove = cluster_n_triangles[triangle_clusters] < n_cluster
mesh_0.remove_triangles_by_mask(triangles_to_remove)
mesh_0.remove_unreferenced_vertices()
mesh_0.remove_degenerate_triangles()
print("num vertices raw {}".format(len(mesh.vertices)))
print("num vertices post {}".format(len(mesh_0.vertices)))
return mesh_0
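`post_process_mesh` sets its size threshold to the larger of two values:

- the triangle count of the `cluster_to_keep`-th largest cluster;
- 50.

It then drops every triangle whose cluster is smaller than that threshold. Clusters tied at the threshold survive, so more than `cluster_to_keep` clusters can remain.

Below is a pure-Python sketch of that rule. It includes a guard for meshes with fewer clusters than requested. `clusters_to_keep` is a hypothetical helper.

```python
def clusters_to_keep(triangle_clusters, cluster_n_triangles,
                     cluster_to_keep=1000, min_size=50):
    """Per-triangle keep mask following post_process_mesh's threshold rule."""
    sizes = sorted(cluster_n_triangles)
    k = min(cluster_to_keep, len(sizes))  # avoid indexing past the cluster count
    threshold = max(sizes[-k], min_size)
    return [cluster_n_triangles[c] >= threshold for c in triangle_clusters]
```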
def to_cam_open3d(viewpoint_stack):
camera_traj = []
for i, viewpoint_cam in enumerate(viewpoint_stack):
intrinsic=o3d.camera.PinholeCameraIntrinsic(width=viewpoint_cam.image_width,
height=viewpoint_cam.image_height,
cx = viewpoint_cam.image_width/2,
cy = viewpoint_cam.image_height/2,
fx = viewpoint_cam.image_width / (2 * math.tan(viewpoint_cam.FoVx / 2.)),
fy = viewpoint_cam.image_height / (2 * math.tan(viewpoint_cam.FoVy / 2.)))
extrinsic=np.asarray((viewpoint_cam.world_view_transform.T).cpu().numpy())
camera = o3d.camera.PinholeCameraParameters()
camera.extrinsic = extrinsic
camera.intrinsic = intrinsic
camera_traj.append(camera)
return camera_traj
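`to_cam_open3d` derives the pinhole focal lengths from the field of view. At unit depth, half the image spans tan(FoV / 2), so f = (size / 2) / tan(FoV / 2). It also places the principal point at the image centre.

For the extrinsic, `world_view_transform` is stored transposed, in the row-vector convention. Taking `.T` therefore recovers the world-to-camera matrix that Open3D expects.

Below is a pure-Python sketch of the intrinsics. The helper names are hypothetical.

```python
import math


def fov_to_focal(fov_rad: float, size_px: int) -> float:
    # f = (size / 2) / tan(fov / 2)
    return size_px / (2.0 * math.tan(fov_rad / 2.0))


def intrinsics_from_fov(width, height, fov_x, fov_y):
    """3x3 pinhole K with the principal point at the image centre."""
    fx = fov_to_focal(fov_x, width)
    fy = fov_to_focal(fov_y, height)
    return [[fx, 0.0, width / 2.0],
            [0.0, fy, height / 2.0],
            [0.0, 0.0, 1.0]]
```

A 90° horizontal FoV gives fx equal to half the image width.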
class GaussianExtractor(object):
def __init__(self, gaussians, render, cfg, bg_color=None, dirs=None, prob_thres=0.2, alpha_thres=0.5):
"""
A class that extracts attributes of a scene represented by 2DGS.
Usage example:
>>> gaussExtractor = GaussianExtractor(gaussians, render, cfg)
>>> gaussExtractor.reconstruction(view_points)
>>> mesh = gaussExtractor.extract_mesh_bounded(...)
"""
if bg_color is None:
bg_color = [0, 0, 0]
if isinstance(bg_color, torch.Tensor): background = bg_color.clone().detach()
else: background = torch.tensor(bg_color, dtype=torch.float32, device="cuda")
self.gaussians = gaussians
self.render = partial(render, cfg=cfg, bg_color=background, dirs=dirs)
self.prob_thres = prob_thres
self.alpha_thres = alpha_thres
self.clean()
@torch.no_grad()
def clean(self):
self.depthmaps = []
self.alphamaps = []
self.rgbmaps = []
self.normals = []
self.depth_normals = []
self.viewpoint_stack = []
@torch.no_grad()
def reconstruction(self, viewpoint_stack):
"""
reconstruct radiance field given cameras
"""
self.clean()
self.viewpoint_stack = viewpoint_stack
for i, viewpoint_cam in tqdm(enumerate(self.viewpoint_stack), desc="reconstruct radiance fields", total=len(self.viewpoint_stack)):
render_pkg = self.render(viewpoint_cam, self.gaussians)
rgb = render_pkg['render']
alpha = render_pkg['alpha']
normal = torch.nn.functional.normalize(render_pkg['normal'], dim=0)
normal = normal.permute(1, 2, 0)
depth = render_pkg['depth']
if 'render_sem' in render_pkg:
semantic = render_pkg["render_sem"]
prob = self.gaussians.logits2prob(semantic)
mask = (prob[..., BACKGROUND] > self.prob_thres)[None]
depth[mask] = 0
rendered_pcd_world = depth2point(depth[0], viewpoint_cam.intr, viewpoint_cam.world_view_transform.transpose(0, 1))[1]
inside = get_inside_normalized(rendered_pcd_world.view(-1, 3), self.gaussians.trans, self.gaussians.scale)[0]
depth.view(-1)[~inside] = 0
depth_normal = render_pkg['est_normal'].permute(1, 2, 0)
self.rgbmaps.append(rgb.cpu())
self.depthmaps.append(depth.cpu())
self.alphamaps.append(alpha.cpu())
self.normals.append(normal.cpu())
self.depth_normals.append(depth_normal.cpu())
self.rgbmaps = torch.stack(self.rgbmaps, dim=0)
self.depthmaps = torch.stack(self.depthmaps, dim=0)
self.alphamaps = torch.stack(self.alphamaps, dim=0)
self.depth_normals = torch.stack(self.depth_normals, dim=0)
@torch.no_grad()
def extract_mesh_bounded(self, voxel_size=0.004, sdf_trunc=0.02, depth_trunc=3, mask_backgrond=True):
"""
Perform TSDF fusion given a fixed depth range, used in the paper.
voxel_size: the voxel size of the volume
sdf_trunc: truncation value
depth_trunc: maximum depth range; should depend on the scene's scale
mask_backgrond: whether to mask the background; only works when the dataset has masks
return o3d.mesh
"""
print("Running tsdf volume integration ...")
print(f'voxel_size: {voxel_size}')
print(f'sdf_trunc: {sdf_trunc}')
print(f'depth_trunc: {depth_trunc}')
volume = o3d.pipelines.integration.ScalableTSDFVolume(
voxel_length= voxel_size,
sdf_trunc=sdf_trunc,
color_type=o3d.pipelines.integration.TSDFVolumeColorType.RGB8
)
for i, cam_o3d in tqdm(enumerate(to_cam_open3d(self.viewpoint_stack)), desc="TSDF integration progress", total=len(self.viewpoint_stack)):
rgb = self.rgbmaps[i]
depth = self.depthmaps[i]
# if we have mask provided, use it
if mask_backgrond and (self.viewpoint_stack[i].gt_alpha_mask is not None):
depth[(self.viewpoint_stack[i].gt_alpha_mask < 0.5)] = 0
# make open3d rgbd
rgbd = o3d.geometry.RGBDImage.create_from_color_and_depth(
o3d.geometry.Image(np.asarray(rgb.permute(1,2,0).cpu().numpy() * 255, order="C", dtype=np.uint8)),
o3d.geometry.Image(np.asarray(depth.permute(1,2,0).cpu().numpy(), order="C")),
depth_trunc = depth_trunc, convert_rgb_to_intensity=False,
depth_scale = 1.0
)
volume.integrate(rgbd, intrinsic=cam_o3d.intrinsic, extrinsic=cam_o3d.extrinsic)
mesh = volume.extract_triangle_mesh()
return mesh
@torch.no_grad()
def extract_mesh_unbounded(self, resolution=1024):
"""
Experimental feature: extracts meshes from unbounded scenes; not fully tested across datasets.
#TODO: support color mesh exporting
resolution: number of marching-cubes samples per axis (must be a multiple of 512)
return o3d.mesh
"""
def contract(x):
mag = torch.linalg.norm(x, ord=2, dim=-1)[..., None]
return torch.where(mag < 1, x, (2 - (1 / mag)) * (x / mag))
def uncontract(y):
mag = torch.linalg.norm(y, ord=2, dim=-1)[..., None]
return torch.where(mag < 1, y, (1 / (2-mag) * (y/mag)))
def compute_sdf_perframe(i, points, depthmap, rgbmap, normalmap, viewpoint_cam):
"""
compute per frame sdf
"""
new_points = torch.cat([points, torch.ones_like(points[...,:1])], dim=-1) @ viewpoint_cam.full_proj_transform
z = new_points[..., -1:]
pix_coords = (new_points[..., :2] / new_points[..., -1:])
mask_proj = ((pix_coords > -1. ) & (pix_coords < 1.) & (z > 0)).all(dim=-1)
sampled_depth = torch.nn.functional.grid_sample(depthmap.cuda()[None], pix_coords[None, None], mode='bilinear', padding_mode='border', align_corners=True).reshape(-1, 1)
sampled_rgb = torch.nn.functional.grid_sample(rgbmap.cuda()[None], pix_coords[None, None], mode='bilinear', padding_mode='border', align_corners=True).reshape(3,-1).T
sampled_normal = torch.nn.functional.grid_sample(normalmap.cuda()[None], pix_coords[None, None], mode='bilinear', padding_mode='border', align_corners=True).reshape(3,-1).T
sdf = (sampled_depth-z)
return sdf, sampled_rgb, sampled_normal, mask_proj
def compute_unbounded_tsdf(samples, inv_contraction, voxel_size, return_rgb=False):
"""
Fuse all frames, using an adaptive SDF truncation in the contracted space.
"""
if inv_contraction is not None:
samples = inv_contraction(samples)
mask = torch.linalg.norm(samples, dim=-1) > 1
# adaptive sdf_truncation
sdf_trunc = 5 * voxel_size * torch.ones_like(samples[:, 0])
sdf_trunc[mask] *= 1/(2-torch.linalg.norm(samples, dim=-1)[mask].clamp(max=1.9))
else:
sdf_trunc = 5 * voxel_size
tsdfs = torch.ones_like(samples[:,0]) * 1
rgbs = torch.zeros((samples.shape[0], 3)).cuda()
weights = torch.ones_like(samples[:,0])
for i, viewpoint_cam in tqdm(enumerate(self.viewpoint_stack), desc="TSDF integration progress"):
sdf, rgb, normal, mask_proj = compute_sdf_perframe(i, samples,
depthmap = self.depthmaps[i],
rgbmap = self.rgbmaps[i],
normalmap = self.depth_normals[i].permute(2, 0, 1), # stored as (H, W, 3); grid_sample needs (3, H, W)
viewpoint_cam=self.viewpoint_stack[i],
)
# volume integration
sdf = sdf.flatten()
mask_proj = mask_proj & (sdf > -sdf_trunc)
sdf = torch.clamp(sdf / sdf_trunc, min=-1.0, max=1.0)[mask_proj]
w = weights[mask_proj]
wp = w + 1
tsdfs[mask_proj] = (tsdfs[mask_proj] * w + sdf) / wp
rgbs[mask_proj] = (rgbs[mask_proj] * w[:,None] + rgb[mask_proj]) / wp[:,None]
# update weight
weights[mask_proj] = wp
if return_rgb:
return tsdfs, rgbs
return tsdfs
from tools.render_utils import focus_point_fn
torch.cuda.empty_cache()
c2ws = np.array([np.linalg.inv(np.asarray((cam.world_view_transform.T).cpu().numpy())) for cam in self.viewpoint_stack])
poses = c2ws[:,:3,:] @ np.diag([1, -1, -1, 1])
center = (focus_point_fn(poses))
radius = np.linalg.norm(c2ws[:,:3,3] - center, axis=-1).min()
center = torch.from_numpy(center).float().cuda()
normalize = lambda x: (x - center) / radius
unnormalize = lambda x: (x * radius) + center
inv_contraction = lambda x: unnormalize(uncontract(x))
N = resolution
voxel_size = (radius * 2 / N)
print(f"Computing sdf grid resolution {N} x {N} x {N}")
print(f"Define the voxel_size as {voxel_size}")
sdf_function = lambda x: compute_unbounded_tsdf(x, inv_contraction, voxel_size)
from tools.mcube_utils import marching_cubes_with_contraction
R = contract(normalize(self.gaussians.get_xyz)).norm(dim=-1).cpu().numpy()
R = np.quantile(R, q=0.95)
R = min(R+0.01, 1.9)
mesh = marching_cubes_with_contraction(
sdf=sdf_function,
bounding_box_min=(-R, -R, -R),
bounding_box_max=(R, R, R),
level=0,
resolution=N,
inv_contraction=inv_contraction,
)
# coloring the mesh
torch.cuda.empty_cache()
mesh = mesh.as_open3d
print("texturing mesh ... ")
_, rgbs = compute_unbounded_tsdf(torch.tensor(np.asarray(mesh.vertices)).float().cuda(), inv_contraction=None, voxel_size=voxel_size, return_rgb=True)
mesh.vertex_colors = o3d.utility.Vector3dVector(rgbs.cpu().numpy())
return mesh
@torch.no_grad()
def export_image(self, path):
render_path = os.path.join(path, "renders")
gts_path = os.path.join(path, "gt")
vis_path = os.path.join(path, "vis")
os.makedirs(render_path, exist_ok=True)
os.makedirs(vis_path, exist_ok=True)
os.makedirs(gts_path, exist_ok=True)
for idx, viewpoint_cam in tqdm(enumerate(self.viewpoint_stack), desc="export images"):
gt = viewpoint_cam.original_image[0:3, :, :]
save_img_u8(gt.permute(1,2,0).cpu().numpy(), os.path.join(gts_path, '{0:05d}'.format(idx) + ".png"))
save_img_u8(self.rgbmaps[idx].permute(1,2,0).cpu().numpy(), os.path.join(render_path, '{0:05d}'.format(idx) + ".png"))
save_img_f32(self.depthmaps[idx][0].cpu().numpy(), os.path.join(vis_path, 'depth_{0:05d}'.format(idx) + ".tiff"))
# normals are stored as (H, W, 3) by reconstruction(), so no permute is needed here
save_img_u8(self.normals[idx].cpu().numpy() * 0.5 + 0.5, os.path.join(vis_path, 'normal_{0:05d}'.format(idx) + ".png"))
save_img_u8(self.depth_normals[idx].cpu().numpy() * 0.5 + 0.5, os.path.join(vis_path, 'depth_normal_{0:05d}'.format(idx) + ".png"))
================================================
FILE: tools/normal_utils.py
================================================
import torch
import torch.nn.functional as F
from tools.graphics_utils import depth2point_cam
def get_normal_sign(normals, begin=None, end=None, trans=None, mode='origin', vec=None):
if mode == 'origin':
if vec is None:
if begin is None:
# center
if trans is not None:
begin = - trans[:3, :3].T @ trans[:3, 3] \
if trans.ndim != 1 else trans
else:
begin = end.mean(0)
begin[1] += 1
vec = end - begin
cos = (normals * vec).sum(-1, keepdim=True)
return cos
def compute_gradient(img):
dy = torch.gradient(img, dim=0)[0]
dx = torch.gradient(img, dim=1)[0]
return dx, dy
def compute_normals(depth_map, K):
# Assuming depth_map is a PyTorch tensor of shape [H, W]
# K is the camera intrinsic matrix (a batch dimension is added before depth2point_cam)
_, cam_coords = depth2point_cam(depth_map[None, None], K[None])
cam_coords = cam_coords.squeeze(0).squeeze(0).squeeze(0) # [H, W, 3]
dx, dy = compute_gradient(cam_coords)
# Cross product of gradients gives normal
normals = torch.cross(dx, dy, dim=-1)
normals = F.normalize(normals, p=2, dim=-1)
return normals
def compute_edge(image, k=11, thr=0.01):
dx, dy = compute_gradient(image)
edge = torch.sqrt(dx**2 + dy**2)
edge = edge / edge.max().clamp(min=1e-8) # guard against division by zero on a constant image
p = (k - 1) // 2
edge = F.max_pool2d(edge[None], kernel_size=k, stride=1, padding=p)[0]
edge[edge>thr] = 1
return edge
def get_edge_aware_distortion_map(gt_image, distortion_map):
grad_img_left = torch.mean(torch.abs(gt_image[:, 1:-1, 1:-1] - gt_image[:, 1:-1, :-2]), 0)
grad_img_right = torch.mean(torch.abs(gt_image[:, 1:-1, 1:-1] - gt_image[:, 1:-1, 2:]), 0)
grad_img_top = torch.mean(torch.abs(gt_image[:, 1:-1, 1:-1] - gt_image[:, :-2, 1:-1]), 0)
grad_img_bottom = torch.mean(torch.abs(gt_image[:, 1:-1, 1:-1] - gt_image[:, 2:, 1:-1]), 0)
max_grad = torch.max(torch.stack([grad_img_left, grad_img_right, grad_img_top, grad_img_bottom], dim=-1), dim=-1)[0]
max_grad = torch.exp(-max_grad)
# pad the 1-pixel border lost to the neighbour differences, giving it zero weight
max_grad = torch.nn.functional.pad(max_grad, (1, 1, 1, 1), mode="constant", value=0)
return distortion_map * max_grad
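`compute_normals` back-projects a depth map to camera-space points. It then takes image-space gradients of those points and uses the cross product of the column-direction and row-direction gradients as the normal.

Below is a NumPy sketch of the same steps. It assumes a pinhole model with +z forward, so a fronto-parallel plane yields (0, 0, 1), i.e. pointing away from the camera. Flip the sign if camera-facing normals are wanted. `backproject` and `normals_from_depth` are hypothetical names standing in for `depth2point_cam` and `compute_normals`.

```python
import numpy as np


def backproject(depth, fx, fy, cx, cy):
    """(H, W) depth -> (H, W, 3) camera-space points under a pinhole model."""
    h, w = depth.shape
    v, u = np.meshgrid(np.arange(h, dtype=float), np.arange(w, dtype=float), indexing="ij")
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    return np.stack([x, y, depth], axis=-1)


def normals_from_depth(depth, fx, fy, cx, cy):
    pts = backproject(depth, fx, fy, cx, cy)
    dy = np.gradient(pts, axis=0)  # change along image rows
    dx = np.gradient(pts, axis=1)  # change along image columns
    n = np.cross(dx, dy)
    return n / np.linalg.norm(n, axis=-1, keepdims=True)
```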
================================================
FILE: tools/prune.py
================================================
import torch
from gaussian_renderer import count_render, visi_acc_render
def calculate_v_imp_score(gaussians, imp_list, v_pow):
"""
:param gaussians: A data structure containing Gaussian components with a get_scaling method.
:param imp_list: The importance scores for each Gaussian component.
:param v_pow: The power to which the volume ratios are raised.
:return: A list of adjusted values (v_list) used for pruning.
"""
# Calculate the volume of each Gaussian component
volume = torch.prod(gaussians.get_scaling, dim=1)
# Reference volume: the entry at position int(0.9 * N) of the descending sort (roughly the 10th-percentile volume)
index = int(len(volume) * 0.9)
sorted_volume, _ = torch.sort(volume, descending=True)
kth_percent_largest = sorted_volume[index]
# Calculate v_list
v_list = torch.pow(volume / kth_percent_largest, v_pow)
v_list = v_list * imp_list
return v_list
def prune_list(gaussians, viewpoint_stack, pipe, background):
gaussian_list, imp_list = None, None
viewpoint_cam = viewpoint_stack.pop()
render_pkg = count_render(viewpoint_cam, gaussians, pipe, background)
gaussian_list, imp_list = (
render_pkg["gaussians_count"],
render_pkg["important_score"],
)
for iteration in range(len(viewpoint_stack)):
# Pop the next camera (taken from the end of the stack, not at random)
viewpoint_cam = viewpoint_stack.pop()
render_pkg = count_render(viewpoint_cam, gaussians, pipe, background)
gaussians_count, important_score = (
render_pkg["gaussians_count"].detach(),
render_pkg["important_score"].detach(),
)
gaussian_list += gaussians_count
imp_list += important_score
return gaussian_list, imp_list
v_render = visi_acc_render
def get_visi_list(gaussians, viewpoint_stack, pipe, background):
out = {}
gaussian_list = None
viewpoint_cam = viewpoint_stack.pop()
render_pkg = v_render(viewpoint_cam, gaussians, pipe, background)
gaussian_list = render_pkg["countlist"]
for i in range(len(viewpoint_stack)):
# Pop the next camera (taken from the end of the stack, not at random)
viewpoint_cam = viewpoint_stack.pop()
render_pkg = v_render(viewpoint_cam, gaussians, pipe, background)
gaussians_count = render_pkg["countlist"].detach()
gaussian_list += gaussians_count
visi = gaussian_list > 0
out["visi"] = visi
return out
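`calculate_v_imp_score` computes each Gaussian's volume as the product of its three scales. It divides that volume by the reference volume found at index `int(0.9 * N)` of the descending sort. The ratio is raised to `v_pow` and multiplied by the accumulated importance. With `v_pow = 0` the score reduces to the raw importance.

Below is a pure-Python sketch. `v_imp_score` is a hypothetical name.

```python
def v_imp_score(scalings, importance, v_pow):
    """Volume-weighted importance, following calculate_v_imp_score."""
    volumes = [sx * sy * sz for sx, sy, sz in scalings]
    ranked = sorted(volumes, reverse=True)
    ref = ranked[int(len(volumes) * 0.9)]  # reference volume
    return [((v / ref) ** v_pow) * imp for v, imp in zip(volumes, importance)]
```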
================================================
FILE: tools/render_utils.py
================================================
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import numpy as np
import os
from typing import Tuple
import copy
from PIL import Image
import mediapy as media
from matplotlib import cm
from tqdm import tqdm
import torch
def normalize(x: np.ndarray) -> np.ndarray:
"""Normalization helper function."""
return x / np.linalg.norm(x)
def pad_poses(p: np.ndarray) -> np.ndarray:
"""Pad [..., 3, 4] pose matrices with a homogeneous bottom row [0,0,0,1]."""
bottom = np.broadcast_to([0, 0, 0, 1.], p[..., :1, :4].shape)
return np.concatenate([p[..., :3, :4], bottom], axis=-2)
def unpad_poses(p: np.ndarray) -> np.ndarray:
"""Remove the homogeneous bottom row from [..., 4, 4] pose matrices."""
return p[..., :3, :4]
def recenter_poses(poses: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
"""Recenter poses around the origin."""
cam2world = average_pose(poses)
transform = np.linalg.inv(pad_poses(cam2world))
poses = transform @ pad_poses(poses)
return unpad_poses(poses), transform
def average_pose(poses: np.ndarray) -> np.ndarray:
"""New pose using average position, z-axis, and up vector of input poses."""
position = poses[:, :3, 3].mean(0)
z_axis = poses[:, :3, 2].mean(0)
up = poses[:, :3, 1].mean(0)
cam2world = viewmatrix(z_axis, up, position)
return cam2world
def viewmatrix(lookdir: np.ndarray, up: np.ndarray,
position: np.ndarray) -> np.ndarray:
"""Construct lookat view matrix."""
vec2 = normalize(lookdir)
vec0 = normalize(np.cross(up, vec2))
vec1 = normalize(np.cross(vec2, vec0))
m = np.stack([vec0, vec1, vec2, position], axis=1)
return m
def focus_point_fn(poses: np.ndarray) -> np.ndarray:
"""Calculate nearest point to all focal axes in poses."""
directions, origins = poses[:, :3, 2:3], poses[:, :3, 3:4]
m = np.eye(3) - directions * np.transpose(directions, [0, 2, 1])
mt_m = np.transpose(m, [0, 2, 1]) @ m
focus_pt = np.linalg.inv(mt_m.mean(0)) @ (mt_m @ origins).mean(0)[:, 0]
return focus_pt
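`focus_point_fn` finds the point closest, in the least-squares sense, to all camera viewing axes.

For a unit direction d, the matrix P = I − d dᵀ projects onto the plane orthogonal to that axis. Minimising the summed squared distances therefore gives the normal equations (Σ Pᵢ) p = Σ Pᵢ oᵢ. Each Pᵢ is symmetric and idempotent, so the code's `mᵀ m` equals `m`.

Below is a NumPy sketch that takes explicit origins and directions and uses `np.linalg.solve` instead of an explicit inverse. `nearest_point_to_axes` is a hypothetical name.

```python
import numpy as np


def nearest_point_to_axes(origins, directions):
    """Least-squares point closest to the lines origin + t * direction."""
    d = directions / np.linalg.norm(directions, axis=-1, keepdims=True)
    # Projector onto the plane orthogonal to each direction: I - d d^T.
    proj = np.eye(3)[None] - d[:, :, None] * d[:, None, :]
    a = proj.mean(axis=0)
    b = (proj @ origins[:, :, None]).mean(axis=0)[:, 0]
    return np.linalg.solve(a, b)
```

Four cameras placed around a point and looking at it recover that point.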
def transform_poses_pca(poses: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
"""Transforms poses so principal components lie on XYZ axes.
Args:
poses: a (N, 3, 4) array containing the cameras' camera to world transforms.
Returns:
A tuple (poses, transform), with the transformed poses and the applied
camera_to_world transforms.
"""
t = poses[:, :3, 3]
t_mean = t.mean(axis=0)
t = t - t_mean
eigval, eigvec = np.linalg.eig(t.T @ t)
# Sort eigenvectors in order of largest to smallest eigenvalue.
inds = np.argsort(eigval)[::-1]
eigvec = eigvec[:, inds]
rot = eigvec.T
if np.linalg.det(rot) < 0:
rot = np.diag(np.array([1, 1, -1])) @ rot
transform = np.concatenate([rot, rot @ -t_mean[:, None]], -1)
poses_recentered = unpad_poses(transform @ pad_poses(poses))
transform = np.concatenate([transform, np.eye(4)[3:]], axis=0)
# Flip coordinate system if z component of y-axis is negative
if poses_recentered.mean(axis=0)[2, 1] < 0:
poses_recentered = np.diag(np.array([1, -1, -1])) @ poses_recentered
transform = np.diag(np.array([1, -1, -1, 1])) @ transform
return poses_recentered, transform
def generate_ellipse_path(poses: np.ndarray,
n_frames: int = 120,
const_speed: bool = True,
z_variation: float = 0.,
z_phase: float = 0.) -> np.ndarray:
"""Generate an elliptical render path based on the given poses."""
# Calculate the focal point for the path (cameras point toward this).
center = focus_point_fn(poses)
# Path height sits at z=0 (in middle of zero-mean capture pattern).
offset = np.array([center[0], center[1], 0])
# Calculate scaling for ellipse axes based on input camera positions.
sc = np.percentile(np.abs(poses[:, :3, 3] - offset), 90, axis=0)
# Use ellipse that is symmetric about the focal point in xy.
low = -sc + offset
high = sc + offset
# Optional height variation need not be symmetric
z_low = np.percentile((poses[:, :3, 3]), 10, axis=0)
z_high = np.percentile((poses[:, :3, 3]), 90, axis=0)
def get_positions(theta):
# Interpolate between bounds with trig functions to get ellipse in x-y.
# Optionally also interpolate in z to change camera height along path.
return np.stack([
low[0] + (high - low)[0] * (np.cos(theta) * .5 + .5),
low[1] + (high - low)[1] * (np.sin(theta) * .5 + .5),
z_variation * (z_low[2] + (z_high - z_low)[2] *
(np.cos(theta + 2 * np.pi * z_phase) * .5 + .5)),
], -1)
theta = np.linspace(0, 2. * np.pi, n_frames + 1, endpoint=True)
positions = get_positions(theta)
# Throw away duplicated last position.
positions = positions[:-1]
# Set path's up vector to axis closest to average of input pose up vectors.
avg_up = poses[:, :3, 1].mean(0)
avg_up = avg_up / np.linalg.norm(avg_up)
ind_up = np.argmax(np.abs(avg_up))
up = np.eye(3)[ind_up] * np.sign(avg_up[ind_up])
return np.stack([viewmatrix(p - center, up, p) for p in positions])
def generate_path(viewpoint_cameras, n_frames=480):
c2ws = np.array([np.linalg.inv(np.asarray((cam.world_view_transform.T).cpu().numpy())) for cam in viewpoint_cameras])
pose = c2ws[:,:3,:] @ np.diag([1, -1, -1, 1])
pose_recenter, colmap_to_world_transform = transform_poses_pca(pose)
# generate new poses
new_poses = generate_ellipse_path(poses=pose_recenter, n_frames=n_frames)
# warp back to original scale
new_poses = np.linalg.inv(colmap_to_world_transform) @ pad_poses(new_poses)
traj = []
for c2w in new_poses:
c2w = c2w @ np.diag([1, -1, -1, 1])
cam = copy.deepcopy(viewpoint_cameras[0])
cam.image_height = int(cam.image_height / 2) * 2
cam.image_width = int(cam.image_width / 2) * 2
cam.world_view_transform = torch.from_numpy(np.linalg.inv(c2w).T).float().cuda()
cam.full_proj_transform = (cam.world_view_transform.unsqueeze(0).bmm(cam.projection_matrix.unsqueeze(0))).squeeze(0)
cam.camera_center = cam.world_view_transform.inverse()[3, :3]
traj.append(cam)
return traj
def load_img(pth: str) -> np.ndarray:
"""Load an image and cast to float32."""
with open(pth, 'rb') as f:
image = np.array(Image.open(f), dtype=np.float32)
return image
def create_videos(base_dir, input_dir, out_name, num_frames=480):
"""Creates videos out of the images saved to disk."""
# Videos are written as '<out_name>_<tag>.mp4' in base_dir.
video_prefix = f'{out_name}'
zpad = max(5, len(str(num_frames - 1)))
idx_to_str = lambda idx: str(idx).zfill(zpad)
os.makedirs(base_dir, exist_ok=True)
render_dist_curve_fn = np.log
# Load one example frame to get image shape and depth range.
depth_file = os.path.join(input_dir, 'vis', f'depth_{idx_to_str(0)}.tiff')
depth_frame = load_img(depth_file)
shape = depth_frame.shape
p = 3
distance_limits = np.percentile(depth_frame.flatten(), [p, 100 - p])
lo, hi = [render_dist_curve_fn(x) for x in distance_limits]
print(f'Video shape is {shape[:2]}')
video_kwargs = {
'shape': shape[:2],
'codec': 'h264',
'fps': 60,
'crf': 18,
}
for k in ['depth', 'normal', 'color']:
video_file = os.path.join(base_dir, f'{video_prefix}_{k}.mp4')
input_format = 'gray' if k == 'alpha' else 'rgb'
file_ext = 'png' if k in ['color', 'normal'] else 'tiff'
idx = 0
if k == 'color':
file0 = os.path.join(input_dir, 'renders', f'{idx_to_str(0)}.{file_ext}')
else:
file0 = os.path.join(input_dir, 'vis', f'{k}_{idx_to_str(0)}.{file_ext}')
if not os.path.exists(file0):
print(f'Images missing for tag {k}')
continue
print(f'Making video {video_file}...')
with media.VideoWriter(
video_file, **video_kwargs, input_format=input_format) as writer:
for idx in tqdm(range(num_frames)):
if k == 'color':
img_file = os.path.join(input_dir, 'renders', f'{idx_to_str(idx)}.{file_ext}')
else:
img_file = os.path.join(input_dir, 'vis', f'{k}_{idx_to_str(idx)}.{file_ext}')
if not os.path.exists(img_file):
raise ValueError(f'Image file {img_file} does not exist.')
img = load_img(img_file)
if k in ['color', 'normal']:
img = img / 255.
elif k.startswith('depth'):
img = render_dist_curve_fn(img)
img = np.clip((img - np.minimum(lo, hi)) / np.abs(hi - lo), 0, 1)
img = cm.get_cmap('turbo')(img)[..., :3]
frame = (np.clip(np.nan_to_num(img), 0., 1.) * 255.).astype(np.uint8)
writer.add_image(frame)
idx += 1
def save_img_u8(img, pth):
"""Save an image (probably RGB) in [0, 1] to disk as a uint8 PNG."""
with open(pth, 'wb') as f:
Image.fromarray(
(np.clip(np.nan_to_num(img), 0., 1.) * 255.).astype(np.uint8)).save(
f, 'PNG')
def save_img_f32(depthmap, pth):
"""Save an image (probably a depthmap) to disk as a float32 TIFF."""
with open(pth, 'wb') as f:
Image.fromarray(np.nan_to_num(depthmap).astype(np.float32)).save(f, 'TIFF')
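Before applying the turbo colormap, `create_videos` maps depth to [0, 1] on a log scale. The limits come from the 3rd and 97th percentiles of the first frame, so colours stay consistent across the whole video.

Below is a NumPy sketch of that normalisation. It assumes strictly positive depths, because zero depth would give log(0). `depth_to_unit` is a hypothetical helper.

```python
import numpy as np


def depth_to_unit(depth, ref=None, p=3.0):
    """Log-scale depth normalised to [0, 1] using percentile limits of `ref`."""
    ref = depth if ref is None else ref
    lo, hi = np.log(np.percentile(ref, [p, 100.0 - p]))
    out = (np.log(depth) - min(lo, hi)) / abs(hi - lo)
    return np.clip(out, 0.0, 1.0)
```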
================================================
FILE: tools/semantic_id.py
================================================
BACKGROUND = 0
text_label_dict = {
'window': BACKGROUND,
'sky': BACKGROUND,
'sky window': BACKGROUND,
'window sky': BACKGROUND,
'floor': 2,
}
================================================
FILE: tools/sh_utils.py
================================================
# Copyright 2021 The PlenOctree Authors.
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are met:
#
# 1. Redistributions of source code must retain the above copyright notice,
# this list of conditions and the following disclaimer.
#
# 2. Redistributions in binary form must reproduce the above copyright notice,
# this list of conditions and the following disclaimer in the documentation
# and/or other materials provided with the distribution.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
# ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
# LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
# CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
# SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
# INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
# CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
# ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
# POSSIBILITY OF SUCH DAMAGE.
import torch
C0 = 0.28209479177387814
C1 = 0.4886025119029199
C2 = [
1.0925484305920792,
-1.0925484305920792,
0.31539156525252005,
-1.0925484305920792,
0.5462742152960396
]
C3 = [
-0.5900435899266435,
2.890611442640554,
-0.4570457994644658,
0.3731763325901154,
-0.4570457994644658,
1.445305721320277,
-0.5900435899266435
]
C4 = [
2.5033429417967046,
-1.7701307697799304,
0.9461746957575601,
-0.6690465435572892,
0.10578554691520431,
-0.6690465435572892,
0.47308734787878004,
-1.7701307697799304,
0.6258357354491761,
]
def eval_sh(deg, sh, dirs):
"""
Evaluate spherical harmonics at unit directions
using hardcoded SH polynomials.
Works with torch/np/jnp.
... Can be 0 or more batch dimensions.
Args:
deg: int SH degree. Currently, 0-4 supported
sh: jnp.ndarray SH coeffs [..., C, (deg + 1) ** 2]
dirs: jnp.ndarray unit directions [..., 3]
Returns:
[..., C]
"""
assert deg <= 4 and deg >= 0
coeff = (deg + 1) ** 2
assert sh.shape[-1] >= coeff
result = C0 * sh[..., 0]
if deg > 0:
x, y, z = dirs[..., 0:1], dirs[..., 1:2], dirs[..., 2:3]
result = (result -
C1 * y * sh[..., 1] +
C1 * z * sh[..., 2] -
C1 * x * sh[..., 3])
if deg > 1:
xx, yy, zz = x * x, y * y, z * z
xy, yz, xz = x * y, y * z, x * z
result = (result +
C2[0] * xy * sh[..., 4] +
C2[1] * yz * sh[..., 5] +
C2[2] * (2.0 * zz - xx - yy) * sh[..., 6] +
C2[3] * xz * sh[..., 7] +
C2[4] * (xx - yy) * sh[..., 8])
if deg > 2:
result = (result +
C3[0] * y * (3 * xx - yy) * sh[..., 9] +
C3[1] * xy * z * sh[..., 10] +
C3[2] * y * (4 * zz - xx - yy)* sh[..., 11] +
C3[3] * z * (2 * zz - 3 * xx - 3 * yy) * sh[..., 12] +
C3[4] * x * (4 * zz - xx - yy) * sh[..., 13] +
C3[5] * z * (xx - yy) * sh[..., 14] +
C3[6] * x * (xx - 3 * yy) * sh[..., 15])
if deg > 3:
result = (result + C4[0] * xy * (xx - yy) * sh[..., 16] +
C4[1] * yz * (3 * xx - yy) * sh[..., 17] +
C4[2] * xy * (7 * zz - 1) * sh[..., 18] +
C4[3] * yz * (7 * zz - 3) * sh[..., 19] +
C4[4] * (zz * (35 * zz - 30) + 3) * sh[..., 20] +
C4[5] * xz * (7 * zz - 3) * sh[..., 21] +
C4[6] * (xx - yy) * (7 * zz - 1) * sh[..., 22] +
C4[7] * xz * (xx - 3 * yy) * sh[..., 23] +
C4[8] * (xx * (xx - 3 * yy) - yy * (3 * xx - yy)) * sh[..., 24])
return result
def RGB2SH(rgb):
    return (rgb - 0.5) / C0
def SH2RGB(sh):
    return sh * C0 + 0.5
================================================
FILE: tools/system_utils.py
================================================
#
# Copyright (C) 2023, Inria
# GRAPHDECO research group, https://team.inria.fr/graphdeco
# All rights reserved.
#
# This software is free for non-commercial, research and evaluation use
# under the terms of the LICENSE.md file.
#
# For inquiries contact george.drettakis@inria.fr
#
from errno import EEXIST
from os import makedirs, path
import os
def mkdir_p(folder_path):
    # Creates a directory; equivalent to `mkdir -p` on the command line.
    try:
        makedirs(folder_path)
    except OSError as exc:  # Python >2.5
        if exc.errno == EEXIST and path.isdir(folder_path):
            pass
        else:
            raise
def searchForMaxIteration(folder):
    saved_iters = [int(fname.split("_")[-1]) for fname in os.listdir(folder)]
    return max(saved_iters)
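# Sketch (hypothetical helper, not in the original repo): searchForMaxIteration
# assumes every entry in `folder` ends in "_<int>" (e.g. "iteration_7000"); any
# other entry, or an empty folder, raises ValueError. A tolerant variant that
# skips non-matching entries and returns None when nothing matches could look
# like this.
import os
def search_for_max_iteration_tolerant(folder):
    saved_iters = []
    for fname in os.listdir(folder):
        suffix = fname.split("_")[-1]
        if suffix.isdigit():  # keep only names such as "iteration_7000"
            saved_iters.append(int(suffix))
    return max(saved_iters) if saved_iters else None
# Usage: returns the largest saved iteration, or None instead of raising.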
================================================
FILE: tools/termcolor.py
================================================
'''
-----------------------------------------------------------------------------
Copyright (c) 2023, NVIDIA CORPORATION. All rights reserved.
NVIDIA CORPORATION and its licensors retain all intellectual property
and proprietary rights in and to this software, related documentation
and any modifications thereto. Any use, reproduction, disclosure or
distribution of this software and related documentation without an express
license agreement from NVIDIA CORPORATION is strictly prohibited.
-----------------------------------------------------------------------------
'''
import pprint
import termcolor
def red(x): return termcolor.colored(str(x), color="red")
def green(x): return termcolor.colored(str(x), color="green")
def blue(x): return termcolor.colored(str(x), color="blue")
def cyan(x): return termcolor.colored(str(x), color="cyan")
def yellow(x): return termcolor.colored(str(x), color="yellow")
def magenta(x): return termcolor.colored(str(x), color="magenta")
def grey(x): return termcolor.colored(str(x), color="grey")
COLORS = {
    'red': red, 'green': green, 'blue': blue, 'cyan': cyan, 'yellow': yellow, 'magenta': magenta, 'grey': grey
}
def PP(x):
    string = pprint.pformat(x, indent=2)
    if isinstance(x, dict):
        string = '{\n ' + string[1:-1] + '\n}'
    return string
def alert(x, color='red'):
    color = COLORS[color]
    print(color('-' * 32))
    print(color(f'* {x}'))
    print(color('-' * 32))
================================================
FILE: tools/visualization.py
================================================
import wandb
import imageio
import torch
import torchvision
from matplotlib import pyplot as plt
from torchvision.transforms import functional as torchvision_F
PALETTE = [
(0, 0, 0),
(174, 199, 232), (152, 223, 138), (31, 119, 180), (255, 187, 120), (188, 189, 34),
(140, 86, 75), (255, 152, 150), (214, 39, 40), (197, 176, 213), (148, 103, 189),
(196, 156, 148), (23, 190, 207), (247, 182, 210), (219, 219, 141), (255, 127, 14),
(158, 218, 229), (44, 160, 44), (112, 128, 144), (227, 119, 194), (82, 84, 163),
]
PALETTE = torch.tensor(PALETTE, dtype=torch.uint8)
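# Sketch (illustrative only): wandb_sem colours a label map by fancy-indexing the
# palette, so an integer [H, W] label tensor becomes an [H, W, 3] uint8 image.
# The helper name and the tiny two-colour palette in the usage are hypothetical.
import torch
def _labels_to_rgb(labels, palette):
    """labels: LongTensor [H, W] of class ids; palette: uint8 tensor [K, 3]."""
    rgb = palette[labels]                        # [H, W, 3], one colour per pixel
    return rgb.permute(2, 0, 1).float() / 255.0  # [3, H, W] in [0, 1] for torchvision
# Usage: _labels_to_rgb(torch.tensor([[0, 1]]), PALETTE) gives a [3, 1, 2] float image.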
def wandb_image(images, from_range=(0, 1)):
    images = preprocess_image(images, from_range=from_range)
    wandb_image = wandb.Image(images)
    return wandb_image
def preprocess_image(images, from_range=(0, 1), cmap="viridis"):
    # Rescale from `from_range` to [0, 1]; single-channel inputs become colour heatmaps.
    lo, hi = from_range
    images = (images - lo) / (hi - lo)
    images = images.detach().cpu().float().clamp_(min=0, max=1)
    if images.shape[0] == 1:
        images = get_heatmap(images, cmap=cmap)
    images = tensor2pil(images)
    return images
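def wandb_sem(image, palette=PALETTE):
    image = image.detach().long().cpu()
    # Map labels to colours; scale to [0, 1] because to_pil_image multiplies float tensors by 255.
    image = (palette[image].float() / 255.0).permute(2, 0, 1)[None]
    image = tensor2pil(image)
    wandb_image = wandb.Image(image)
    return wandb_image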
def tensor2pil(images):
    image_grid = torchvision.utils.make_grid(images, nrow=1, pad_value=1)
    image_grid = torchvision_F.to_pil_image(image_grid)
    return image_grid
def get_heatmap(gray, cmap):  # [N,H,W]
    color = plt.get_cmap(cmap)(gray.numpy())
    color = torch.from_numpy(color[..., :3]).permute(0, 3, 1, 2).float()  # [N,3,H,W]
    return color
def save_render(render, path):
    image = torch.clamp(render, 0.0, 1.0).detach().cpu()
    image = (image.permute(1, 2, 0).numpy() * 255).astype('uint8')  # [..., ::-1]
    imageio.imsave(path, image)
================================================
FILE: tools/visualize.py
================================================
'''
-----------------------------------------------------------------------------
Copyright (c) 2023, NVIDIA CORPORATION. All rights reserved.
NVIDIA CORPORATION and its licensors retain all intellectual property
and proprietary rights in and to this software, related documentation
and any modifications thereto. Any use, reproduction, disclosure or
distribution of this software and related documentation without an express
license agreement from NVIDIA CORPORATION is strictly prohibited.
-----------------------------------------------------------------------------
'''
import numpy as np
import torch
import matplotlib.pyplot as plt
import plotly.graph_objs as go
import k3d
from tools import camera
def get_camera_mesh(pose, depth=1):
    vertices = torch.tensor([[-0.5, -0.5, 1],
                             [0.5, -0.5, 1],
                             [0.5, 0.5, 1],
                             [-0.5, 0.5, 1],
                             [0, 0, 0]]) * depth  # [5,3]
    faces = torch.tensor([[0, 1, 2],
                          [0, 2, 3],
                          [0, 1, 4],
                          [1, 2, 4],
                          [2, 3, 4],
                          [3, 0, 4]])  # [6,3]
    vertices = camera.cam2world(vertices[None], pose)  # [N,5,3]
    wireframe = vertices[:, [0, 1, 2, 3, 0, 4, 1, 2, 4, 3]]  # [N,10,3]
    return vertices, faces, wireframe
def merge_meshes(vertices, faces):
    mesh_N, vertex_N = vertices.shape[:2]
    faces_merged = torch.cat([faces + i * vertex_N for i in range(mesh_N)], dim=0)
    vertices_merged = vertices.view(-1, vertices.shape[-1])
    return vertices_merged, faces_merged
def merge_wireframes_k3d(wireframe):
    wf_first, wf_last, wf_dummy = wireframe[:, :1], wireframe[:, -1:], wireframe[:, :1] * np.nan
    wireframe_merged = torch.cat([wf_first, wireframe, wf_last, wf_dummy], dim=1)
    return wireframe_merged
def merge_wireframes_plotly(wireframe):
    wf_dummy = wireframe[:, :1] * np.nan
    wireframe_merged = torch.cat([wireframe, wf_dummy], dim=1).view(-1, 3)
    return wireframe_merged
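# Sketch (illustrative, hypothetical helper): the merge_wireframes_* helpers append
# a NaN point after each camera's polyline. Plotly leaves a gap at NaN coordinates
# by default, which is presumably why many disconnected polylines can be drawn as a
# single line trace here. A minimal stand-alone version of the plotly variant:
import torch
def _concat_polylines_with_nan(polylines):
    """polylines: tensor [N, P, 3] -> [N * (P + 1), 3] with a NaN row after each polyline."""
    sep = torch.full_like(polylines[:, :1], float("nan"))  # [N, 1, 3] separator rows
    return torch.cat([polylines, sep], dim=1).reshape(-1, 3)
# Usage: two 4-point polylines give 10 rows, with NaN rows at indices 4 and 9.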
def get_xyz_indicators(pose, length=0.1):
    xyz = torch.eye(4, 3)[None] * length
    xyz = camera.cam2world(xyz, pose)
    return xyz
def merge_xyz_indicators_k3d(xyz):  # [N,4,3]
    xyz = xyz[:, [[-1, 0], [-1, 1], [-1, 2]]]  # [N,3,2,3]
    xyz_0, xyz_1 = xyz.unbind(dim=2)  # [N,3,3]
    xyz_dummy = xyz_0 * np.nan
    xyz_merged = torch.stack([xyz_0, xyz_0, xyz_1, xyz_1, xyz_dummy], dim=2)  # [N,3,5,3]
    return xyz_merged
def merge_xyz_indicators_plotly(xyz):  # [N,4,3]
    xyz = xyz[:, [[-1, 0], [-1, 1], [-1, 2]]]  # [N,3,2,3]
    xyz_0, xyz_1 = xyz.unbind(dim=2)  # [N,3,3]
    xyz_dummy = xyz_0 * np.nan
    xyz_merged = torch.stack([xyz_0, xyz_1, xyz_dummy], dim=2)  # [N,3,3,3]
    xyz_merged = xyz_merged.view(-1, 3)
    return xyz_merged
def k3d_visualize_pose(poses, vis_depth=0.5, xyz_length=0.1, center_size=0.1, xyz_width=0.02, mesh_opacity=0.05):
    # poses has shape [N,3,4] potentially in sequential order
    N = len(poses)
    centers_cam = torch.zeros(N, 1, 3)
    centers_world = camera.cam2world(centers_cam, poses)
    centers_world = centers_world[:, 0]
    # Get the camera wireframes.
    vertices, faces, wireframe = get_camera_mesh(poses, depth=vis_depth)
    xyz = get_xyz_indicators(poses, length=xyz_length)
    vertices_merged, faces_merged = merge_meshes(vertices, faces)
    wireframe_merged = merge_wireframes_k3d(wireframe)
    xyz_merged = merge_xyz_indicators_k3d(xyz)
    # Set the color map for the camera trajectory and the xyz indicators.
    color_map = plt.get_cmap("gist_rainbow")
    center_color = []
    vertices_merged_color = []
    wireframe_color = []
    xyz_color = []
    x_hex, y_hex, z_hex = int(255) << 16, int(255) << 8, int(255)
    for i in range(N):
        # Set the camera pose colors (with a smooth gradient color map).
        # max(N - 1, 1) avoids a division by zero when only one pose is given.
        r, g, b, _ = color_map(i / max(N - 1, 1))
        r, g, b = r * 0.8, g * 0.8, b * 0.8
        pose_rgb_hex = (int(r * 255) << 16) + (int(g * 255) << 8) + int(b * 255)
        center_color += [pose_rgb_hex]
        vertices_merged_color += [pose_rgb_hex] * 5
        wireframe_color += [pose_rgb_hex] * 13
        # Set the xyz indicator colors.
        xyz_color += [x_hex] * 5 + [y_hex] * 5 + [z_hex] * 5
    # Plot in K3D.
    k3d_objects = [
        k3d.points(centers_world, colors=center_color, point_size=center_size, shader="3d"),
        k3d.mesh(vertices_merged, faces_merged, colors=vertices_merged_color, side="double", opacity=mesh_opacity),
        k3d.line(wireframe_merged, colors=wireframe_color, shader="simple"),
        k3d.line(xyz_merged, colors=xyz_color, shader="thick", width=xyz_width),
    ]
    return k3d_objects
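# Sketch (hypothetical helper): K3D takes colours as packed 0xRRGGBB integers, and
# k3d_visualize_pose packs each float channel in [0, 1] into that form inline.
# Factored out, the packing is:
def _rgb_to_hex_int(r, g, b):
    """Pack float RGB in [0, 1] into a single 0xRRGGBB integer (channels truncated, not rounded)."""
    return (int(r * 255) << 16) + (int(g * 255) << 8) + int(b * 255)
# Usage: _rgb_to_hex_int(1, 0, 0) == 0xFF0000, the same value as x_hex in k3d_visualize_pose.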
def plotly_visualize_pose(poses, vis_depth=0.5, xyz_length=0.5, center_size=2, xyz_width=5, mesh_opacity=0.05):
    # poses has shape [N,3,4] potentially in sequential order
    N = len(poses)
    centers_cam = torch.zeros(N, 1, 3)
    centers_world = camera.cam2world(centers_cam, poses)
    centers_world = centers_world[:, 0]
    # Get the camera wireframes.
    vertices, faces, wireframe = get_camera_mesh(poses, depth=vis_depth)
    xyz = get_xyz_indicators(poses, length=xyz_length)
    vertices_merged, faces_merged = merge_meshes(vertices, faces)
    wireframe_merged = merge_wireframes_plotly(wireframe)
    xyz_merged = merge_xyz_indicators_plotly(xyz)
    # Break up (x,y,z) coordinates.
    wireframe_x, wireframe_y, wireframe_z = wireframe_merged.unbind(dim=-1)
    xyz_x, xyz_y, xyz_z = xyz_merged.unbind(dim=-1)
    centers_x, centers_y, centers_z = centers_world.unbind(dim=-1)
    vertices_x, vertices_y, vertices_z = vertices_merged.unbind(dim=-1)
    # Set the color map for the camera trajectory and the xyz indicators.
    color_map = plt.get_cmap("gist_rainbow")
    center_color = []
    faces_merged_color = []
    wireframe_color = []
    xyz_color = []
    x_color, y_color, z_color = *np.eye(3).T,
    for i in range(N):
        # Set the camera pose colors (with a smooth gradient color map).
        # max(N - 1, 1) avoids a division by zero when only one pose is given.
        r, g, b, _ = color_map(i / max(N - 1, 1))
        rgb = np.array([r, g, b]) * 0.8
        wireframe_color += [rgb] * 11
        center_color += [rgb]
        faces_merged_color += [rgb] * 6
        xyz_color += [x_color] * 3 + [y_color] * 3 + [z_color] * 3
    # Plot in plotly.
    plotly_traces = [
        go.Scatter3d(x=wireframe_x, y=wireframe_y, z=wireframe_z, mode="lines",
                     line=dict(color=wireframe_color, width=1)),
        go.Scatter3d(x=xyz_x, y=xyz_y, z=xyz_z, mode="lines", line=dict(color=xyz_color, width=xyz_width)),
        go.Scatter3d(x=centers_x, y=centers_y, z=centers_z, mode="markers",
                     marker=dict(color=center_color, size=center_size, opacity=1)),
        go.Mesh3d(x=vertices_x, y=vertices_y, z=vertices_z,
                  i=[f[0] for f in faces_merged], j=[f[1] for f in faces_merged], k=[f[2] for f in faces_merged],
                  facecolor=faces_merged_color, opacity=mesh_opacity),
    ]
    return plotly_traces
================================================
FILE: train.py
================================================
import os
import sys
import argparse
sys.path.append(os.getcwd())
from configs.config import Config, recursive_update_strict, parse_cmdline_arguments
from trainer import Trainer
def parse_args():
    parser = argparse.ArgumentParser(description='Training')
    parser.add_argument('--config', help='Path to the training config file.', required=True)
    parser.add_argument('--wandb', action='store_true', help="Enable using Weights & Biases as the logger")
    parser.add_argument('--wandb_name', default='default', type=str)
    args, cfg_cmd = parser.parse_known_args()
    return args, cfg_cmd
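# Sketch (illustrative, hypothetical helper): parse_known_args returns the
# recognised flags plus the leftover tokens, which main() forwards to
# parse_cmdline_arguments as config overrides. The dotted override used in the
# usage line is only an example; the accepted override syntax is defined in
# configs/config.py, which is not shown here.
import argparse
def _split_known_args(argv):
    parser = argparse.ArgumentParser(description='Training')
    parser.add_argument('--config', required=True)
    parser.add_argument('--wandb', action='store_true')
    return parser.parse_known_args(argv)  # (Namespace, list of unrecognised tokens)
# Usage: _split_known_args(['--config', 'a.yaml', '--optim.iterations=100'])
# leaves '--optim.iterations=100' in the second element for the config parser.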
def main():
    args, cfg_cmd = parse_args()
    cfg = Config(args.config)
    cfg_cmd = parse_cmdline_arguments(cfg_cmd)
    recursive_update_strict(cfg, cfg_cmd)
    trainer = Trainer(cfg)
    cfg.save_config(cfg.logdir)
    trainer.init_wandb(cfg,
                       project=args.wandb_name,
                       mode="disabled" if cfg.train.debug_from > -1 or not args.wandb else "online",
                       use_group=True)
    trainer.train()
    trainer.finalize()
    return
if __name__ == "__main__":
    main()
================================================
FILE: trainer.py
================================================
import os
import json
import uuid
import math
import wandb
import imageio
import numpy as np
from torch import nn
from tqdm import tqdm
from random import randint
import torch.nn.functional as F
from argparse import Namespace
from pytorch3d.ops import knn_points
from torchmetrics import JaccardIndex
import torch
import matplotlib.pyplot as plt
from copy import deepcopy
from tools.loss_utils import l1_loss, ssim, cos_weight, entropy_loss, monosdf_normal_loss, ScaleAndShiftInvariantLoss
from gaussian_renderer import render, network_gui
from scene import Scene, GaussianModel
from tools.image_utils import psnr
from configs.config import Config
from tools.visualization import wandb_image, preprocess_image, wandb_sem
from tools.prune import prune_list, calculate_v_imp_score, get_visi_list
from tools.loss_utils import compute_normal_loss, L1_loss_appearance, normal2curv
from tools.camera_utils import bb_camera
from tools.general_utils import safe_state, set_random_seed
from scene.cameras import SampleCam
from tools.normal_utils import get_normal_sign, get_edge_aware_distortion_map
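# Sketch (illustrative, hypothetical helper): in Python and PyTorch, `&` binds
# tighter than `>`, so `visi & counts > 0` evaluates as `(visi & counts) > 0`, a
# bitwise AND of a bool mask with integer counts, rather than a logical AND of two
# masks. Assuming integer per-Gaussian view counts, parenthesising the comparison
# gives the intended mask.
import torch
def _combine_visibility(visi, counts):
    """visi: bool [N]; counts: int [N] accumulated view counts -> bool [N]."""
    return visi & (counts > 0)
# Usage: with visi=[T, T, F] and counts=[2, 1, 3] this yields [T, T, F],
# whereas the unparenthesised form yields [F, T, F] (2 & 1 == 0).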
# from process_data.extract_mask import text_label_dict
try:
    from torch.utils.tensorboard import SummaryWriter
    TENSORBOARD_FOUND = True
except ImportError:
    TENSORBOARD_FOUND = False
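# Sketch (illustrative, hypothetical helper): the semantic loss divides the
# cross-entropy by log(num_cls). A uniform prediction has cross-entropy exactly
# log(num_cls), so it maps to 1.0, and a confident correct prediction approaches 0.
# Confidently wrong predictions can still exceed 1, so this is a normalisation,
# not a hard bound.
import math
import torch
import torch.nn.functional as F
def _normalized_cross_entropy(logits, target, num_cls):
    """logits: [M, num_cls]; target: LongTensor [M] of class ids."""
    return F.cross_entropy(logits, target) / math.log(num_cls)
# Usage: all-zero logits over 5 classes give a normalised loss of 1.0.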
class Trainer(object):
    def __init__(self, cfg):
        self.cfg = cfg
        set_random_seed(cfg.seed)
        cfg.model.model_path = cfg.logdir
        self.sphere = getattr(cfg.model, 'sphere', False)
        cfg.model.load_normal = cfg.optim.loss_weight.mono_normal > 0 \
            or cfg.optim.loss_weight.depth_normal > 0
        cfg.model.load_depth = cfg.optim.loss_weight.mono_depth > 0
        self.enable_semantic = getattr(cfg.optim.loss_weight, 'semantic', 0) > 0
        cfg.model.enable_semantic = self.enable_semantic
        cfg.model.load_mask = self.enable_semantic or cfg.model.load_mask
        cfg.print_config()
        safe_state(cfg.silent)
        self.setup_model(cfg.model)
        self.setup_dataset(cfg.model)
        self.setup_optimizer(cfg.optim)
        self.init_attributes()
        self.init_losses()
        # Start GUI server, configure and run training
        if cfg.port > 0:
            network_gui.init(cfg.ip, cfg.port)
        torch.autograd.set_detect_anomaly(cfg.detect_anomaly)
    def setup_model(self, cfg):
        self.model = GaussianModel(cfg)
    def setup_dataset(self, cfg):
        os.makedirs(cfg.model_path, exist_ok=True)
        self.scene = Scene(cfg, self.model)
        self.model.trans = torch.from_numpy(self.scene.trans).cuda()
        self.model.scale = torch.from_numpy(self.scene.scale).cuda()
        self.model.extent = self.scene.cameras_extent
    def init_writer(self, cfg):
        if not cfg.model.model_path:
            if os.getenv('OAR_JOB_ID'):
                unique_str = os.getenv('OAR_JOB_ID')
            else:
                unique_str = str(uuid.uuid4())
            cfg.model.model_path = os.path.join("./output/", unique_str[0:10])
        # Set up output folder
        print("Output folder: {}".format(cfg.model.model_path))
        os.makedirs(cfg.model.model_path, exist_ok=True)
        with open(os.path.join(cfg.model.model_path, "cfg_args"), 'w') as cfg_log_f:
            cfg_log_f.write(str(Namespace(**vars(cfg))))
        # Create Tensorboard writer
        if TENSORBOARD_FOUND:
            self.writer = SummaryWriter(cfg.model.model_path)
        else:
            print("Tensorboard not available: not logging progress")
    def init_wandb(self, cfg, wandb_id=None, project="", run_name=None, mode="online", resume="allow", use_group=False):
        r"""Initialize Weights & Biases (wandb) logger.
        Args:
            cfg (obj): Global configuration.
            wandb_id (str): A unique ID for this run, used for resuming.
            project (str): The name of the project where you're sending the new run.
                If the project is not specified, the run is put in an "Uncategorized" project.
            run_name (str): Name for each wandb run (useful for logging changes).
            mode (str): online/offline/disabled
            resume (str): Resume mode passed to wandb.init.
            use_group (bool): If True, use the last two components of cfg.logdir as the run group and name.
        """
        print('Initialize wandb')
        if not wandb_id:
            wandb_path = os.path.join(cfg.logdir, "wandb_id.txt")
            if os.path.exists(wandb_path):
                with open(wandb_path, "r") as f:
                    wandb_id = f.read()
            else:
                wandb_id = wandb.util.generate_id()
                with open(wandb_path, "w") as f:
                    f.write(wandb_id)
        if use_group:
            group, name = cfg.logdir.split("/")[-2:]
        else:
            group, name = None, os.path.basename(cfg.logdir)
        if run_name is not None:
            name = run_name
        wandb.init(id=wandb_id,
                   project=project,
                   config=cfg,
                   group=group,
                   name=name,
                   dir=cfg.logdir,
                   resume=resume,
                   settings=wandb.Settings(start_method="fork"),
                   mode=mode)
        wandb.config.update({'dataset': cfg.data.name})
    def init_losses(self):
        r"""Initialize loss functions. All loss names have weights. Some have criterion modules."""
        self.losses = dict()
        self.weights = {key: value for key, value in self.cfg.optim.loss_weight.items() if value}
        if 'mono_depth' in self.weights:
            self.depth_loss = ScaleAndShiftInvariantLoss(alpha=0.5, scales=1)
    def setup_optimizer(self, cfg):
        self.model.training_setup(cfg)
    def init_attributes(self):
        self.iter_start = torch.cuda.Event(enable_timing=True)
        self.iter_end = torch.cuda.Event(enable_timing=True)
        self.viewpoint_stack = None
        self.ema_loss_for_log = 0.0
        self.current_iteration = 0
        self.max_iters = self.cfg.optim.iterations
        self.saving_iterations = self.cfg.train.save_iterations
        self.testing_iterations = self.cfg.train.test_iterations
        self.checkpoint_iterations = self.cfg.train.checkpoint_iterations
        self.debug_from = self.cfg.train.debug_from
        self.checkpoint = self.cfg.train.start_checkpoint
        self.star_ft_iter = None
        self.visi_list = None
        self.first_iter = 0
        if self.checkpoint:
            (model_params, self.first_iter) = torch.load(self.checkpoint)
            self.model.restore(model_params, self.cfg.optim)
        bg_color = [1, 1, 1] if self.cfg.model.white_background else [0, 0, 0]
        self.background = torch.tensor(bg_color, dtype=torch.float32, device="cuda")
        self.writer = None
        with open(os.path.join(self.cfg.model.model_path, "cfg_args"), 'w') as cfg_log_f:
            cfg_log_f.write(str(Namespace(**vars(self.cfg))))
        self.vis_path = os.path.join(self.cfg.logdir, "vis")
        self.vis_color_path = os.path.join(self.vis_path, "color")
        self.vis_depth_path = os.path.join(self.vis_path, "depth")
        self.vis_normal_path = os.path.join(self.vis_path, "normal")
        self.vis_dnormal_path = os.path.join(self.vis_path, "dnormal")
        self.vis_cos_path = os.path.join(self.vis_path, "cos")
        for mode in ['train', 'test']:
            os.makedirs(os.path.join(self.vis_color_path, mode), exist_ok=True)
            os.makedirs(os.path.join(self.vis_depth_path, mode), exist_ok=True)
            os.makedirs(os.path.join(self.vis_normal_path, mode), exist_ok=True)
            os.makedirs(os.path.join(self.vis_dnormal_path, mode), exist_ok=True)
            os.makedirs(os.path.join(self.vis_cos_path, mode), exist_ok=True)
        if self.enable_semantic:
            self.calc_miou = JaccardIndex(num_classes=self.model.num_cls, task='multiclass').cuda()
    def train(self):
        progress_bar = tqdm(range(self.first_iter, self.max_iters), desc="Training progress")
        self.current_iteration += self.first_iter
        self.first_iter += 1
        for iteration in range(self.first_iter, self.max_iters + 1):
            self.current_iteration += 1
            self.start_of_iteration()
            output = self.train_step(mode='train')
            self.end_of_iteration(output, render, progress_bar)
    def get_center_scale(self):
        meta_fname = f"{self.cfg.model.source_path}/meta.json"
        with open(meta_fname) as file:
            meta = json.load(file)
        # center scene
        trans = torch.from_numpy(np.array(meta["trans"], dtype=np.float32)).to("cuda")
        self.model.trans = torch.nn.parameter.Parameter(trans, requires_grad=False)
        # scale scene
        scale = torch.from_numpy(np.array(meta["scale"], dtype=np.float32)).to("cuda")
        self.model.scale = torch.nn.parameter.Parameter(scale, requires_grad=False)
    def model_forward(self, data, mode):
        render_pkg = render(data['viewpoint_cam'], self.model, self.cfg, data.pop('bg'), dirs=self.scene.dirs)
        data.update(render_pkg)
        self._compute_loss(data, mode)
        loss = self._get_total_loss()
        return loss
    def _compute_loss(self, data, mode=None):
        if mode == 'train':
            gt_image = data['viewpoint_cam'].original_image.cuda()
            self.losses['l1'] = l1_loss(data['render'], gt_image) if not self.cfg.model.use_decoupled_appearance \
                else L1_loss_appearance(data['render'], gt_image, self.model, data['viewpoint_cam'].idx)
            self.losses['ssim'] = 1.0 - ssim(data['render'], gt_image)
            if 'l1_scale' in self.weights or 'entropy' in self.weights or 'proj' in self.weights or 'repul' in self.weights:
                mask, _ = self.model.get_inside_gaus_normalized()
                if 'l1_scale' in self.weights and not self.sphere:
                    scaling = self.model.get_scaling[mask].min(-1)[0]
                    self.losses['l1_scale'] = l1_loss(scaling, torch.zeros_like(scaling))
                if 'entropy' in self.weights:
                    opacity = self.model.get_opacity[mask]
                    self.losses['entropy'] = entropy_loss(opacity)
            if 'mono_depth' in self.weights:
                render_depth = data['depth']
                gt_depth = data['viewpoint_cam'].depth.cuda().float()
                # Supervise only pixels with a positive rendered depth. The camera's loaded
                # mask (which holds class labels when the semantic loss is enabled) is not used.
                mask = render_depth > 0
                self.losses['mono_depth'] = self.depth_loss(render_depth, gt_depth, mask)
            if 'mono_normal' in self.weights and self.current_iteration > self.cfg.optim.normal_from_iter:
                render_normal = data['normal']
                gt_normal = data['viewpoint_cam'].normal.cuda()
                self.losses['mono_normal'] = monosdf_normal_loss(render_normal, gt_normal)
            if 'depth_normal' in self.weights and self.current_iteration > self.cfg.optim.dnormal_from_iter:
                est_normal = data['est_normal']
                gt_normal = data['viewpoint_cam'].normal.cuda()
                render_normal = data['normal'].detach()
                mask = data['mask']
                with torch.no_grad():
                    weights = cos_weight(render_normal, gt_normal, self.cfg.optim.exp_t)
                if mask.sum() != 0:
                    est_normal, gt_normal = est_normal[mask], gt_normal[mask]
                    render_normal = render_normal[mask]
                    weights = weights[mask]
                    self.losses['depth_normal'] = monosdf_normal_loss(est_normal, gt_normal, weights)
                else:
                    self.losses['depth_normal'] = 0
            if 'curv' in self.weights and self.current_iteration > self.cfg.optim.curv_from_iter:
                est_normal = data['est_normal']  # h, w, 3
                mask = data['mask'][..., None].clone()  # h, w, 1
                mask = mask.float()
                curv = normal2curv(est_normal, mask)
                self.losses['curv'] = l1_loss(curv, 0)
            if 'consistent_normal' in self.weights and self.current_iteration > self.cfg.optim.consistent_normal_from_iter:
                est_normal = data['est_normal']
                render_normal = data['normal']
                self.losses['consistent_normal'] = monosdf_normal_loss(est_normal, render_normal)
            if 'distortion' in self.weights and self.current_iteration > self.cfg.optim.close_depth_from_iter:
                distortion_map = data['distortion']
                distortion_map = get_edge_aware_distortion_map(gt_image, distortion_map)
                self.losses['distortion'] = distortion_map.mean()
            if 'depth_var' in self.weights and self.current_iteration > self.cfg.optim.close_depth_from_iter:
                depth_var = data['depth_var']
                depth_var = get_edge_aware_distortion_map(gt_image, depth_var)
                self.losses['depth_var'] = depth_var.mean()
            if 'semantic' in self.weights:
                sem_logits = data['render_sem']
                sem_trg = data['viewpoint_cam'].mask.view(-1)
                # Dividing by log(num_cls) maps the loss of a uniform prediction to 1.
                self.losses['semantic'] = F.cross_entropy(sem_logits.view(-1, self.model.num_cls), sem_trg) / math.log(self.model.num_cls)
    def _get_total_loss(self):
        r"""Return the total loss to be backpropagated.
        """
        total_loss = torch.tensor(0., device=torch.device('cuda'))
        # Iterates over all possible losses.
        for loss_name in self.weights:
            if loss_name in self.losses:
                # Multiply it with the corresponding weight and add it to the total loss.
                total_loss += self.losses[loss_name] * self.weights[loss_name]
        self.losses['total'] = total_loss  # logging purpose
        return total_loss
    def train_step(self, mode='train'):
        data = dict()
        # Pick a random Camera
        if not self.viewpoint_stack:
            self.viewpoint_stack = self.scene.getTrainCameras().copy()
        data['viewpoint_cam'] = self.viewpoint_stack.pop(randint(0, len(self.viewpoint_stack) - 1))
        # Render
        if (self.current_iteration - 1) == self.debug_from:
            self.cfg.pipline.debug = True
        data['bg'] = torch.rand((3), device="cuda") if self.cfg.optim.random_background else self.background
        loss = self.model_forward(data, mode)
        loss.backward()
        viewspace_point_tensor, visibility_filter, radii = data.pop("viewspace_points"), data.pop("visibility_filter"), data.pop("radii")
        with torch.no_grad():
            # Densification
            if self.current_iteration < self.cfg.optim.densify_until_iter:
                # Keep track of max radii in image-space for pruning
                self.model.max_radii2D[visibility_filter] = torch.max(self.model.max_radii2D[visibility_filter], radii[visibility_filter])
                viewspace_point_tensor_densify = data["viewspace_points_densify"]
                self.model.add_densification_stats(viewspace_point_tensor_densify, visibility_filter)
                # self.model.add_densification_stats(viewspace_point_tensor, visibility_filter)
                if self.current_iteration > self.cfg.optim.densify_from_iter \
                        and hasattr(self.cfg.optim, 'densify_large'):
                    if 'countlist' in data:
                        visi_list_each = data['countlist']
                        self.visi_list = visi_list_each if self.visi_list is None else self.visi_list + visi_list_each
                if self.current_iteration > self.cfg.optim.densify_from_iter and self.current_iteration % self.cfg.optim.densification_interval == 0:
                    size_threshold = 20 if self.current_iteration > self.cfg.optim.opacity_reset_interval else None
                    visi = None
                    if getattr(self.cfg.optim, 'densify_large', False) and self.cfg.optim.densify_large.sample_cams.num > 0 \
                            and getattr(self.cfg.optim.densify_large, 'percent_dense', 0):
                        visi = self.get_visi_mask_acc(self.cfg.optim.densify_large.sample_cams.num,
                                                      self.cfg.optim.densify_large.sample_cams.up,
                                                      self.cfg.optim.densify_large.sample_cams.around,
                                                      sample_mode='random')
                        if self.visi_list is not None:
                            # `&` binds tighter than `>`, so the comparison must be parenthesised.
                            visi = visi & (self.visi_list > 0)
                    self.model.densify_and_prune(self.cfg.optim.densify_grad_threshold, 0.005, self.scene.cameras_extent, size_threshold, visi)
                    self.visi_list = None
                if self.current_iteration % self.cfg.optim.opacity_reset_interval == 0 or \
                        (self.cfg.model.white_background and self.current_iteration == self.cfg.optim.densify_from_iter):
                    self.model.reset_opacity()
            if self.current_iteration in self.cfg.optim.prune.iterations:
                # TODO: add more pruning types
                viewpoint_stack = self.scene.getFullCameras().copy()
                gaussian_list, imp_list = prune_list(self.model, viewpoint_stack, self.cfg.pipline, self.background)
                i = self.cfg.optim.prune.iterations.index(self.current_iteration)
                v_list = calculate_v_imp_score(self.model, imp_list, self.cfg.optim.prune.v_pow)
                # The pruned fraction decays geometrically with each pruning round.
                self.model.prune_gaussians(
                    (self.cfg.optim.prune.decay**i) * self.cfg.optim.prune.percent, v_list
                )
            # Optimizer step
            self.model.optimizer.step()
            self.model.optimizer.zero_grad(set_to_none=True)
        return data
    def start_of_iteration(self):
        self.iter_start.record()
        # Learning-rate step: the global iteration when training, or the
        # number of iterations since fine-tuning started.
        lr_iter = self.current_iteration if self.star_ft_iter is None \
            else self.current_iteration - self.star_ft_iter
        self.model.update_learning_rate(lr_iter)
        # Every 1000 iterations, raise the SH degree by one, up to the maximum degree
        if self.current_iteration % 1000 == 0:
            self.model.oneupSHdegree()
    def end_of_iteration(self, output, render, progress_bar):
        self.iter_end.record()
        with torch.no_grad():
            # Progress bar
            self.ema_loss_for_log = 0.4 * self.losses['total'].item() + 0.6 * self.ema_loss_for_log
            if self.current_iteration % 10 == 0:
                progress_bar.set_postfix({"Loss": f"{self.ema_loss_for_log:.7f}"})
                progress_bar.update(10)
            if self.current_iteration == self.max_iters:
                progress_bar.close()
            # Log and save
            if self.writer:
                self.log_writer(output, mode="train")
            else:
                output.update(self.test(render))
                self.log_wandb_scalars(output, mode="train")
            if (self.current_iteration in self.saving_iterations) or (self.current_iteration == self.max_iters):
                self.save_gaussians()
            if (self.current_iteration in self.checkpoint_iterations) or (self.current_iteration == self.max_iters):
                print("\n[ITER {}] Saving Checkpoint".format(self.current_iteration))
                torch.save((self.model.capture(), self.current_iteration),
                           os.path.join(self.scene.model_path, "chkpnt" + str(self.current_iteration) + ".pth"))
            if len(self.cfg.optim.prune.iterations) > 0 and self.current_iteration == self.max_iters:
                viewpoint_stack = self.scene.getFullCameras().copy()
                gaussian_list, imp_list = prune_list(self.model, viewpoint_stack, self.cfg.pipline, self.background)
                v_list = calculate_v_imp_score(self.model, imp_list, self.cfg.optim.prune.v_pow)
                np.savez(os.path.join(self.scene.model_path, "imp_score"), v_list.cpu().detach().numpy())
    def log_wandb_scalars(self, output, mode=None):
        scalars = dict()
        if mode == "train":
            for param_group in self.model.optimizer.param_groups:
                scalars.update({"optim/lr_{}".format(param_group["name"]): param_group['lr']})
            scalars.update({"time/iteration": self.iter_start.elapsed_time(self.iter_end)})
        scalars.update({f"loss/{mode}_{key}": value for key, value in self.losses.items()})
        scalars.update(iteration=self.current_iteration)
        scalars.update({k: v for k, v in output.items() if isinstance(v, (int, float))})
        wandb.log(scalars, step=self.current_iteration)
    def log_wandb_images(self, data, mode=None):
        image = torch.cat([data["rgb_map"], data["image"]], dim=1)
        depth = data["depth_map"]
        inv_depth = depth.max() - depth
        images = {f'vis/{mode}': wandb_image(image),
                  f'vis/{mode}_depth': wandb_image(depth, from_range=(depth.min(), depth.max())),
                  f'vis/{mode}_inv_depth': wandb_image(inv_depth, from_range=(inv_depth.min(), inv_depth.max()))}
        if 'depth_var' in data:
            depth_var = data['depth_var']
            images.update({f'vis/{mode}_depth_var': wandb_image(depth_var, from_range=(depth_var.min(), depth_var.max()))})
        if 'depth' in data:
            depth = data["depth"].detach().clone()
            images.update({f'vis/{mode}_depth_gt': wandb_image(depth, from_range=(depth.min(), depth.max()))})
        if 'mask' in data:
            mask = data['mask'].detach().clone().float()
            images.update({f'vis/{mode}_mask': wandb_image(mask)})
        if 'normal_map' in data:
            normal_map = data["normal_map"]
            images.update({f'vis/{mode}_normal': wandb_image(normal_map.permute(2, 0, 1), from_range=(-1, 1))})
        if 'normal' in data:
            normal = data["normal"].detach().clone()
            images.update({f'vis/{mode}_normal_gt': wandb_image(normal.permute(2, 0, 1), from_range=(-1, 1))})
            cos = cos_weight(normal.cuda(), normal_map, self.cfg.optim.exp_t)
            images.update({f'vis/{mode}_normal_cos': wandb_image(cos, from_range=(0, 1))})
        if 'est_normal' in data:
            est_normal = data["est_normal"].permute(2, 0, 1).detach().clone()
            images.update({f'vis/{mode}_est_normal': wandb_image(est_normal, from_range=(-1, 1))})
        if 'transformed_est_normal' in data:
            transformed_est_normal = data["transformed_est_normal"].permute(2, 0, 1).detach().clone()
            images.update({f'vis/{mode}_trans_est_normal': wandb_image(transformed_est_normal, from_range=(-1, 1))})
        if 'sem' in data:
            sem = data['sem']
            images.update({f'vis/{mode}_sem': wandb_sem(sem)})
        if 'distortion' in data:
            distortion = data['distortion']
            images.update({f'vis/{mode}_distortion': wandb_image(distortion, from_range=(distortion.min(), distortion.max()))})
        if 'trans_image' in data:
            trans_image = data['trans_image']
            images.update({f'vis/{mode}_trans': wandb_image(trans_image)})
        wandb.log(images, step=self.current_iteration)
    def log_hist(self, tensor, name, num_bin=10):
        counts, bins = np.histogram(tensor, bins=num_bin)
        density = counts / counts.sum()
        plt.stairs(density, bins)
        plt.title('Histogram {}'.format(name))
        wandb.log({f'statistic/{name}': wandb.Image(plt)}, step=self.current_iteration)
        plt.close()
@torch.no_grad()
def test(self, renderFunc):
output = dict()
# Report test and samples of training set
if (self.current_iteration in self.testing_iterations) or (self.current_iteration == self.max_iters):
torch.cuda.empty_cache()
validation_configs = ({'name': 'test', 'cameras' : self.scene.getTestCameras()},
{'name': 'train', 'cameras' : self.scene.getTrainCameras()})
for config in validation_configs:
if config['cameras'] and len(config['cameras']) > 0:
l1_test = 0.0
psnr_test = 0.0
for idx, viewpoint in enumerate(config['cameras']):
out = renderFunc(viewpoint, self.model, self.cfg, self.background, dirs=self.scene.dirs)
image = torch.clamp(out["render"], 0.0, 1.0)
gt_image = torch.clamp(viewpoint.original_image.to("cuda"), 0.0, 1.0)
if config['name'] == 'train' and self.cfg.model.use_decoupled_appearance:
trans_image = L1_loss_appearance(image, gt_image, self.model, viewpoint.idx, return_transformed_image=True)
depth = out["depth"]
normal = out["normal"] if "normal" in out else None
est_normal = out["est_normal"] if "est_normal" in out else None
if 'render_sem' in out:
pred = self.model.logits_2_label(out['render_sem'])
sem_mask = viewpoint.mask.cuda()
self.calc_miou.update(pred, sem_mask)
if viewpoint.image_name == self.scene.first_name:  # log images for a single reference view only
data = {"image": gt_image, "rgb_map": image, "depth_map": depth}
if config['name'] == 'train' and self.cfg.model.use_decoupled_appearance:
data['trans_image'] = trans_image
if 'mask' in out: data['mask'] = out['mask']
if viewpoint.depth is not None: data['depth'] = viewpoint.depth
if 'depth_var' in out: data['depth_var'] = out['depth_var']
if 'distortion' in out: data['distortion'] = out['distortion']
if normal is not None:
data["normal_map"] = normal
if viewpoint.normal is not None: data['normal'] = viewpoint.normal
if est_normal is not None:
data['est_normal'] = est_normal
if 'render_sem' in out:
pred = self.model.logits_2_label(out['render_sem']).to(torch.uint8)
data['sem'] = torch.cat([pred, sem_mask], dim=0)
self.log_wandb_images(data, mode=config['name'])
if False:  # disabled debug path: dump per-view visualizations to disk via save_vis
data = {"image": gt_image, "rgb_map": image, "depth_map": depth}
if 'mask' in out: data['mask'] = out['mask']
if viewpoint.depth is not None: data['depth'] = viewpoint.depth
if 'depth_var' in out: data['depth_var'] = out['depth_var']
if normal is not None:
data["normal_map"] = normal
if viewpoint.normal is not None: data['normal'] = viewpoint.normal
if est_normal is not None: data['est_normal'] = est_normal
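cos = cos_weight(viewpoint.normal.cuda(), normal, self.cfg.optim.exp_t)  # compare the reference normal with the rendered one (comparing `normal` with itself is always 1)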
data['normal_cos'] = cos
self.save_vis(data, viewpoint.image_name, mode=config['name'])
l1_test += l1_loss(image, gt_image).mean().double()
psnr_test += psnr(image, gt_image).mean().double()
psnr_test /= len(config['cameras'])
l1_test /= len(config['cameras'])
if self.enable_semantic:
miou = self.calc_miou.compute()
self.calc_miou.reset()
output.update({
f'statistic/{config["name"]}_PSNR': psnr_test.item(),
f'loss/{config["name"]}_l1': l1_test.item(),
})
if self.enable_semantic:
output[f'statistic/{config["name"]}_mIoU'] = miou.item()
output.update({
'statistic/total_points': self.scene.gaussians.get_xyz.shape[0],
})
self.log_hist(self.model.get_opacity.cpu().numpy(), "opacity")
torch.cuda.empty_cache()
return output
def finalize(self):
# Finish the W&B logger.
wandb.finish()
def log_writer(self, mode=None):
if self.writer:
for key, value in self.losses.items():
self.writer.add_scalar(f"loss/{mode}_{key}", value, global_step=self.current_iteration)
def save_vis(self, data, name, mode='train'):
image = torch.clamp(data["rgb_map"], 0.0, 1.0).detach().cpu()
image = (image.permute(1, 2, 0).numpy() * 255).astype('uint8')
imageio.imwrite(os.path.join(self.vis_color_path, mode, f"{name}.png"), image)
normal = preprocess_image(data["normal_map"].permute(2, 0, 1), from_range=(-1, 1))
normal.save(os.path.join(self.vis_normal_path, mode, f"{name}.png"))
if False:  # disabled: additionally save reference normals, estimated normals and normal-cosine maps
normal_gt = preprocess_image(data["normal"].permute(2, 0, 1), from_range=(-1, 1))
gt_normal_path = os.path.join(self.vis_normal_path+'_gt', mode)
if not os.path.exists(gt_normal_path):
os.makedirs(gt_normal_path, exist_ok=True)
normal_gt.save(os.path.join(gt_normal_path, f"{name}.png"))
dnormal = preprocess_image(data["est_normal"].permute(2, 0, 1), from_range=(-1, 1))
dnormal.save(os.path.join(self.vis_dnormal_path, mode, f"{name}.png"))
cos = preprocess_image(data["normal_cos"], from_range=(0, 1))
cos.save(os.path.join(self.vis_cos_path, mode, f"{name}.png"))
return
def sample_cameras(self, n, up=False, around=True, look_mode='target', sample_mode='grid', bidirect=True): # look_mode is 'direction' or 'target'
cam_height = None
w2cs = bb_camera(n, self.model.trans, self.model.scale, cam_height, up=up, around=around,
look_mode=look_mode, sample_mode=sample_mode, bidirect=bidirect)
FoVx = FoVy = 2.5  # field of view, presumably in radians (about 143 degrees)
width = height = 1500  # image resolution in pixels
cams = []
for i in range(w2cs.shape[0]):
w2c = w2cs[i]
cam = SampleCam(w2c, width, height, FoVx, FoVy)
cams.append(cam)
return cams
@torch.no_grad()
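def get_visi_mask(self, n=500, up=False, around=True, denoise_after=False,
denoise_before=True, nb_points=10, viewpoint_stack=None, sample_mode='grid', cat_cams=False):
"""Boolean mask over all Gaussians: True where a point survives the optional filter_points() outlier filtering and is marked visible by get_visi_list for the cameras in viewpoint_stack."""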
if viewpoint_stack is None:
if self.cfg.optim.densify_large.sample_cams.random:
viewpoint_stack = self.sample_cameras(n, up, around, sample_mode=sample_mode)
if cat_cams:
viewpoint_stack += self.scene.getTrainCameras().copy()
else:
viewpoint_stack = self.scene.getTrainCameras().copy()
model = deepcopy(self.model)  # work on a copy: the pruning and rescaling below must not modify the trained model
if denoise_before:
mask = torch.ones(model.get_xyz.shape[0], dtype=torch.bool, device="cuda")
valid = model.filter_points()
mask[valid] = False
model.prune_points(mask)
else:
mask = torch.zeros(model.get_xyz.shape[0], dtype=torch.bool, device="cuda")
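# Re-initialise scales from local point density: mean distance to the nb_points nearest
# neighbours (K = nb_points + 1 because each point is its own nearest neighbour).
xyz = model.get_xyz[None]
dist2 = knn_points(xyz, xyz, K=nb_points+1, return_sorted=True).dists # 1, N, K squared distances
dist2 = dist2[0, :, 1:]
dist2 = torch.clamp_min(dist2, 0.0000001)
dist = (torch.sqrt(dist2)).mean(-1)
scales = torch.log(dist)[...,None].repeat(1, 3)
# Collapse each Gaussian's currently thinnest axis to ~0 (log(1e-7)), turning it into a flat disk.
idx = torch.argmin(model.get_scaling, dim=-1)
scales[torch.arange(scales.shape[0]), idx] = math.log(1e-7)
model._scaling = nn.Parameter(scales.requires_grad_(True))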
out = get_visi_list(model, viewpoint_stack, self.cfg.pipline, self.background)
visi = out['visi']
valid = ~mask
if denoise_after:
model.prune_points(~visi)
filted = model.filter_points()
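visi[visi.clone()] = filted  # visible points that fail the post-filter are marked invalid
valid[~mask] = visi  # scatter the visibility of the surviving points back to the full Gaussian index set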
del model
return valid
@torch.no_grad()
def get_visi_mask_acc(self, n=500, up=False, around=True, sample_mode='grid', viewpoint_stack=None):
if viewpoint_stack is None:
if self.cfg.optim.densify_large.sample_cams.random:
viewpoint_stack = self.sample_cameras(n, up, around, sample_mode=sample_mode)
else:
fullcam = self.scene.getTrainCameras().copy()
idx = torch.randint(0, len(fullcam), (n,))  # n random training views, sampled with replacement
viewpoint_stack = [fullcam[i] for i in idx.tolist()]
out = get_visi_list(self.model, viewpoint_stack, self.cfg.pipline, self.background)
visi = out['visi']
inside = self.model.get_inside_gaus_normalized()[0]
valid = visi & inside
return valid
@torch.no_grad()
def save_gaussians(self):
print("\n[ITER {}] Saving Gaussians".format(self.current_iteration))
surfmask = None
visi = None
self.scene.save(self.current_iteration, visi=visi, surf=surfmask, save_splat=self.cfg.train.save_splat)
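# Illustrative sketch (hypothetical helper, not called by the Trainer): the scale
# re-initialisation performed in get_visi_mask, written in NumPy with a brute-force
# O(N^2) nearest-neighbour search instead of pytorch3d's knn_points.
# Assumptions:
#   - the point cloud is small enough for an N x N distance matrix;
#   - `thin_axis` holds, for each point, the axis to collapse (get_visi_mask takes the
#     argmin of the current scaling for this).
import math

import numpy as np


def flat_disk_log_scales(xyz, thin_axis, k=10):
    """Return (N, 3) log-scales: mean distance to k neighbours, with one axis set to log(1e-7)."""
    diff = xyz[:, None, :] - xyz[None, :, :]            # (N, N, 3) pairwise offsets
    dist2 = (diff ** 2).sum(-1)                         # (N, N) squared distances
    dist2 = np.sort(dist2, axis=-1)[:, 1:k + 1]         # drop self (column 0), keep k nearest
    dist = np.sqrt(np.maximum(dist2, 1e-7)).mean(-1)    # clamp like the original, then average
    scales = np.repeat(np.log(dist)[:, None], 3, axis=1)
    scales[np.arange(len(scales)), thin_axis] = math.log(1e-7)  # flatten into a disk
    return scales

# Example: four points on a unit square in the z = 0 plane each have two neighbours at
# distance 1, so the in-plane log-scales are 0 and the collapsed z axis is log(1e-7).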
if __name__ == "__main__":
import sys
sys.path.append(os.getcwd())  # must run before importing project modules such as configs.config
from configs.config import Config
cfg_path = 'projects/gaussain_splatting/configs/base.yaml'
cfg = Config(cfg_path)
trainer = Trainer(cfg)
trainer.get_center_scale()
for thr in np.linspace(0.9, 1., 11):
trainer.save_pts_thr(thr)
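# Illustrative sketch (hypothetical helper, not part of the training pipeline): the metric
# aggregation done in Trainer.test(). It clamps both images to [0, 1], computes PSNR per
# view, then averages over views, mirroring `psnr_test /= len(config['cameras'])`.
# Assumes the common definition PSNR = 10 * log10(MAX^2 / MSE) with MAX = 1. The
# repository's own `psnr` helper may reduce over channels differently.
import numpy as np


def mean_psnr(pairs):
    """Average PSNR (dB) over an iterable of (prediction, ground_truth) arrays."""
    values = []
    for pred, gt in pairs:
        pred = np.clip(pred, 0.0, 1.0)
        gt = np.clip(gt, 0.0, 1.0)
        mse = np.mean((pred - gt) ** 2)
        values.append(10.0 * np.log10(1.0 / mse))  # infinite if the two images are identical
    return float(np.mean(values))

# Example: against a black target, a constant 0.1 image scores 20 dB and a constant
# 0.01 image scores 40 dB, so the mean over the two views is 30 dB.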