Repository: hyzhou404/HUGS
Branch: main
Commit: dbb17df8c2b9
Files: 41
Total size: 139.6 KB
Directory structure:
HUGS/
├── .gitignore
├── .gitmodules
├── LICENSE.md
├── README.md
├── arguments/
│   └── __init__.py
├── environment.yml
├── gaussian_renderer/
│   └── __init__.py
├── lpipsPyTorch/
│   ├── __init__.py
│   └── modules/
│       ├── lpips.py
│       ├── networks.py
│       └── utils.py
├── metrics.py
├── render.py
├── requirements.txt
├── scene/
│   ├── __init__.py
│   ├── cameras.py
│   ├── dataset_readers.py
│   └── gaussian_model.py
├── submodules/
│   └── simple-knn/
│       ├── ext.cpp
│       ├── setup.py
│       ├── simple_knn/
│       │   └── .gitkeep
│       ├── simple_knn.cu
│       ├── simple_knn.h
│       ├── spatial.cu
│       └── spatial.h
└── utils/
    ├── camera_utils.py
    ├── cmap.py
    ├── dynamic_utils.py
    ├── general_utils.py
    ├── graphics_utils.py
    ├── image_utils.py
    ├── iou_utils.py
    ├── loss_utils.py
    ├── nvseg_utils.py
    ├── semantic_utils.py
    ├── sh_utils.py
    ├── system_utils.py
    └── vehicle_template/
        ├── benz_kitti.ply
        ├── benz_kitti360.ply
        ├── benz_pandaset.ply
        └── benz_waymo.ply
================================================
FILE CONTENTS
================================================
================================================
FILE: .gitignore
================================================
*.pyc
.vscode
output
build
diff_rasterization/diff_rast.egg-info
diff_rasterization/dist
tensorboard_3d
screenshots
*.egg-info
external
shell
.DS_Store
================================================
FILE: .gitmodules
================================================
[submodule "submodules/simple-knn"]
path = submodules/simple-knn
url = https://gitlab.inria.fr/bkerbl/simple-knn.git
[submodule "submodules/hugs-rasterization"]
path = submodules/hugs-rasterization
url = https://github.com/hyzhou404/hugs-rasterization
================================================
FILE: LICENSE.md
================================================
HUGS License
===========================
**Zhejiang University** holds all the ownership rights on the *Software* named **HUGS**.
The *Software* is still being developed by the *Licensor*.
*Licensor*'s goal is to allow the research community to use, test and evaluate
the *Software*.
## 1. Definitions
*Licensee* means any person or entity that uses the *Software* and distributes
its *Work*.
*Licensor* means the owners of the *Software*, i.e., Zhejiang University.
*Software* means the original work of authorship made available under this
License, i.e., HUGS.
*Work* means the *Software* and any additions to or derivative works of the
*Software* that are made available under this License.
## 2. Purpose
This license is intended to define the rights granted to the *Licensee* by
Licensors under the *Software*.
## 3. Rights granted
For the above reasons Licensors have decided to distribute the *Software*.
Licensors grant non-exclusive rights to use the *Software* for research purposes
to research users (both academic and industrial), free of charge, without right
to sublicense. The *Software* may be used "non-commercially", i.e., for research
and/or evaluation purposes only.
Subject to the terms and conditions of this License, you are granted a
non-exclusive, royalty-free, license to reproduce, prepare derivative works of,
publicly display, publicly perform and distribute its *Work* and any resulting
derivative works in any form.
## 4. Limitations
**4.1 Redistribution.** You may reproduce or distribute the *Work* only if (a) you do
so under this License, (b) you include a complete copy of this License with
your distribution, and (c) you retain without modification any copyright,
patent, trademark, or attribution notices that are present in the *Work*.
**4.2 Derivative Works.** You may specify that additional or different terms apply
to the use, reproduction, and distribution of your derivative works of the *Work*
("Your Terms") only if (a) Your Terms provide that the use limitation in
Section 2 applies to your derivative works, and (b) you identify the specific
derivative works that are subject to Your Terms. Notwithstanding Your Terms,
this License (including the redistribution requirements in Section 4.1) will
continue to apply to the *Work* itself.
**4.3** Any other use without prior consent of Licensors is prohibited. Research
users explicitly acknowledge having received from Licensors all information
needed to assess the suitability of the *Software* for their needs and
to undertake all necessary precautions for its execution and use.
**4.4** The *Software* is provided both as a compiled library file and as source
code. When the *Software* is used for a publication or for other results obtained
through its use, users are strongly encouraged to cite the corresponding
publications, as explained in the documentation of the *Software*.
## 5. Disclaimer
THE USER CANNOT USE, EXPLOIT OR DISTRIBUTE THE *SOFTWARE* FOR COMMERCIAL PURPOSES
WITHOUT PRIOR AND EXPLICIT CONSENT OF LICENSORS. YOU MUST CONTACT Zhejiang University FOR ANY
UNAUTHORIZED USE: yiyi.liao@zju.edu.cn. ANY SUCH ACTION WILL
CONSTITUTE A FORGERY. THIS *SOFTWARE* IS PROVIDED "AS IS" WITHOUT ANY WARRANTIES
OF ANY NATURE AND ANY EXPRESS OR IMPLIED WARRANTIES, WITH REGARDS TO COMMERCIAL
USE, PROFESSIONAL USE, LEGAL OR NOT, OR OTHER, OR COMMERCIALISATION OR
ADAPTATION. UNLESS EXPLICITLY PROVIDED BY LAW, IN NO EVENT, SHALL Zhejiang University OR THE
AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE
GOODS OR SERVICES, LOSS OF USE, DATA, OR PROFITS OR BUSINESS INTERRUPTION)
HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING FROM, OUT OF OR
IN CONNECTION WITH THE *SOFTWARE* OR THE USE OR OTHER DEALINGS IN THE *SOFTWARE*.
================================================
FILE: README.md
================================================
# HUGS: Holistic Urban 3D Scene Understanding via Gaussian Splatting
[Hongyu Zhou](https://github.com/hyzhou404), [Jiahao Shao](https://jhaoshao.github.io/), Lu Xu, Dongfeng Bai, [Weichao Qiu](https://weichaoqiu.com/), Bingbing Liu, [Yue Wang](https://ywang-zju.github.io/), [Andreas Geiger](https://www.cvlibs.net/), [Yiyi Liao](https://yiyiliao.github.io/)<br>
| [Webpage](https://xdimlab.github.io/hugs_website/) | [Full Paper](https://openaccess.thecvf.com/content/CVPR2024/html/Zhou_HUGS_Holistic_Urban_3D_Scene_Understanding_via_Gaussian_Splatting_CVPR_2024_paper.html) | [Video](https://www.youtube.com/watch?v=DmPhL-8FeT4)
This repository contains the official authors implementation associated with the paper "HUGS: Holistic Urban 3D Scene Understanding via Gaussian Splatting", which can be found [here](https://xdimlab.github.io/hugs_website/).

Abstract: *Holistic understanding of urban scenes based on RGB images is a challenging yet important problem. It encompasses understanding both the geometry and appearance to enable novel view synthesis, parsing semantic labels, and tracking moving objects. Despite considerable progress, existing approaches often focus on specific aspects of this task and require additional inputs such as LiDAR scans or manually annotated 3D bounding boxes. In this paper, we introduce a novel pipeline that utilizes 3D Gaussian Splatting for holistic urban scene understanding. Our main idea involves the joint optimization of geometry, appearance, semantics, and motion using a combination of static and dynamic 3D Gaussians, where moving object poses are regularized via physical constraints. Our approach offers the ability to render new viewpoints in real-time, yielding 2D and 3D semantic information with high accuracy, and reconstruct dynamic scenes, even in scenarios where 3D bounding box detections are highly noisy. Experimental results on KITTI, KITTI-360, and Virtual KITTI 2 demonstrate the effectiveness of our approach.*
## Cloning the Repository
The repository contains submodules, so please check it out with
```shell
# SSH
git clone git@github.com:hyzhou404/hugs.git --recursive
```
or
```shell
# HTTPS
git clone https://github.com/hyzhou404/hugs --recursive
```
## Prepare Environments
Create conda environment:
```shell
conda create -n hugs python=3.10 -y
```
Please install [PyTorch](https://pytorch.org/), [tiny-cuda-nn](https://github.com/NVlabs/tiny-cuda-nn), [pytorch3d](https://github.com/facebookresearch/pytorch3d/tree/main) and [flow-vis-torch](https://github.com/ChristophReich1996/Optical-Flow-Visualization-PyTorch) by following official instructions.
Install submodules by running:
```shell
pip install submodules/simple-knn
pip install submodules/hugs-rasterization
```
Install remaining packages by running:
```shell
pip install -r requirements.txt
```
## Data & Checkpoints Download
We have made available two sequences from KITTI, as indicated in our paper. In addition, three sequences from KITTI-360 and one sequence from Waymo are also provided.
Download checkpoints from [here](https://huggingface.co/datasets/hyzhou404/hugs_release).
```shell
unzip ${sequence}.zip
```
## Rendering
Render test views by running:
```shell
python render.py -m ${checkpoint_path} --data_type ${dataset_type} --iteration 30000 --affine
```
The **dataset_type** argument is a string whose value must be one of **kitti**, **kitti360**, or **waymo**.
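For example, to render the test views of a KITTI-360 sequence (the checkpoint directory name below is illustrative):
```shell
python render.py -m ./hugs_release/kitti360_seq0003 --data_type kitti360 --iteration 30000 --affine
```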
## Evaluation
Compute SSIM, PSNR, and LPIPS over the rendered images by running:
```shell
python metrics.py -m ${checkpoint_path}
```
## Training
This repository only includes the inference code of HUGS. The training code will be released in the future.
## BibTeX
```bibtex
@InProceedings{Zhou_2024_CVPR,
    author    = {Zhou, Hongyu and Shao, Jiahao and Xu, Lu and Bai, Dongfeng and Qiu, Weichao and Liu, Bingbing and Wang, Yue and Geiger, Andreas and Liao, Yiyi},
    title     = {HUGS: Holistic Urban 3D Scene Understanding via Gaussian Splatting},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {21336-21345}
}
```
================================================
FILE: arguments/__init__.py
================================================
#
# Copyright (C) 2023, Inria
# GRAPHDECO research group, https://team.inria.fr/graphdeco
# All rights reserved.
#
# This software is free for non-commercial, research and evaluation use
# under the terms of the LICENSE.md file.
#
# For inquiries contact george.drettakis@inria.fr
#
from argparse import ArgumentParser, Namespace
import sys
import os
class GroupParams:
pass
class ParamGroup:
def __init__(self, parser: ArgumentParser, name : str, fill_none = False):
group = parser.add_argument_group(name)
for key, value in vars(self).items():
shorthand = False
if key.startswith("_"):
shorthand = True
key = key[1:]
t = type(value)
value = value if not fill_none else None
if shorthand:
if t == bool:
group.add_argument("--" + key, ("-" + key[0:1]), default=value, action="store_true")
else:
group.add_argument("--" + key, ("-" + key[0:1]), default=value, type=t)
else:
if t == bool:
group.add_argument("--" + key, default=value, action="store_true")
else:
group.add_argument("--" + key, default=value, type=t)
def extract(self, args):
group = GroupParams()
for arg in vars(args).items():
if arg[0] in vars(self) or ("_" + arg[0]) in vars(self):
setattr(group, arg[0], arg[1])
return group
class ModelParams(ParamGroup):
def __init__(self, parser, sentinel=False):
self.sh_degree = 3
self._source_path = ""
self._model_path = ""
self._images = "images"
self._resolution = -1
self._white_background = False
self.data_device = "cpu"
self.eval = False
super().__init__(parser, "Loading Parameters", sentinel)
def extract(self, args):
g = super().extract(args)
g.source_path = os.path.abspath(g.source_path)
return g
class PipelineParams(ParamGroup):
def __init__(self, parser):
self.convert_SHs_python = False
self.compute_cov3D_python = False
self.debug = False
super().__init__(parser, "Pipeline Parameters")
class OptimizationParams(ParamGroup):
def __init__(self, parser):
self.iterations = 30_000
self.position_lr_init = 0.00016
self.position_lr_final = 0.0000016
self.position_lr_delay_mult = 0.01
self.position_lr_max_steps = 30_000
self.feature_lr = 0.0025
self.opacity_lr = 0.05
self.scaling_lr = 0.001
self.rotation_lr = 0.001
self.percent_dense = 0.001
self.lambda_dssim = 0.2
self.densification_interval = 100
self.opacity_reset_interval = 3000
self.densify_from_iter = 500
self.densify_until_iter = 15_000
self.densify_grad_threshold = 0.0002
super().__init__(parser, "Optimization Parameters")
def get_combined_args(parser : ArgumentParser):
cmdlne_string = sys.argv[1:]
cfgfile_string = "Namespace()"
args_cmdline = parser.parse_args(cmdlne_string)
try:
cfgfilepath = os.path.join(args_cmdline.model_path, "cfg_args")
print("Looking for config file in", cfgfilepath)
with open(cfgfilepath) as cfg_file:
print("Config file found: {}".format(cfgfilepath))
cfgfile_string = cfg_file.read()
except TypeError:
print("Config file not found at")
pass
args_cfgfile = eval(cfgfile_string)
merged_dict = vars(args_cfgfile).copy()
for k,v in vars(args_cmdline).items():
if v != None:
merged_dict[k] = v
return Namespace(**merged_dict)
================================================
FILE: environment.yml
================================================
name: gaussian_splatting
channels:
- pytorch
- conda-forge
- defaults
dependencies:
- cudatoolkit=11.6
- plyfile=0.8.1
- python=3.7.13
- pip=22.3.1
- pytorch=1.12.1
- torchaudio=0.12.1
- torchvision=0.13.1
- tqdm
- pip:
- submodules/diff-gaussian-rasterization
- submodules/simple-knn
================================================
FILE: gaussian_renderer/__init__.py
================================================
#
# Copyright (C) 2023, Inria
# GRAPHDECO research group, https://team.inria.fr/graphdeco
# All rights reserved.
#
# This software is free for non-commercial, research and evaluation use
# under the terms of the LICENSE.md file.
#
# For inquiries contact george.drettakis@inria.fr
#
import torch
import math
from diff_gaussian_rasterization import GaussianRasterizationSettings, GaussianRasterizer
from scene.gaussian_model import GaussianModel
from utils.sh_utils import eval_sh, RGB2SH
from pytorch3d.transforms import quaternion_to_matrix, matrix_to_quaternion
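# Rotation about the y (up) axis by -yaw, i.e. R_y(-yaw). Used to turn a
# unicycle heading angle into a 3D rotation for the vehicle box.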
def euler2matrix(yaw):
cos = torch.cos(-yaw)
sin = torch.sin(-yaw)
rot = torch.eye(3).float().cuda()
rot[0,0] = cos
rot[0,2] = sin
rot[2,0] = -sin
rot[2,2] = cos
return rot
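# Concatenate background (static scene) and foreground (dynamic vehicle)
# Gaussian attributes point-wise. A None foreground entry falls back to the
# background attribute alone; only_dynamic=True keeps the foreground only;
# only_xyz=True restricts the operation to positions.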
def cat_bgfg(bg, fg, only_dynamic=False, only_xyz=False):
if only_xyz:
bg_feats = [bg.get_xyz]
else:
bg_feats = [bg.get_xyz, bg.get_opacity, bg.get_scaling, bg.get_rotation, bg.get_features, bg.get_3D_features]
output = []
for fg_feat, bg_feat in zip(fg, bg_feats):
if fg_feat is None:
output.append(bg_feat)
elif only_dynamic:
output.append(fg_feat)
else:
output.append(torch.cat((bg_feat, fg_feat), dim=0))
return output
def cat_all_fg(all_fg, next_fg):
output = []
for feat, next_feat in zip(all_fg, next_fg):
if feat is None:
feat = next_feat
else:
feat = torch.cat((feat, next_feat), dim=0)
output.append(feat)
return output
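# Project world-space points to pixel coordinates: apply the world-to-camera
# transform, then the intrinsics K, then dehomogenize (depth clipped at 1e-3
# to avoid division by zero).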
def proj_uv(xyz, cam):
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
intr = torch.as_tensor(cam.K[:3, :3]).float().to(device) # (3, 3)
w2c = torch.tensor(cam.w2c).float().to(device)[:3, :] # (3, 4)
c_xyz = (w2c[:3, :3] @ xyz.T).T + w2c[:3, 3]
i_xyz = (intr @ c_xyz.mT).mT # (N, 3)
uv = i_xyz[:, :2] / i_xyz[:, -1:].clip(1e-3) # (N, 2)
return uv
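# Evaluate the unicycle motion model at `timestamp` and assemble the 4x4
# box-to-world transform: rotation from the predicted heading pred_phi,
# translation (pred_a, pred_h, pred_b) along x, y (height), and z. Returns
# None when the model has no prediction for this timestamp.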
def unicycle_b2w(timestamp, model):
# model = unicycle_models[track_id]['model']
pred = model(timestamp)
if pred is None:
return None
pred_a, pred_b, pred_v, pred_phi, pred_h = pred
# r = euler_angles_to_matrix(torch.tensor([0, pred_phi-torch.pi, 0]), 'XYZ')
rt = torch.eye(4).float().cuda()
rt[:3,:3] = euler2matrix(pred_phi)
rt[1, 3], rt[0, 3], rt[2, 3] = pred_h, pred_a, pred_b
return rt
def render(viewpoint_camera, prev_viewpoint_camera, pc : GaussianModel, dynamic_gaussians : dict,
unicycles : dict, pipe, bg_color : torch.Tensor,
render_optical=False, scaling_modifier = 1.0, only_dynamic=False):
"""
Render the scene.
Background tensor (bg_color) must be on GPU!
"""
timestamp = viewpoint_camera.timestamp
all_fg = [None, None, None, None, None, None]
prev_all_fg = [None]
if len(unicycles) == 0:
track_dict = viewpoint_camera.dynamics
if prev_viewpoint_camera is not None:
prev_track_dict = prev_viewpoint_camera.dynamics
else:
track_dict, prev_track_dict = {}, {}
for track_id, uni_model in unicycles.items():
B2W = unicycle_b2w(timestamp, uni_model['model'])
track_dict[track_id] = B2W
if prev_viewpoint_camera is not None:
prev_B2W = unicycle_b2w(prev_viewpoint_camera.timestamp, uni_model['model'])
prev_track_dict[track_id] = prev_B2W
for track_id, B2W in track_dict.items():
w_dxyz = (B2W[:3, :3] @ dynamic_gaussians[track_id].get_xyz.T).T + B2W[:3, 3]
drot = quaternion_to_matrix(dynamic_gaussians[track_id].get_rotation)
w_drot = matrix_to_quaternion(B2W[:3, :3] @ drot)
next_fg = [w_dxyz,
dynamic_gaussians[track_id].get_opacity,
dynamic_gaussians[track_id].get_scaling,
w_drot,
dynamic_gaussians[track_id].get_features,
dynamic_gaussians[track_id].get_3D_features]
# next_fg = get_next_fg(dynamic_gaussians[track_id], B2W)
# w_dxyz = next_fg[0]
all_fg = cat_all_fg(all_fg, next_fg)
if render_optical and prev_viewpoint_camera is not None:
if track_id in prev_track_dict:
prev_B2W = prev_track_dict[track_id]
prev_w_dxyz = torch.mm(prev_B2W[:3, :3], dynamic_gaussians[track_id].get_xyz.T).T + prev_B2W[:3, 3]
prev_all_fg = cat_all_fg(prev_all_fg, [prev_w_dxyz])
else:
prev_all_fg = cat_all_fg(prev_all_fg, [w_dxyz])
xyz, opacity, scales, rotations, shs, feats3D = cat_bgfg(pc, all_fg)
if render_optical and prev_viewpoint_camera is not None:
prev_xyz = cat_bgfg(pc, prev_all_fg, only_xyz=True)[0]
uv = proj_uv(xyz, viewpoint_camera)
prev_uv = proj_uv(prev_xyz, prev_viewpoint_camera)
delta_uv = uv - prev_uv
delta_uv = torch.cat([delta_uv, torch.ones_like(delta_uv[:, :1], device=delta_uv.device)], dim=-1)
else:
delta_uv = torch.zeros_like(xyz)
# Create zero tensor. We will use it to make pytorch return gradients of the 2D (screen-space) means
screenspace_points = torch.zeros_like(xyz, dtype=xyz.dtype, requires_grad=True, device="cuda") + 0
try:
screenspace_points.retain_grad()
except Exception:
pass
# Set up rasterization configuration
tanfovx = math.tan(viewpoint_camera.FoVx * 0.5)
tanfovy = math.tan(viewpoint_camera.FoVy * 0.5)
if pc.affine:
cam_xyz, cam_dir = viewpoint_camera.c2w[:3, 3].cuda(), viewpoint_camera.c2w[:3, 2].cuda()
o_enc = pc.pos_enc(cam_xyz[None, :] / 60)
d_enc = pc.dir_enc(cam_dir[None, :])
appearance = pc.appearance_model(torch.cat([o_enc, d_enc], dim=1)) * 1e-1
affine_weight, affine_bias = appearance[:, :9].view(3, 3), appearance[:, -3:]
affine_weight = affine_weight + torch.eye(3, device=appearance.device)
# bg_img = pc.sky_model(enc).view(*rays_d.shape).permute(2, 0, 1).float()
raster_settings = GaussianRasterizationSettings(
image_height=int(viewpoint_camera.image_height),
image_width=int(viewpoint_camera.image_width),
tanfovx=tanfovx,
tanfovy=tanfovy,
bg=bg_color,
scale_modifier=scaling_modifier,
viewmatrix=viewpoint_camera.world_view_transform,
projmatrix=viewpoint_camera.full_proj_transform,
sh_degree=pc.active_sh_degree,
campos=viewpoint_camera.camera_center,
prefiltered=False,
debug=pipe.debug
)
rasterizer = GaussianRasterizer(raster_settings=raster_settings)
means3D = xyz
means2D = screenspace_points
cov3D_precomp = None
colors_precomp = None
# Rasterize visible Gaussians to image, obtain their radii (on screen).
rendered_image, radii, feats, depth, flow = rasterizer(
means3D = means3D,
means2D = means2D,
shs = shs,
colors_precomp = colors_precomp,
opacities = opacity,
scales = scales,
rotations = rotations,
cov3D_precomp = cov3D_precomp,
feats3D = feats3D,
delta = delta_uv)
if pc.affine:
colors = rendered_image.view(3, -1).permute(1, 0) # (H*W, 3)
refined_image = (colors @ affine_weight + affine_bias).clip(0, 1).permute(1, 0).view(*rendered_image.shape)
else:
refined_image = rendered_image
# Those Gaussians that were frustum culled or had a radius of 0 were not visible.
# They will be excluded from value updates used in the splitting criteria.
return {"render": refined_image,
"feats": feats,
"depth": depth,
"opticalflow": flow,
"viewspace_points": screenspace_points,
"visibility_filter" : radii > 0,
"radii": radii}
================================================
FILE: lpipsPyTorch/__init__.py
================================================
import torch
from .modules.lpips import LPIPS
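# Usage sketch (shapes assumed, matching how metrics.py calls this function):
#   x, y: image tensors of shape (1, 3, H, W) on the same device
#   score = lpips(x, y, net_type='alex')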
def lpips(x: torch.Tensor,
y: torch.Tensor,
net_type: str = 'alex',
version: str = '0.1'):
r"""Function that measures
Learned Perceptual Image Patch Similarity (LPIPS).
Arguments:
x, y (torch.Tensor): the input tensors to compare.
net_type (str): the network type to compare the features:
'alex' | 'squeeze' | 'vgg'. Default: 'alex'.
version (str): the version of LPIPS. Default: 0.1.
"""
device = x.device
criterion = LPIPS(net_type, version).to(device)
return criterion(x, y)
================================================
FILE: lpipsPyTorch/modules/lpips.py
================================================
import torch
import torch.nn as nn
from .networks import get_network, LinLayers
from .utils import get_state_dict
class LPIPS(nn.Module):
r"""Creates a criterion that measures
Learned Perceptual Image Patch Similarity (LPIPS).
Arguments:
net_type (str): the network type to compare the features:
'alex' | 'squeeze' | 'vgg'. Default: 'alex'.
version (str): the version of LPIPS. Default: 0.1.
"""
def __init__(self, net_type: str = 'alex', version: str = '0.1'):
assert version in ['0.1'], 'only v0.1 is currently supported'
super(LPIPS, self).__init__()
# pretrained network
self.net = get_network(net_type)
# linear layers
self.lin = LinLayers(self.net.n_channels_list)
self.lin.load_state_dict(get_state_dict(net_type, version))
def forward(self, x: torch.Tensor, y: torch.Tensor):
feat_x, feat_y = self.net(x), self.net(y)
diff = [(fx - fy) ** 2 for fx, fy in zip(feat_x, feat_y)]
res = [l(d).mean((2, 3), True) for d, l in zip(diff, self.lin)]
return torch.sum(torch.cat(res, 0), 0, True)
================================================
FILE: lpipsPyTorch/modules/networks.py
================================================
from typing import Sequence
from itertools import chain
import torch
import torch.nn as nn
from torchvision import models
from .utils import normalize_activation
def get_network(net_type: str):
if net_type == 'alex':
return AlexNet()
elif net_type == 'squeeze':
return SqueezeNet()
elif net_type == 'vgg':
return VGG16()
else:
raise NotImplementedError('choose net_type from [alex, squeeze, vgg].')
class LinLayers(nn.ModuleList):
def __init__(self, n_channels_list: Sequence[int]):
super(LinLayers, self).__init__([
nn.Sequential(
nn.Identity(),
nn.Conv2d(nc, 1, 1, 1, 0, bias=False)
) for nc in n_channels_list
])
for param in self.parameters():
param.requires_grad = False
class BaseNet(nn.Module):
def __init__(self):
super(BaseNet, self).__init__()
# register buffer
self.register_buffer(
'mean', torch.Tensor([-.030, -.088, -.188])[None, :, None, None])
self.register_buffer(
'std', torch.Tensor([.458, .448, .450])[None, :, None, None])
def set_requires_grad(self, state: bool):
for param in chain(self.parameters(), self.buffers()):
param.requires_grad = state
def z_score(self, x: torch.Tensor):
return (x - self.mean) / self.std
def forward(self, x: torch.Tensor):
x = self.z_score(x)
output = []
for i, (_, layer) in enumerate(self.layers._modules.items(), 1):
x = layer(x)
if i in self.target_layers:
output.append(normalize_activation(x))
if len(output) == len(self.target_layers):
break
return output
class SqueezeNet(BaseNet):
def __init__(self):
super(SqueezeNet, self).__init__()
self.layers = models.squeezenet1_1(weights=models.SqueezeNet1_1_Weights.IMAGENET1K_V1).features
self.target_layers = [2, 5, 8, 10, 11, 12, 13]
self.n_channels_list = [64, 128, 256, 384, 384, 512, 512]
self.set_requires_grad(False)
class AlexNet(BaseNet):
def __init__(self):
super(AlexNet, self).__init__()
self.layers = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1).features
self.target_layers = [2, 5, 8, 10, 12]
self.n_channels_list = [64, 192, 384, 256, 256]
self.set_requires_grad(False)
class VGG16(BaseNet):
def __init__(self):
super(VGG16, self).__init__()
self.layers = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).features
self.target_layers = [4, 9, 16, 23, 30]
self.n_channels_list = [64, 128, 256, 512, 512]
self.set_requires_grad(False)
================================================
FILE: lpipsPyTorch/modules/utils.py
================================================
from collections import OrderedDict
import torch
def normalize_activation(x, eps=1e-10):
norm_factor = torch.sqrt(torch.sum(x ** 2, dim=1, keepdim=True))
return x / (norm_factor + eps)
def get_state_dict(net_type: str = 'alex', version: str = '0.1'):
# build url
url = 'https://raw.githubusercontent.com/richzhang/PerceptualSimilarity/' \
+ f'master/lpips/weights/v{version}/{net_type}.pth'
# download
old_state_dict = torch.hub.load_state_dict_from_url(
url, progress=True,
map_location=None if torch.cuda.is_available() else torch.device('cpu')
)
# rename keys
new_state_dict = OrderedDict()
for key, val in old_state_dict.items():
new_key = key
new_key = new_key.replace('lin', '')
new_key = new_key.replace('model.', '')
new_state_dict[new_key] = val
return new_state_dict
================================================
FILE: metrics.py
================================================
#
# Copyright (C) 2023, Inria
# GRAPHDECO research group, https://team.inria.fr/graphdeco
# All rights reserved.
#
# This software is free for non-commercial, research and evaluation use
# under the terms of the LICENSE.md file.
#
# For inquiries contact george.drettakis@inria.fr
#
from pathlib import Path
import os
from PIL import Image
import torch
import torchvision.transforms.functional as tf
from utils.loss_utils import ssim
from lpipsPyTorch import lpips
import json
from tqdm import tqdm
from utils.image_utils import psnr
from argparse import ArgumentParser
from collections import OrderedDict
def readImages(renders_dir, gt_dir):
renders = []
gts = []
image_names = []
for fname in os.listdir(renders_dir):
render = Image.open(renders_dir / fname)
gt = Image.open(gt_dir / fname)
renders.append(tf.to_tensor(render).unsqueeze(0)[:, :3, :, :].cuda())
gts.append(tf.to_tensor(gt).unsqueeze(0)[:, :3, :, :].cuda())
image_names.append(fname)
return renders, gts, image_names
def evaluate(model_paths, write):
# import ipdb; ipdb.set_trace()
full_dict = {}
per_view_dict = {}
full_dict_polytopeonly = {}
per_view_dict_polytopeonly = {}
print("")
scene_dir = model_paths[0]
print("Scene:", scene_dir)
for splits in ['test', 'train']:
full_dict[splits] = {}
per_view_dict[splits] = {}
dir_path = Path(scene_dir) / splits
for method in os.listdir(dir_path):
print("Method:", method)
full_dict[splits][method] = {}
per_view_dict[splits][method] = {}
method_dir = dir_path / method
gt_dir = method_dir/ "gt"
renders_dir = method_dir / "renders"
renders, gts, image_names = readImages(renders_dir, gt_dir)
ssims = []
psnrs = []
lpipss = []
for idx in tqdm(range(len(renders)), desc="Metric evaluation progress"):
ssims.append(ssim(renders[idx], gts[idx]))
psnrs.append(psnr(renders[idx], gts[idx]))
lpipss.append(lpips(renders[idx], gts[idx], net_type='alex'))
print(" SSIM : {:>12.7f}".format(torch.tensor(ssims).mean(), ".5"))
print(" PSNR : {:>12.7f}".format(torch.tensor(psnrs).mean(), ".5"))
print(" LPIPS: {:>12.7f}".format(torch.tensor(lpipss).mean(), ".5"))
print("")
full_dict[splits][method].update({"SSIM": torch.tensor(ssims).mean().item(),
"PSNR": torch.tensor(psnrs).mean().item(),
"LPIPS": torch.tensor(lpipss).mean().item()})
per_view_dict[splits][method].update({
"SSIM": OrderedDict(sorted({name: ssim for ssim, name in zip(torch.tensor(ssims).tolist(), image_names)}.items())),
"PSNR": OrderedDict(sorted({name: psnr for psnr, name in zip(torch.tensor(psnrs).tolist(), image_names)}.items())),
"LPIPS": OrderedDict(sorted({name: lp for lp, name in zip(torch.tensor(lpipss).tolist(), image_names)}.items()))
})
if write:
with open(scene_dir + "/metric_results.json", 'w') as fp:
json.dump(full_dict, fp, indent=True)
with open(scene_dir + "/per_view.json", 'w') as fp:
json.dump(per_view_dict, fp, indent=True)
if __name__ == "__main__":
device = torch.device("cuda:0")
torch.cuda.set_device(device)
# Set up command line argument parser
parser = ArgumentParser(description="Training script parameters")
parser.add_argument('--model_paths', '-m', required=True, nargs="+", type=str, default=[])
parser.add_argument('--write', action='store_false', default=True)
args = parser.parse_args()
evaluate(args.model_paths, args.write)
================================================
FILE: render.py
================================================
#
# Copyright (C) 2023, Inria
# GRAPHDECO research group, https://team.inria.fr/graphdeco
# All rights reserved.
#
# This software is free for non-commercial, research and evaluation use
# under the terms of the LICENSE.md file.
#
# For inquiries contact george.drettakis@inria.fr
#
import torch
from scene import Scene
import os
from tqdm import tqdm
from os import makedirs
from gaussian_renderer import render
import torchvision
from utils.general_utils import safe_state
from argparse import ArgumentParser
from arguments import ModelParams, PipelineParams, get_combined_args
from gaussian_renderer import GaussianModel
import numpy as np
from copy import deepcopy
from torchmetrics.functional import structural_similarity_index_measure as ssim
import matplotlib.pyplot as plt
from mpl_toolkits.axes_grid1 import make_axes_locatable
from matplotlib import cm
from utils.semantic_utils import colorize
import flow_vis_torch
from utils.cmap import color_depth_map
from imageio.v2 import imwrite
def to4x4(R, T):
RT = np.eye(4,4)
RT[:3, :3] = R
RT[:3, 3] = T
return RT
def apply_colormap(image, cmap="viridis"):
colormap = plt.get_cmap(cmap)  # cm.get_cmap was removed in matplotlib 3.9 (pinned in requirements.txt)
colormap = torch.tensor(colormap.colors).to(image.device) # type: ignore
image_long = (image * 255).long()
image_long_min = torch.min(image_long)
image_long_max = torch.max(image_long)
assert image_long_min >= 0, f"the min value is {image_long_min}"
assert image_long_max <= 255, f"the max value is {image_long_max}"
return colormap[image_long[0, ...]].permute(2, 0, 1)
def apply_depth_colormap(depth, near_plane=None, far_plane=None, cmap="turbo"):
near_plane = near_plane or float(torch.min(depth))
far_plane = far_plane or float(torch.max(depth))
depth = (depth - near_plane) / (far_plane - near_plane + 1e-10)
depth = torch.clip(depth, 0, 1)
colored_image = apply_colormap(depth, cmap=cmap)
return colored_image
def render_set(model_path, name, iteration, views, scene, pipeline, background, data_type):
render_path = os.path.join(model_path, name, "ours_{}".format(iteration), "renders")
semantic_path = os.path.join(model_path, name, "ours_{}".format(iteration), "semantic")
optical_path = os.path.join(model_path, name, "ours_{}".format(iteration), "optical")
gts_path = os.path.join(model_path, name, "ours_{}".format(iteration), "gt")
error_path = os.path.join(model_path, name, "ours_{}".format(iteration), "error_map")
depth_path = os.path.join(model_path, name, "ours_{}".format(iteration), "depth")
makedirs(render_path, exist_ok=True)
makedirs(semantic_path, exist_ok=True)
makedirs(optical_path, exist_ok=True)
makedirs(gts_path, exist_ok=True)
makedirs(error_path, exist_ok=True)
makedirs(depth_path, exist_ok=True)
for idx, view in enumerate(tqdm(views, desc="Rendering progress")):
if data_type == 'kitti':
gap = 2
elif data_type == 'kitti360':
gap = 4
elif data_type == 'waymo':
gap = 1
elif data_type == 'nuscenes' or data_type == 'pandaset':
gap = 6
else:
raise ValueError(f"Unknown data_type: {data_type}")
if idx - gap < 0:
prev_view = None
else:
prev_view = views[idx - gap]  # use the view `gap` entries back as the previous frame
render_pkg = render(
view, prev_view, scene.gaussians, scene.dynamic_gaussians, scene.unicycles, pipeline, background, True
)
rendering = render_pkg['render'].detach().cpu()
semantic = render_pkg['feats'].detach().cpu()
semantic = torch.argmax(semantic, dim=0)
semantic_rgb = colorize(semantic.detach().cpu().numpy())
depth = render_pkg['depth']
color_depth = color_depth_map(depth[0].detach().cpu().numpy())
color_depth[semantic == 10] = np.array([255.0, 255.0, 255.0])
gt = view.original_image[0:3, :, :]
# _, ssim_map = ssim(rendering[None, ...], gt[None, ...], return_full_image=True)
# ssim_map = torch.mean(ssim_map[0], dim=0).clip(0, 1)[None, ...]
# error_map = 1 - ssim_maps
error_map = torch.mean((rendering - gt) ** 2, dim=0)[None, ...]
fig = plt.figure(frameon=False)
fig.set_size_inches(1.408, 0.376)
ax = plt.Axes(fig, [0., 0., 1., 1.])
ax.set_axis_off()
fig.add_axes(ax)
ax.imshow((error_map.detach().cpu().numpy().transpose(1,2,0)), cmap='jet')
plt.savefig(os.path.join(error_path, view.image_name + ".png"), dpi=1000)
plt.close('all')
torchvision.utils.save_image(rendering, os.path.join(render_path, view.image_name + ".png"))
torchvision.utils.save_image(gt, os.path.join(gts_path, view.image_name + ".png"))
semantic_rgb.save(os.path.join(semantic_path, view.image_name + ".png"))
imwrite(os.path.join(depth_path, view.image_name + ".png"), color_depth)
opticalflow = render_pkg["opticalflow"]
opticalflow = opticalflow.permute(1,2,0)
opticalflow = opticalflow[..., :2]
pytorch_optic_rgb = flow_vis_torch.flow_to_color(opticalflow.permute(2, 0, 1)) # (2, h, w)
torchvision.utils.save_image(pytorch_optic_rgb.float(), os.path.join(optical_path, view.image_name + ".png"), normalize=True)
# torchvision.utils.save_image(error_map, os.path.join(error_path, '{0:05d}'.format(idx) + ".png"))
def render_sets(dataset : ModelParams, iteration : int, pipeline : PipelineParams,
skip_train : bool, skip_test : bool, data_type, affine, ignore_dynamic):
with torch.no_grad():
gaussians = GaussianModel(dataset.sh_degree, affine=affine)
scene = Scene(dataset, gaussians, load_iteration=iteration, shuffle=False, data_type=data_type, ignore_dynamic=ignore_dynamic)
bg_color = [1,1,1] if dataset.white_background else [0, 0, 0]
background = torch.tensor(bg_color, dtype=torch.float32, device="cuda")
if not skip_train:
render_set(dataset.model_path, "train", scene.loaded_iter, scene.getTrainCameras(), scene, pipeline, background, data_type)
if not skip_test:
render_set(dataset.model_path, "test", scene.loaded_iter, scene.getTestCameras(), scene, pipeline, background, data_type)
if __name__ == "__main__":
# Set up command line argument parser
parser = ArgumentParser(description="Testing script parameters")
model = ModelParams(parser, sentinel=True)
pipeline = PipelineParams(parser)
parser.add_argument("--iteration", default=-1, type=int)
parser.add_argument("--data_type", default='kitti360', type=str)
parser.add_argument("--affine", action="store_true")
parser.add_argument("--ignore_dynamic", action="store_true")
parser.add_argument("--skip_train", action="store_true")
parser.add_argument("--skip_test", action="store_true")
parser.add_argument("--quiet", action="store_true")
args = get_combined_args(parser)
print("Rendering " + args.model_path)
args.source_path = os.path.join(args.model_path, 'data')
# Initialize system state (RNG)
# safe_state(args.quiet)
render_sets(model.extract(args), args.iteration, pipeline.extract(args),
args.skip_train, args.skip_test, args.data_type, args.affine, args.ignore_dynamic)
================================================
FILE: requirements.txt
================================================
config==0.5.1
datasets==2.19.2
# flow_vis_torch==0.1
imageio==2.34.1
matplotlib==3.9.0
network==0.1
numpy==1.26.4
open3d==0.18.0
opencv_python==4.10.0.82
Pillow==10.3.0
plyfile==1.0.3
# pytorch3d==0.7.4
runx==0.0.11
scipy==1.13.1
setuptools==69.5.1
# torch==2.3.1+cu118
torchmetrics==1.4.0.post0
# torchvision==0.18.1+cu118
tqdm==4.66.4
================================================
FILE: scene/__init__.py
================================================
#
# Copyright (C) 2023, Inria
# GRAPHDECO research group, https://team.inria.fr/graphdeco
# All rights reserved.
#
# This software is free for non-commercial, research and evaluation use
# under the terms of the LICENSE.md file.
#
# For inquiries contact george.drettakis@inria.fr
#
import os
import random
import json
from utils.system_utils import searchForMaxIteration
from scene.dataset_readers import sceneLoadTypeCallbacks
from scene.gaussian_model import GaussianModel
from arguments import ModelParams
from utils.camera_utils import cameraList_from_camInfos, camera_to_JSON
import torch
import open3d as o3d
import numpy as np
from utils.dynamic_utils import create_unicycle_model
import shutil
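# Scene bundles everything the renderer needs: the static background
# GaussianModel, one dynamic GaussianModel per tracked vehicle (initialised
# from a vehicle template point cloud scaled to the tracked box size),
# optional unicycle motion models, and the train/test camera lists.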
class Scene:
gaussians : GaussianModel
def __init__(self, args : ModelParams, gaussians : GaussianModel, load_iteration=None, shuffle=True,
unicycle=False, uc_fit_iter=0, resolution_scales=[1.0], data_type='kitti360', ignore_dynamic=False):
"""b
:param path: Path to colmap scene main folder.
"""
self.model_path = args.model_path
self.loaded_iter = None
self.gaussians = gaussians
if load_iteration:
if load_iteration == -1:
self.loaded_iter = searchForMaxIteration(os.path.join(self.model_path, "ckpts"))
else:
self.loaded_iter = load_iteration
print("Loading trained model at iteration {}".format(self.loaded_iter))
self.train_cameras = {}
self.test_cameras = {}
if os.path.exists(os.path.join(args.source_path, "sparse")):
# scene_info = sceneLoadTypeCallbacks["Colmap"](args.source_path, args.images, args.eval)
raise NotImplementedError
elif os.path.exists(os.path.join(args.source_path, "transforms_train.json")):
print("Found transforms_train.json file, assuming Blender data set!")
# scene_info = sceneLoadTypeCallbacks["Blender"](args.source_path, args.white_background, args.eval)
raise NotImplementedError
elif os.path.exists(os.path.join(args.source_path, "meta_data.json")):
print("Found meta_data.json file, assuming Studio data set!")
scene_info = sceneLoadTypeCallbacks['Studio'](args.source_path, args.white_background, args.eval, data_type, ignore_dynamic)
else:
assert False, "Could not recognize scene type!"
self.dynamic_verts = scene_info.verts
self.dynamic_gaussians = {}
for track_id in scene_info.verts:
self.dynamic_gaussians[track_id] = GaussianModel(args.sh_degree, feat_mutable=False)
if unicycle:
self.unicycles = create_unicycle_model(scene_info.train_cameras, self.model_path, uc_fit_iter, data_type)
else:
self.unicycles = {}
if not self.loaded_iter:
with open(scene_info.ply_path, 'rb') as src_file, open(os.path.join(self.model_path, "input.ply") , 'wb') as dest_file:
dest_file.write(src_file.read())
json_cams = []
camlist = []
if scene_info.test_cameras:
camlist.extend(scene_info.test_cameras)
if scene_info.train_cameras:
camlist.extend(scene_info.train_cameras)
for id, cam in enumerate(camlist):
json_cams.append(camera_to_JSON(id, cam))
with open(os.path.join(self.model_path, "cameras.json"), 'w') as file:
json.dump(json_cams, file)
shutil.copyfile(os.path.join(args.source_path, 'meta_data.json'), os.path.join(self.model_path, 'meta_data.json'))
if shuffle:
random.shuffle(scene_info.train_cameras) # Multi-res consistent random shuffling
random.shuffle(scene_info.test_cameras) # Multi-res consistent random shuffling
self.cameras_extent = scene_info.nerf_normalization["radius"]
for resolution_scale in resolution_scales:
print("Loading Training Cameras")
self.train_cameras[resolution_scale] = cameraList_from_camInfos(scene_info.train_cameras, resolution_scale, args)
print("Loading Test Cameras")
self.test_cameras[resolution_scale] = cameraList_from_camInfos(scene_info.test_cameras, resolution_scale, args)
if self.loaded_iter:
(model_params, first_iter) = torch.load(os.path.join(self.model_path, "ckpts", f"chkpnt{self.loaded_iter}.pth"))
gaussians.restore(model_params, None)
for iid, dynamic_gaussian in self.dynamic_gaussians.items():
(model_params, first_iter) = torch.load(os.path.join(self.model_path, "ckpts", f"dynamic_{iid}_chkpnt{self.loaded_iter}.pth"))
dynamic_gaussian.restore(model_params, None)
for iid, unicycle_pkg in self.unicycles.items():
model_params = torch.load(os.path.join(self.model_path, "ckpts", f"unicycle_{iid}_chkpnt{self.loaded_iter}.pth"))
unicycle_pkg['model'].restore(model_params)
else:
self.gaussians.create_from_pcd(scene_info.point_cloud, self.cameras_extent)
for track_id in self.dynamic_gaussians.keys():
vertices = scene_info.verts[track_id]
# init from template
l, h, w = vertices[:, 0].max() - vertices[:, 0].min(), vertices[:, 1].max() - vertices[:, 1].min(), vertices[:, 2].max() - vertices[:, 2].min()
pcd = o3d.io.read_point_cloud(f"utils/vehicle_template/benz_{data_type}.ply")
points = np.array(pcd.points) * np.array([l, h, w])
pcd.points = o3d.utility.Vector3dVector(points)
pcd.colors = o3d.utility.Vector3dVector(np.ones_like(points) * 0.5)
self.dynamic_gaussians[track_id].create_from_pcd(pcd, self.cameras_extent)
def save(self, iteration):
# self.gaussians.save_ply(os.path.join(point_cloud_path, "point_cloud.ply"))
point_cloud_vis_path = os.path.join(self.model_path, "point_cloud_vis/iteration_{}".format(iteration))
self.gaussians.save_vis_ply(os.path.join(point_cloud_vis_path, "point.ply"))
for iid, dynamic_gaussian in self.dynamic_gaussians.items():
dynamic_gaussian.save_vis_ply(os.path.join(point_cloud_vis_path, f"dynamic_{iid}.ply"))
def getTrainCameras(self, scale=1.0):
return self.train_cameras[scale]
def getTestCameras(self, scale=1.0):
return self.test_cameras[scale]
================================================
FILE: scene/cameras.py
================================================
#
# Copyright (C) 2023, Inria
# GRAPHDECO research group, https://team.inria.fr/graphdeco
# All rights reserved.
#
# This software is free for non-commercial, research and evaluation use
# under the terms of the LICENSE.md file.
#
# For inquiries contact george.drettakis@inria.fr
#
import torch
from torch import nn
import numpy as np
from utils.graphics_utils import getWorld2View2, getProjectionMatrix, fov2focal
from utils.general_utils import decode_op
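# Camera couples an image with its calibration (R, T, K, FoV) and the
# optional per-frame data used by HUGS: 2D semantics, a sky/valid mask,
# optical-flow ground truth, a timestamp, and per-frame dynamic object poses.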
class Camera(nn.Module):
def __init__(self, colmap_id, R, T, K, FoVx, FoVy, image,
image_name, uid,
trans=np.array([0.0, 0.0, 0.0]), scale=1.0, data_device="cuda",
cx_ratio=None, cy_ratio=None, semantic2d=None, mask=None, timestamp=-1, optical_image=None, dynamics={}
):
super(Camera, self).__init__()
self.uid = uid
self.colmap_id = colmap_id
self.R = R
self.T = T
self.K = K
self.FoVx = FoVx
self.FoVy = FoVy
self.image_name = image_name
self.cx_ratio = cx_ratio
self.cy_ratio = cy_ratio
self.timestamp = timestamp
_, self.H, self.W = image.shape
self.w2c = np.eye(4)
self.w2c[:3, :3] = self.R.T
self.w2c[:3, 3] = self.T
self.c2w = torch.from_numpy(np.linalg.inv(self.w2c)).cuda()
self.fx = fov2focal(self.FoVx, self.W)
self.fy = fov2focal(self.FoVy, self.H)
self.dynamics = dynamics
try:
self.data_device = torch.device(data_device)
except Exception as e:
print(e)
print(f"[Warning] Custom device {data_device} failed, fallback to default cuda device" )
self.data_device = torch.device("cuda")
self.original_image = image.clamp(0.0, 1.0).to(self.data_device)
if semantic2d is not None:
self.semantic2d = semantic2d.to(self.data_device)
else:
self.semantic2d = None
if mask is not None:
self.mask = torch.from_numpy(mask).bool().to(self.data_device)
else:
self.mask = None
self.image_width = self.original_image.shape[2]
self.image_height = self.original_image.shape[1]
if optical_image is not None:
self.optical_gt = torch.from_numpy(optical_image).to(self.data_device)
else:
self.optical_gt = None
self.zfar = 100.0
self.znear = 0.01
self.trans = trans
self.scale = scale
self.world_view_transform = torch.tensor(getWorld2View2(R, T, trans, scale)).transpose(0, 1).cuda()
self.projection_matrix = getProjectionMatrix(znear=self.znear, zfar=self.zfar,
fovX=self.FoVx, fovY=self.FoVy, cx_ratio=cx_ratio, cy_ratio=cy_ratio).transpose(0,1).cuda()
self.full_proj_transform = (self.world_view_transform.unsqueeze(0).bmm(self.projection_matrix.unsqueeze(0))).squeeze(0)
self.camera_center = self.world_view_transform.inverse()[3, :3]
def get_rays(self):
i, j = torch.meshgrid(torch.linspace(0, self.W-1, self.W),
torch.linspace(0, self.H-1, self.H)) # pytorch's meshgrid has indexing='ij'
i = i.t()
j = j.t()
dirs = torch.stack([(i-self.cx_ratio)/self.fx, -(j-self.cy_ratio)/self.fy, -torch.ones_like(i)], -1)
rays_d = torch.sum(dirs[..., np.newaxis, :] * self.c2w[:3,:3], -1).to(self.data_device)
rays_o = self.c2w[:3,-1].expand(rays_d.shape).to(self.data_device)
rays_d = torch.nn.functional.normalize(rays_d, dim=-1)
return rays_o.permute(2,0,1), rays_d.permute(2,0,1)
class MiniCam:
def __init__(self, width, height, fovy, fovx, znear, zfar, world_view_transform, full_proj_transform):
self.image_width = width
self.image_height = height
self.FoVy = fovy
self.FoVx = fovx
self.znear = znear
self.zfar = zfar
self.world_view_transform = world_view_transform
self.full_proj_transform = full_proj_transform
view_inv = torch.inverse(self.world_view_transform)
self.camera_center = view_inv[3][:3]
================================================
FILE: scene/dataset_readers.py
================================================
#
# Copyright (C) 2023, Inria
# GRAPHDECO research group, https://team.inria.fr/graphdeco
# All rights reserved.
#
# This software is free for non-commercial, research and evaluation use
# under the terms of the LICENSE.md file.
#
# For inquiries contact george.drettakis@inria.fr
#
import os
import sys
from PIL import Image
from typing import NamedTuple
from utils.graphics_utils import getWorld2View2, focal2fov, fov2focal
import numpy as np
import json
from pathlib import Path
from plyfile import PlyData, PlyElement
from utils.sh_utils import SH2RGB
from scene.gaussian_model import BasicPointCloud
import torch.nn.functional as F
from imageio.v2 import imread
import torch
import random
class CameraInfo(NamedTuple):
uid: int
R: np.array
T: np.array
K: np.array
FovY: np.array
FovX: np.array
image: np.array
image_path: str
image_name: str
width: int
height: int
cx_ratio: float
cy_ratio: float
semantic2d: np.array
optical_image: np.array
mask: np.array
timestamp: int
dynamics: dict
class SceneInfo(NamedTuple):
point_cloud: BasicPointCloud
train_cameras: list
test_cameras: list
nerf_normalization: dict
ply_path: str
verts: dict
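# NeRF++-style normalisation: average camera centre and scene radius.
# Note the radius is hard-coded to 10 here; the diagonal-based estimate
# is commented out.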
def getNerfppNorm(cam_info):
def get_center_and_diag(cam_centers):
cam_centers = np.hstack(cam_centers)
avg_cam_center = np.mean(cam_centers, axis=1, keepdims=True)
center = avg_cam_center
dist = np.linalg.norm(cam_centers - center, axis=0, keepdims=True)
diagonal = np.max(dist)
return center.flatten(), diagonal
cam_centers = []
for cam in cam_info:
W2C = getWorld2View2(cam.R, cam.T)
C2W = np.linalg.inv(W2C)
cam_centers.append(C2W[:3, 3:4]) # cam_centers in world coordinate
center, diagonal = get_center_and_diag(cam_centers)
# radius = diagonal * 1.1 + 30
radius = 10
translate = -center
return {"translate": translate, "radius": radius}
def fetchPly(path):
plydata = PlyData.read(path)
vertices = plydata['vertex']
positions = np.vstack([vertices['x'], vertices['y'], vertices['z']]).T
if 'red' in vertices:
colors = np.vstack([vertices['red'], vertices['green'], vertices['blue']]).T / 255.0
else:
print('Create random colors')
# shs = np.random.random((positions.shape[0], 3)) / 255.0
shs = np.ones((positions.shape[0], 3)) * 0.5
colors = SH2RGB(shs)
# shs = np.ones((positions.shape[0], 3)) * 0.5
# colors = SH2RGB(shs)
normals = np.zeros((positions.shape[0], 3))
return BasicPointCloud(points=positions, colors=colors, normals=normals)
def storePly(path, xyz, rgb):
# Define the dtype for the structured array
dtype = [('x', 'f4'), ('y', 'f4'), ('z', 'f4'),
('nx', 'f4'), ('ny', 'f4'), ('nz', 'f4'),
('red', 'u1'), ('green', 'u1'), ('blue', 'u1')]
normals = np.zeros_like(xyz)
elements = np.empty(xyz.shape[0], dtype=dtype)
attributes = np.concatenate((xyz, normals, rgb), axis=1)
elements[:] = list(map(tuple, attributes))
# Create the PlyData object and write to file
vertex_element = PlyElement.describe(elements, 'vertex')
ply_data = PlyData([vertex_element])
ply_data.write(path)
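# Parse a Studio-format meta_data.json: per-frame camera-to-world pose and
# intrinsics, optional semantic / flow / mask side files, a timestamp, and
# per-frame dynamic object poses. Frames are split into train/test sets
# according to the dataset type.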
def readStudioCameras(path, white_background, data_type, ignore_dynamic):
train_cam_infos, test_cam_infos = [], []
with open(os.path.join(path, 'meta_data.json')) as json_file:
meta_data = json.load(json_file)
verts = {}
if 'verts' in meta_data and not ignore_dynamic:
verts_list = meta_data['verts']
for k, v in verts_list.items():
verts[k] = np.array(v)
frames = meta_data['frames']
for idx, frame in enumerate(frames):
matrix = np.linalg.inv(np.array(frame['camtoworld']))
R = matrix[:3, :3]
T = matrix[:3, 3]
R = np.transpose(R)
rgb_path = os.path.join(path, frame['rgb_path'].replace('./', ''))
rgb_split = rgb_path.split('/')
image_name = '_'.join([rgb_split[-2], rgb_split[-1][:-4]])
image = Image.open(rgb_path)
semantic_2d = None
semantic_pth = rgb_path.replace("images", "semantics").replace('.png', '.npy').replace('.jpg', '.npy')
if os.path.exists(semantic_pth):
semantic_2d = np.load(semantic_pth)
semantic_2d[(semantic_2d == 14) | (semantic_2d == 15)] = 13
optical_path = rgb_path.replace("images", "flow").replace('.png', '_flow.npy').replace('.jpg', '_flow.npy')
if os.path.exists(optical_path):
optical_image = np.load(optical_path)
else:
optical_image = None
mask = None
mask_path = rgb_path.replace("images", "masks").replace('.png', '.npy').replace('.jpg', '.npy')
if os.path.exists(mask_path):
mask = np.load(mask_path)
timestamp = frame.get('timestamp', -1)
intrinsic = np.array(frame['intrinsics'])
FovX = focal2fov(intrinsic[0, 0], image.size[0])
FovY = focal2fov(intrinsic[1, 1], image.size[1])
cx, cy = intrinsic[0, 2], intrinsic[1, 2]
w, h = image.size
dynamics = {}
if 'dynamics' in frame and not ignore_dynamic:
dynamics_list = frame['dynamics']
for iid in dynamics_list.keys():
dynamics[iid] = torch.tensor(dynamics_list[iid]).cuda()
cam_info = CameraInfo(uid=idx, R=R, T=T, K=intrinsic, FovY=FovY, FovX=FovX, image=image,
image_path=rgb_path, image_name=image_name, width=image.size[0],
height=image.size[1], cx_ratio=2*cx/w, cy_ratio=2*cy/h, semantic2d=semantic_2d,
optical_image=optical_image, mask=mask, timestamp=timestamp, dynamics=dynamics)
# kitti360
if data_type == 'kitti360':
# if 'cam_2' in cam_info.image_name or 'cam_3' in cam_info.image_name:
# train_cam_infos.append(cam_info)
# # continue
if idx < 20:
train_cam_infos.append(cam_info)
elif idx % 8 < 4:
train_cam_infos.append(cam_info)
elif idx % 8 >= 4:
test_cam_infos.append(cam_info)
else:
continue
elif data_type == 'kitti':
if idx < 10 or idx >= len(frames) - 4:
train_cam_infos.append(cam_info)
elif idx % 4 < 2:
train_cam_infos.append(cam_info)
elif idx % 4 == 2:
test_cam_infos.append(cam_info)
else:
continue
elif data_type == "nuscenes":
if idx < 600 or idx >= 1200:
continue
elif idx % 30 >= 24:
# print('test', cam_info.image_name)
test_cam_infos.append(cam_info)
else:
# print('train', cam_info.image_name)
train_cam_infos.append(cam_info)
elif data_type == "waymo":
if idx > 10 and idx % 10 >= 9:
test_cam_infos.append(cam_info)
else:
train_cam_infos.append(cam_info)
elif data_type == "pandaset":
# if idx >= 360:
# continue
if idx > 30 and idx % 30 >= 24:
test_cam_infos.append(cam_info)
else:
train_cam_infos.append(cam_info)
else:
raise NotImplementedError
return train_cam_infos, test_cam_infos, verts
def readStudioInfo(path, white_background, eval, data_type, ignore_dynamic):
train_cam_infos, test_cam_infos, verts = readStudioCameras(path, white_background, data_type, ignore_dynamic)
print(f'Loaded {len(train_cam_infos)} train cameras and {len(test_cam_infos)} test cameras')
nerf_normalization = getNerfppNorm(train_cam_infos)
ply_path = os.path.join(path, "points3d.ply")
# ply_path = os.path.join(path, 'lidar', 'cat.ply')
if not os.path.exists(ply_path):
# Since this data set has no colmap data, we start with random points
num_pts = 500_000
print(f"Generating random point cloud ({num_pts})...")
# We create random points inside the bounds of the synthetic Blender scenes
AABB = [[-20, -25, -20], [20, 5, 80]]
xyz = np.random.uniform(AABB[0], AABB[1], (num_pts, 3))
# xyz = np.load(os.path.join(path, 'lidar_point.npy'))
num_pts = xyz.shape[0]
shs = np.ones((num_pts, 3)) * 0.5
pcd = BasicPointCloud(points=xyz, colors=SH2RGB(shs), normals=np.zeros((num_pts, 3)))
storePly(ply_path, xyz, SH2RGB(shs) * 255)
try:
pcd = fetchPly(ply_path)
except Exception as e:
print('Error when loading point cloud:', e)
exit(0)
scene_info = SceneInfo(point_cloud=pcd,
train_cameras=train_cam_infos,
test_cameras=test_cam_infos,
nerf_normalization=nerf_normalization,
ply_path=ply_path,
verts=verts)
return scene_info
sceneLoadTypeCallbacks = {
"Studio": readStudioInfo,
}
================================================
FILE: scene/gaussian_model.py
================================================
#
# Copyright (C) 2023, Inria
# GRAPHDECO research group, https://team.inria.fr/graphdeco
# All rights reserved.
#
# This software is free for non-commercial, research and evaluation use
# under the terms of the LICENSE.md file.
#
# For inquiries contact george.drettakis@inria.fr
#
import torch
import numpy as np
from utils.general_utils import inverse_sigmoid, get_expon_lr_func, build_rotation
from torch import nn
import os
from utils.system_utils import mkdir_p
from plyfile import PlyData, PlyElement
from utils.sh_utils import RGB2SH, SH2RGB
from simple_knn._C import distCUDA2
from utils.graphics_utils import BasicPointCloud
from utils.general_utils import strip_symmetric, build_scaling_rotation
import open3d as o3d
import tinycudann as tcnn
from math import sqrt
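# Adam variant whose step() accepts an optional per-point learning-rate
# multiplier: for parameter groups whose name is in `name`, the update is
# scaled by `custom_lr` (one value per Gaussian) in addition to the group lr.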
class CustomAdam(torch.optim.Optimizer):
def __init__(self, params, lr=0.001, betas=(0.9, 0.999), eps=1e-8):
defaults = dict(lr=lr, betas=betas, eps=eps)
super(CustomAdam, self).__init__(params, defaults)
def step(self, custom_lr=None, name=None):
for group in self.param_groups:
for p in group['params']:
if p.grad is None:
continue
grad = p.grad.data
if grad.is_sparse:
raise RuntimeError('Adam does not support sparse gradients')
state = self.state[p]
# State initialization
if len(state) == 0:
state['step'] = 0
# Exponential moving averages of gradient values
state['exp_avg'] = torch.zeros_like(p.data)
# Exponential moving averages of squared gradient values
state['exp_avg_sq'] = torch.zeros_like(p.data)
exp_avg, exp_avg_sq = state['exp_avg'], state['exp_avg_sq']
beta1, beta2 = group['betas']
# Add op to update moving averages
state['step'] += 1
exp_avg.mul_(beta1).add_(grad, alpha=1.0 - beta1)
exp_avg_sq.mul_(beta2).addcmul_(grad, grad, value=1.0 - beta2)
denom = exp_avg_sq.sqrt().add_(group['eps'])
bias_correction1 = 1.0 - beta1 ** state['step']
bias_correction2 = 1.0 - beta2 ** state['step']
if (custom_lr is not None) and (name is not None) and (group['name'] in name):
step_size = custom_lr[:, None] * group['lr'] * (sqrt(bias_correction2) / bias_correction1)
else:
step_size = group['lr'] * (sqrt(bias_correction2) / bias_correction1)
p.data -= step_size * exp_avg / denom
class GaussianModel:
def setup_functions(self):
def build_covariance_from_scaling_rotation(scaling, scaling_modifier, rotation):
L = build_scaling_rotation(scaling_modifier * scaling, rotation)
actual_covariance = L @ L.transpose(1, 2)
symm = strip_symmetric(actual_covariance)
return symm
self.scaling_activation = torch.exp
self.scaling_inverse_activation = torch.log
self.covariance_activation = build_covariance_from_scaling_rotation
self.opacity_activation = torch.sigmoid
self.inverse_opacity_activation = inverse_sigmoid
self.rotation_activation = torch.nn.functional.normalize
def __init__(self, sh_degree : int, feat_mutable=True, affine=False):
self.active_sh_degree = 0
self.max_sh_degree = sh_degree
self._xyz = torch.empty(0)
self._features_dc = torch.empty(0)
self._features_rest = torch.empty(0)
self._feats3D = torch.empty(0)
self._scaling = torch.empty(0)
self._rotation = torch.empty(0)
self._opacity = torch.empty(0)
self.max_radii2D = torch.empty(0)
self.xyz_gradient_accum = torch.empty(0)
self.denom = torch.empty(0)
self.optimizer = None
self.percent_dense = 0
self.spatial_lr_scale = 0
self.feat_mutable = feat_mutable
self.setup_functions()
self.pos_enc = tcnn.Encoding(
n_input_dims=3,
encoding_config={"otype": "Frequency", "n_frequencies": 2},
)
self.dir_enc = tcnn.Encoding(
n_input_dims=3,
encoding_config={
"otype": "SphericalHarmonics",
"degree": 3,
},
)
self.affine = affine
if affine:
self.appearance_model = tcnn.Network(
n_input_dims=self.pos_enc.n_output_dims + self.dir_enc.n_output_dims,
n_output_dims=12,
network_config={
"otype": "FullyFusedMLP",
"activation": "ReLU",
"output_activation": "None",
"n_neurons": 32,
"n_hidden_layers": 2,
}
)
else:
self.appearance_model = None
def capture(self):
return (
self.active_sh_degree,
self._xyz,
self._features_dc,
self._features_rest,
self._feats3D,
self._scaling,
self._rotation,
self._opacity,
self.max_radii2D,
self.xyz_gradient_accum,
self.denom,
self.optimizer.state_dict(),
self.spatial_lr_scale,
self.appearance_model,
)
def restore(self, model_args, training_args):
(self.active_sh_degree,
self._xyz,
self._features_dc,
self._features_rest,
self._feats3D,
self._scaling,
self._rotation,
self._opacity,
self.max_radii2D,
xyz_gradient_accum,
denom,
opt_dict,
self.spatial_lr_scale,
self.appearance_model,) = model_args
self.xyz_gradient_accum = xyz_gradient_accum
self.denom = denom
if training_args is not None:
self.training_setup(training_args)
self.optimizer.load_state_dict(opt_dict)
@property
def get_scaling(self):
return self.scaling_activation(self._scaling)
@property
def get_rotation(self):
return self.rotation_activation(self._rotation)
# TODO add get_xyz for dynamic car
@property
def get_xyz(self):
return self._xyz
@property
def get_features(self):
features_dc = self._features_dc
features_rest = self._features_rest
return torch.cat((features_dc, features_rest), dim=1)
@property
def get_3D_features(self):
return torch.softmax(self._feats3D, dim=-1)
@property
def get_opacity(self):
return self.opacity_activation(self._opacity)
def get_covariance(self, scaling_modifier = 1):
return self.covariance_activation(self.get_scaling, scaling_modifier, self._rotation)
def oneupSHdegree(self):
if self.active_sh_degree < self.max_sh_degree:
self.active_sh_degree += 1
def create_from_pcd(self, pcd : BasicPointCloud, spatial_lr_scale : float):
# self.spatial_lr_scale = 1
self.spatial_lr_scale = spatial_lr_scale
fused_point_cloud = torch.tensor(np.asarray(pcd.points)).float().cuda()
fused_color = RGB2SH(torch.tensor(np.asarray(pcd.colors)).float().cuda())
features = torch.zeros((fused_color.shape[0], 3, (self.max_sh_degree + 1) ** 2)).float().cuda()
features[:, :3, 0 ] = fused_color
features[:, 3:, 1:] = 0.0
if self.feat_mutable:
feats3D = torch.zeros(fused_color.shape[0], 20).float().cuda()
self._feats3D = nn.Parameter(feats3D.requires_grad_(True))
else:
feats3D = torch.zeros(fused_color.shape[0], 20).float().cuda()
feats3D[:, 13] = 1
self._feats3D = nn.Parameter(feats3D.requires_grad_(True))
print("Number of points at initialisation : ", fused_point_cloud.shape[0])
dist2 = torch.clamp_min(distCUDA2(torch.from_numpy(np.asarray(pcd.points)).float().cuda()), 0.0000001)
scales = torch.log(torch.sqrt(dist2))[...,None].repeat(1, 3)
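        # distCUDA2 returns, per point, the mean of the squared distances to its
        # three nearest neighbours, so each Gaussian starts isotropic with a
        # scale on the order of the local point spacing (stored in log space).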
rots = torch.zeros((fused_point_cloud.shape[0], 4), device="cuda")
rots[:, 0] = 1
opacities = inverse_sigmoid(0.1 * torch.ones((fused_point_cloud.shape[0], 1), dtype=torch.float, device="cuda"))
self._xyz = nn.Parameter(fused_point_cloud.requires_grad_(True))
self._features_dc = nn.Parameter(features[:,:,0:1].transpose(1, 2).contiguous().requires_grad_(True))
self._features_rest = nn.Parameter(features[:,:,1:].transpose(1, 2).contiguous().requires_grad_(True))
self._scaling = nn.Parameter(scales.requires_grad_(True))
self._rotation = nn.Parameter(rots.requires_grad_(True))
self._opacity = nn.Parameter(opacities.requires_grad_(True))
self.max_radii2D = torch.zeros((self.get_xyz.shape[0]), device="cuda")
def training_setup(self, training_args):
self.percent_dense = training_args.percent_dense
self.xyz_gradient_accum = torch.zeros((self.get_xyz.shape[0], 1), device="cuda")
self.denom = torch.zeros((self.get_xyz.shape[0], 1), device="cuda")
# self.spatial_lr_scale /= 3
l = [
{'params': [self._xyz], 'lr': training_args.position_lr_init*self.spatial_lr_scale, "name": "xyz"},
{'params': [self._features_dc], 'lr': training_args.feature_lr, "name": "f_dc"},
{'params': [self._features_rest], 'lr': training_args.feature_lr / 20.0, "name": "f_rest"},
{'params': [self._opacity], 'lr': training_args.opacity_lr, "name": "opacity"},
{'params': [self._scaling], 'lr': training_args.scaling_lr*self.spatial_lr_scale, "name": "scaling"},
{'params': [self._rotation], 'lr': training_args.rotation_lr, "name": "rotation"},
]
if self.affine:
l.append({'params': [*self.appearance_model.parameters()], 'lr': 1e-3, "name": "appearance_model"})
if self.feat_mutable:
l.append({'params': [self._feats3D], 'lr': 1e-2, "name": "feats3D"})
self.optimizer = torch.optim.Adam(l, lr=0.0, eps=1e-15)
# self.optimizer = CustomAdam(l, lr=0.0, eps=1e-15)
self.xyz_scheduler_args = get_expon_lr_func(lr_init=training_args.position_lr_init*self.spatial_lr_scale,
lr_final=training_args.position_lr_final*self.spatial_lr_scale,
lr_delay_mult=training_args.position_lr_delay_mult,
max_steps=training_args.position_lr_max_steps)
def update_learning_rate(self, iteration):
''' Learning rate scheduling per step '''
for param_group in self.optimizer.param_groups:
if param_group["name"] == "xyz":
lr = self.xyz_scheduler_args(iteration)
param_group['lr'] = lr
return lr
def construct_list_of_attributes(self):
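        # Attribute layout of the saved PLY, e.g. for max_sh_degree=3 with the
        # 20-channel semantic head used here: x y z, nx ny nz, f_dc_0..2,
        # f_rest_0..44, semantic_0..19, opacity, scale_0..2, rot_0..3.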
l = ['x', 'y', 'z', 'nx', 'ny', 'nz']
# All channels except the 3 DC
for i in range(self._features_dc.shape[1]*self._features_dc.shape[2]):
l.append('f_dc_{}'.format(i))
for i in range(self._features_rest.shape[1]*self._features_rest.shape[2]):
l.append('f_rest_{}'.format(i))
for i in range(self._feats3D.shape[1]):
l.append('semantic_{}'.format(i))
l.append('opacity')
for i in range(self._scaling.shape[1]):
l.append('scale_{}'.format(i))
for i in range(self._rotation.shape[1]):
l.append('rot_{}'.format(i))
return l
def save_ply(self, path):
mkdir_p(os.path.dirname(path))
xyz = self._xyz.detach().cpu().numpy()
normals = np.zeros_like(xyz)
f_dc = self._features_dc.detach().transpose(1, 2).flatten(start_dim=1).contiguous().cpu().numpy()
f_rest = self._features_rest.detach().transpose(1, 2).flatten(start_dim=1).contiguous().cpu().numpy()
feats3D = self._feats3D.detach().cpu().numpy()
opacities = self._opacity.detach().cpu().numpy()
scale = self._scaling.detach().cpu().numpy()
rotation = self._rotation.detach().cpu().numpy()
dtype_full = [(attribute, 'f4') for attribute in self.construct_list_of_attributes()]
elements = np.empty(xyz.shape[0], dtype=dtype_full)
attributes = np.concatenate((xyz, normals, f_dc, f_rest, feats3D, opacities, scale, rotation), axis=1)
elements[:] = list(map(tuple, attributes))
el = PlyElement.describe(elements, 'vertex')
PlyData([el]).write(path)
def save_vis_ply(self, path):
mkdir_p(os.path.dirname(path))
xyz = self.get_xyz.detach().cpu().numpy()
pcd = o3d.geometry.PointCloud()
pcd.points = o3d.utility.Vector3dVector(xyz)
colors = SH2RGB(self._features_dc[:, 0, :].detach().cpu().numpy()).clip(0, 1)
pcd.colors = o3d.utility.Vector3dVector(colors)
o3d.io.write_point_cloud(path, pcd)
def reset_opacity(self):
opacities_new = inverse_sigmoid(torch.min(self.get_opacity, torch.ones_like(self.get_opacity)*0.01))
optimizable_tensors = self.replace_tensor_to_optimizer(opacities_new, "opacity")
self._opacity = optimizable_tensors["opacity"]
def load_ply(self, path):
plydata = PlyData.read(path)
xyz = np.stack((np.asarray(plydata.elements[0]["x"]),
np.asarray(plydata.elements[0]["y"]),
np.asarray(plydata.elements[0]["z"])), axis=1)
opacities = np.asarray(plydata.elements[0]["opacity"])[..., np.newaxis]
features_dc = np.zeros((xyz.shape[0], 3, 1))
features_dc[:, 0, 0] = np.asarray(plydata.elements[0]["f_dc_0"])
features_dc[:, 1, 0] = np.asarray(plydata.elements[0]["f_dc_1"])
features_dc[:, 2, 0] = np.asarray(plydata.elements[0]["f_dc_2"])
extra_f_names = [p.name for p in plydata.elements[0].properties if p.name.startswith("f_rest_")]
assert len(extra_f_names)==3*(self.max_sh_degree + 1) ** 2 - 3
features_extra = np.zeros((xyz.shape[0], len(extra_f_names)))
for idx, attr_name in enumerate(extra_f_names):
features_extra[:, idx] = np.asarray(plydata.elements[0][attr_name])
# Reshape (P,F*SH_coeffs) to (P, F, SH_coeffs except DC)
features_extra = features_extra.reshape((features_extra.shape[0], 3, (self.max_sh_degree + 1) ** 2 - 1))
scale_names = [p.name for p in plydata.elements[0].properties if p.name.startswith("scale_")]
scales = np.zeros((xyz.shape[0], len(scale_names)))
for idx, attr_name in enumerate(scale_names):
scales[:, idx] = np.asarray(plydata.elements[0][attr_name])
rot_names = [p.name for p in plydata.elements[0].properties if p.name.startswith("rot")]
rots = np.zeros((xyz.shape[0], len(rot_names)))
for idx, attr_name in enumerate(rot_names):
rots[:, idx] = np.asarray(plydata.elements[0][attr_name])
self._xyz = nn.Parameter(torch.tensor(xyz, dtype=torch.float, device="cuda").requires_grad_(True))
self._features_dc = nn.Parameter(torch.tensor(features_dc, dtype=torch.float, device="cuda").transpose(1, 2).contiguous().requires_grad_(True))
self._features_rest = nn.Parameter(torch.tensor(features_extra, dtype=torch.float, device="cuda").transpose(1, 2).contiguous().requires_grad_(True))
self._opacity = nn.Parameter(torch.tensor(opacities, dtype=torch.float, device="cuda").requires_grad_(True))
self._scaling = nn.Parameter(torch.tensor(scales, dtype=torch.float, device="cuda").requires_grad_(True))
self._rotation = nn.Parameter(torch.tensor(rots, dtype=torch.float, device="cuda").requires_grad_(True))
self.active_sh_degree = self.max_sh_degree
def replace_tensor_to_optimizer(self, tensor, name):
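        # Swap a parameter tensor inside the live Adam optimizer: the old
        # entry's state is re-keyed to the new nn.Parameter and its first and
        # second moments are zeroed, so the replaced values restart from fresh
        # statistics (used by reset_opacity above).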
optimizable_tensors = {}
for group in self.optimizer.param_groups:
if group["name"] == name:
stored_state = self.optimizer.state.get(group['params'][0], None)
stored_state["exp_avg"] = torch.zeros_like(tensor)
stored_state["exp_avg_sq"] = torch.zeros_like(tensor)
del self.optimizer.state[group['params'][0]]
group["params"][0] = nn.Parameter(tensor.requires_grad_(True))
self.optimizer.state[group['params'][0]] = stored_state
optimizable_tensors[group["name"]] = group["params"][0]
return optimizable_tensors
def _prune_optimizer(self, mask):
optimizable_tensors = {}
for group in self.optimizer.param_groups:
if group['name'] == 'appearance_model':
continue
stored_state = self.optimizer.state.get(group['params'][0], None)
if stored_state is not None:
stored_state["exp_avg"] = stored_state["exp_avg"][mask]
stored_state["exp_avg_sq"] = stored_state["exp_avg_sq"][mask]
del self.optimizer.state[group['params'][0]]
group["params"][0] = nn.Parameter((group["params"][0][mask].requires_grad_(True)))
self.optimizer.state[group['params'][0]] = stored_state
optimizable_tensors[group["name"]] = group["params"][0]
else:
group["params"][0] = nn.Parameter(group["params"][0][mask].requires_grad_(True))
optimizable_tensors[group["name"]] = group["params"][0]
return optimizable_tensors
def prune_points(self, mask):
valid_points_mask = ~mask
optimizable_tensors = self._prune_optimizer(valid_points_mask)
self._xyz = optimizable_tensors["xyz"]
self._features_dc = optimizable_tensors["f_dc"]
self._features_rest = optimizable_tensors["f_rest"]
if self.feat_mutable:
self._feats3D = optimizable_tensors["feats3D"]
else:
self._feats3D = self._feats3D[1, :].repeat((self._xyz.shape[0], 1))
self._opacity = optimizable_tensors["opacity"]
self._scaling = optimizable_tensors["scaling"]
self._rotation = optimizable_tensors["rotation"]
self.xyz_gradient_accum = self.xyz_gradient_accum[valid_points_mask]
self.denom = self.denom[valid_points_mask]
self.max_radii2D = self.max_radii2D[valid_points_mask]
def cat_tensors_to_optimizer(self, tensors_dict):
optimizable_tensors = {}
for group in self.optimizer.param_groups:
if group['name'] not in tensors_dict:
continue
assert len(group["params"]) == 1
extension_tensor = tensors_dict[group["name"]]
stored_state = self.optimizer.state.get(group["params"][0], None)
if stored_state is not None:
stored_state["exp_avg"] = torch.cat((stored_state["exp_avg"], torch.zeros_like(extension_tensor)), dim=0)
stored_state["exp_avg_sq"] = torch.cat((stored_state["exp_avg_sq"], torch.zeros_like(extension_tensor)), dim=0)
del self.optimizer.state[group["params"][0]]
group["params"][0] = nn.Parameter(torch.cat((group["params"][0], extension_tensor), dim=0).requires_grad_(True))
self.optimizer.state[group["params"][0]] = stored_state
optimizable_tensors[group["name"]] = group["params"][0]
else:
group["params"][0] = nn.Parameter(torch.cat((group["params"][0], extension_tensor), dim=0).requires_grad_(True))
optimizable_tensors[group["name"]] = group["params"][0]
return optimizable_tensors
def densification_postfix(self, new_xyz, new_features_dc, new_features_rest, new_feats3D, new_opacities, new_scaling, new_rotation):
d = {"xyz": new_xyz,
"f_dc": new_features_dc,
"f_rest": new_features_rest,
"feats3D": new_feats3D,
"opacity": new_opacities,
"scaling" : new_scaling,
"rotation" : new_rotation}
optimizable_tensors = self.cat_tensors_to_optimizer(d)
self._xyz = optimizable_tensors["xyz"]
self._features_dc = optimizable_tensors["f_dc"]
if self.feat_mutable:
self._feats3D = optimizable_tensors["feats3D"]
else:
self._feats3D = self._feats3D[1, :].repeat((self._xyz.shape[0], 1))
self._features_rest = optimizable_tensors["f_rest"]
self._opacity = optimizable_tensors["opacity"]
self._scaling = optimizable_tensors["scaling"]
self._rotation = optimizable_tensors["rotation"]
self.xyz_gradient_accum = torch.zeros((self.get_xyz.shape[0], 1), device="cuda")
self.denom = torch.zeros((self.get_xyz.shape[0], 1), device="cuda")
self.max_radii2D = torch.zeros((self.get_xyz.shape[0]), device="cuda")
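    # Densification below follows the 3DGS recipe: clone small Gaussians whose
    # accumulated view-space gradient exceeds the threshold (under-reconstructed
    # regions), split large ones into N samples drawn from the original Gaussian
    # with scales shrunk by 1/(0.8*N) (over-reconstructed regions), then prune
    # low-opacity and oversized points.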
def densify_and_split(self, grads, grad_threshold, scene_extent, N=2):
n_init_points = self.get_xyz.shape[0]
# Extract points that satisfy the gradient condition
padded_grad = torch.zeros((n_init_points), device="cuda")
padded_grad[:grads.shape[0]] = grads.squeeze()
selected_pts_mask = torch.where(padded_grad >= grad_threshold, True, False)
selected_pts_mask = torch.logical_and(selected_pts_mask,
torch.max(self.get_scaling, dim=1).values > self.percent_dense*scene_extent)
stds = self.get_scaling[selected_pts_mask].repeat(N,1)
        means = torch.zeros((stds.size(0), 3), device="cuda")
samples = torch.normal(mean=means, std=stds)
rots = build_rotation(self._rotation[selected_pts_mask]).repeat(N,1,1)
new_xyz = torch.bmm(rots, samples.unsqueeze(-1)).squeeze(-1) + self.get_xyz[selected_pts_mask].repeat(N, 1)
new_scaling = self.scaling_inverse_activation(self.get_scaling[selected_pts_mask].repeat(N,1) / (0.8*N))
new_rotation = self._rotation[selected_pts_mask].repeat(N,1)
new_features_dc = self._features_dc[selected_pts_mask].repeat(N,1,1)
new_features_rest = self._features_rest[selected_pts_mask].repeat(N,1,1)
new_feats3D = self._feats3D[selected_pts_mask].repeat(N,1)
new_opacity = self._opacity[selected_pts_mask].repeat(N,1)
self.densification_postfix(new_xyz, new_features_dc, new_features_rest, new_feats3D, new_opacity, new_scaling, new_rotation)
prune_filter = torch.cat((selected_pts_mask, torch.zeros(N * selected_pts_mask.sum(), device="cuda", dtype=bool)))
self.prune_points(prune_filter)
def densify_and_clone(self, grads, grad_threshold, scene_extent):
# Extract points that satisfy the gradient condition
selected_pts_mask = torch.where(torch.norm(grads, dim=-1) >= grad_threshold, True, False)
selected_pts_mask = torch.logical_and(selected_pts_mask,
torch.max(self.get_scaling, dim=1).values <= self.percent_dense*scene_extent)
new_xyz = self._xyz[selected_pts_mask]
new_features_dc = self._features_dc[selected_pts_mask]
new_features_rest = self._features_rest[selected_pts_mask]
new_feats3D = self._feats3D[selected_pts_mask]
new_opacities = self._opacity[selected_pts_mask]
new_scaling = self._scaling[selected_pts_mask]
new_rotation = self._rotation[selected_pts_mask]
self.densification_postfix(new_xyz, new_features_dc, new_features_rest, new_feats3D, new_opacities, new_scaling, new_rotation)
def densify_and_prune(self, max_grad, min_opacity, extent, max_screen_size):
grads = self.xyz_gradient_accum / self.denom
grads[grads.isnan()] = 0.0
self.densify_and_clone(grads, max_grad, extent)
self.densify_and_split(grads, max_grad, extent)
prune_mask = (self.get_opacity < min_opacity).squeeze()
if max_screen_size:
big_points_vs = self.max_radii2D > max_screen_size
big_points_ws = self.get_scaling.max(dim=1).values > 0.1 * extent * 10
prune_mask = torch.logical_or(torch.logical_or(prune_mask, big_points_vs), big_points_ws)
self.prune_points(prune_mask)
torch.cuda.empty_cache()
def add_densification_stats(self, viewspace_point_tensor, update_filter):
self.xyz_gradient_accum[update_filter] += torch.norm(viewspace_point_tensor.grad[update_filter,:2], dim=-1, keepdim=True)
self.denom[update_filter] += 1
def add_densification_stats_grad(self, tensor_grad, update_filter):
self.xyz_gradient_accum[update_filter] += torch.norm(tensor_grad[update_filter,:2], dim=-1, keepdim=True)
self.denom[update_filter] += 1
================================================
FILE: submodules/simple-knn/ext.cpp
================================================
/*
* Copyright (C) 2023, Inria
* GRAPHDECO research group, https://team.inria.fr/graphdeco
* All rights reserved.
*
* This software is free for non-commercial, research and evaluation use
* under the terms of the LICENSE.md file.
*
* For inquiries contact george.drettakis@inria.fr
*/
#include <torch/extension.h>
#include "spatial.h"
PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
m.def("distCUDA2", &distCUDA2);
}
================================================
FILE: submodules/simple-knn/setup.py
================================================
#
# Copyright (C) 2023, Inria
# GRAPHDECO research group, https://team.inria.fr/graphdeco
# All rights reserved.
#
# This software is free for non-commercial, research and evaluation use
# under the terms of the LICENSE.md file.
#
# For inquiries contact george.drettakis@inria.fr
#
from setuptools import setup
from torch.utils.cpp_extension import CUDAExtension, BuildExtension
import os
cxx_compiler_flags = []
if os.name == 'nt':
cxx_compiler_flags.append("/wd4624")
setup(
name="simple_knn",
ext_modules=[
CUDAExtension(
name="simple_knn._C",
sources=[
"spatial.cu",
"simple_knn.cu",
"ext.cpp"],
extra_compile_args={"nvcc": [], "cxx": cxx_compiler_flags})
],
cmdclass={
'build_ext': BuildExtension
}
)
================================================
FILE: submodules/simple-knn/simple_knn/.gitkeep
================================================
================================================
FILE: submodules/simple-knn/simple_knn.cu
================================================
/*
* Copyright (C) 2023, Inria
* GRAPHDECO research group, https://team.inria.fr/graphdeco
* All rights reserved.
*
* This software is free for non-commercial, research and evaluation use
* under the terms of the LICENSE.md file.
*
* For inquiries contact george.drettakis@inria.fr
*/
#define BOX_SIZE 1024
#include "cuda_runtime.h"
#include "device_launch_parameters.h"
#include "simple_knn.h"
#include <cub/cub.cuh>
#include <cub/device/device_radix_sort.cuh>
#include <vector>
#include <cuda_runtime_api.h>
#include <thrust/device_vector.h>
#include <thrust/sequence.h>
#define __CUDACC__
#include <cooperative_groups.h>
#include <cooperative_groups/reduce.h>
namespace cg = cooperative_groups;
struct CustomMin
{
__device__ __forceinline__
float3 operator()(const float3& a, const float3& b) const {
return { min(a.x, b.x), min(a.y, b.y), min(a.z, b.z) };
}
};
struct CustomMax
{
__device__ __forceinline__
float3 operator()(const float3& a, const float3& b) const {
return { max(a.x, b.x), max(a.y, b.y), max(a.z, b.z) };
}
};
__host__ __device__ uint32_t prepMorton(uint32_t x)
{
x = (x | (x << 16)) & 0x030000FF;
x = (x | (x << 8)) & 0x0300F00F;
x = (x | (x << 4)) & 0x030C30C3;
x = (x | (x << 2)) & 0x09249249;
return x;
}
__host__ __device__ uint32_t coord2Morton(float3 coord, float3 minn, float3 maxx)
{
uint32_t x = prepMorton(((coord.x - minn.x) / (maxx.x - minn.x)) * ((1 << 10) - 1));
uint32_t y = prepMorton(((coord.y - minn.y) / (maxx.y - minn.y)) * ((1 << 10) - 1));
uint32_t z = prepMorton(((coord.z - minn.z) / (maxx.z - minn.z)) * ((1 << 10) - 1));
return x | (y << 1) | (z << 2);
}
__global__ void coord2Morton(int P, const float3* points, float3 minn, float3 maxx, uint32_t* codes)
{
auto idx = cg::this_grid().thread_rank();
if (idx >= P)
return;
codes[idx] = coord2Morton(points[idx], minn, maxx);
}
struct MinMax
{
float3 minn;
float3 maxx;
};
__global__ void boxMinMax(uint32_t P, float3* points, uint32_t* indices, MinMax* boxes)
{
auto idx = cg::this_grid().thread_rank();
MinMax me;
if (idx < P)
{
me.minn = points[indices[idx]];
me.maxx = points[indices[idx]];
}
else
{
me.minn = { FLT_MAX, FLT_MAX, FLT_MAX };
me.maxx = { -FLT_MAX,-FLT_MAX,-FLT_MAX };
}
__shared__ MinMax redResult[BOX_SIZE];
for (int off = BOX_SIZE / 2; off >= 1; off /= 2)
{
if (threadIdx.x < 2 * off)
redResult[threadIdx.x] = me;
__syncthreads();
if (threadIdx.x < off)
{
MinMax other = redResult[threadIdx.x + off];
me.minn.x = min(me.minn.x, other.minn.x);
me.minn.y = min(me.minn.y, other.minn.y);
me.minn.z = min(me.minn.z, other.minn.z);
me.maxx.x = max(me.maxx.x, other.maxx.x);
me.maxx.y = max(me.maxx.y, other.maxx.y);
me.maxx.z = max(me.maxx.z, other.maxx.z);
}
__syncthreads();
}
if (threadIdx.x == 0)
boxes[blockIdx.x] = me;
}
__device__ __host__ float distBoxPoint(const MinMax& box, const float3& p)
{
float3 diff = { 0, 0, 0 };
if (p.x < box.minn.x || p.x > box.maxx.x)
diff.x = min(abs(p.x - box.minn.x), abs(p.x - box.maxx.x));
if (p.y < box.minn.y || p.y > box.maxx.y)
diff.y = min(abs(p.y - box.minn.y), abs(p.y - box.maxx.y));
if (p.z < box.minn.z || p.z > box.maxx.z)
diff.z = min(abs(p.z - box.minn.z), abs(p.z - box.maxx.z));
return diff.x * diff.x + diff.y * diff.y + diff.z * diff.z;
}
template<int K>
__device__ void updateKBest(const float3& ref, const float3& point, float* knn)
{
float3 d = { point.x - ref.x, point.y - ref.y, point.z - ref.z };
float dist = d.x * d.x + d.y * d.y + d.z * d.z;
for (int j = 0; j < K; j++)
{
if (knn[j] > dist)
{
float t = knn[j];
knn[j] = dist;
dist = t;
}
}
}
__global__ void boxMeanDist(uint32_t P, float3* points, uint32_t* indices, MinMax* boxes, float* dists)
{
int idx = cg::this_grid().thread_rank();
if (idx >= P)
return;
float3 point = points[indices[idx]];
float best[3] = { FLT_MAX, FLT_MAX, FLT_MAX };
for (int i = max(0, idx - 3); i <= min(P - 1, idx + 3); i++)
{
if (i == idx)
continue;
updateKBest<3>(point, points[indices[i]], best);
}
float reject = best[2];
best[0] = FLT_MAX;
best[1] = FLT_MAX;
best[2] = FLT_MAX;
for (int b = 0; b < (P + BOX_SIZE - 1) / BOX_SIZE; b++)
{
MinMax box = boxes[b];
float dist = distBoxPoint(box, point);
if (dist > reject || dist > best[2])
continue;
for (int i = b * BOX_SIZE; i < min(P, (b + 1) * BOX_SIZE); i++)
{
if (i == idx)
continue;
updateKBest<3>(point, points[indices[i]], best);
}
}
dists[indices[idx]] = (best[0] + best[1] + best[2]) / 3.0f;
}
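// Pipeline of SimpleKNN::knn below: (1) CUB reductions find the global AABB of
// the point set; (2) each point is encoded into a 30-bit Morton code (10 bits
// per axis) and radix-sorted along the space-filling curve; (3) boxMinMax
// computes an AABB per BOX_SIZE-point chunk; (4) boxMeanDist takes a first
// 3-NN guess from the +/-3 window along the curve, visits only chunks whose
// box distance beats that guess, and writes the mean of the three smallest
// squared distances for each point.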
void SimpleKNN::knn(int P, float3* points, float* meanDists)
{
float3* result;
cudaMalloc(&result, sizeof(float3));
	size_t temp_storage_bytes = 0;
float3 init = { 0, 0, 0 }, minn, maxx;
cub::DeviceReduce::Reduce(nullptr, temp_storage_bytes, points, result, P, CustomMin(), init);
thrust::device_vector<char> temp_storage(temp_storage_bytes);
cub::DeviceReduce::Reduce(temp_storage.data().get(), temp_storage_bytes, points, result, P, CustomMin(), init);
cudaMemcpy(&minn, result, sizeof(float3), cudaMemcpyDeviceToHost);
cub::DeviceReduce::Reduce(temp_storage.data().get(), temp_storage_bytes, points, result, P, CustomMax(), init);
cudaMemcpy(&maxx, result, sizeof(float3), cudaMemcpyDeviceToHost);
thrust::device_vector<uint32_t> morton(P);
thrust::device_vector<uint32_t> morton_sorted(P);
	coord2Morton<<<(P + 255) / 256, 256>>>(P, points, minn, maxx, morton.data().get());
thrust::device_vector<uint32_t> indices(P);
thrust::sequence(indices.begin(), indices.end());
thrust::device_vector<uint32_t> indices_sorted(P);
cub::DeviceRadixSort::SortPairs(nullptr, temp_storage_bytes, morton.data().get(), morton_sorted.data().get(), indices.data().get(), indices_sorted.data().get(), P);
temp_storage.resize(temp_storage_bytes);
cub::DeviceRadixSort::SortPairs(temp_storage.data().get(), temp_storage_bytes, morton.data().get(), morton_sorted.data().get(), indices.data().get(), indices_sorted.data().get(), P);
uint32_t num_boxes = (P + BOX_SIZE - 1) / BOX_SIZE;
thrust::device_vector<MinMax> boxes(num_boxes);
	boxMinMax<<<num_boxes, BOX_SIZE>>>(P, points, indices_sorted.data().get(), boxes.data().get());
	boxMeanDist<<<num_boxes, BOX_SIZE>>>(P, points, indices_sorted.data().get(), boxes.data().get(), meanDists);
cudaFree(result);
}
================================================
FILE: submodules/simple-knn/simple_knn.h
================================================
/*
* Copyright (C) 2023, Inria
* GRAPHDECO research group, https://team.inria.fr/graphdeco
* All rights reserved.
*
* This software is free for non-commercial, research and evaluation use
* under the terms of the LICENSE.md file.
*
* For inquiries contact george.drettakis@inria.fr
*/
#ifndef SIMPLEKNN_H_INCLUDED
#define SIMPLEKNN_H_INCLUDED
class SimpleKNN
{
public:
static void knn(int P, float3* points, float* meanDists);
};
#endif
================================================
FILE: submodules/simple-knn/spatial.cu
================================================
/*
* Copyright (C) 2023, Inria
* GRAPHDECO research group, https://team.inria.fr/graphdeco
* All rights reserved.
*
* This software is free for non-commercial, research and evaluation use
* under the terms of the LICENSE.md file.
*
* For inquiries contact george.drettakis@inria.fr
*/
#include "spatial.h"
#include "simple_knn.h"
torch::Tensor
distCUDA2(const torch::Tensor& points)
{
const int P = points.size(0);
auto float_opts = points.options().dtype(torch::kFloat32);
torch::Tensor means = torch::full({P}, 0.0, float_opts);
  SimpleKNN::knn(P, (float3*)points.contiguous().data_ptr<float>(), means.contiguous().data_ptr<float>());
return means;
}
================================================
FILE: submodules/simple-knn/spatial.h
================================================
/*
* Copyright (C) 2023, Inria
* GRAPHDECO research group, https://team.inria.fr/graphdeco
* All rights reserved.
*
* This software is free for non-commercial, research and evaluation use
* under the terms of the LICENSE.md file.
*
* For inquiries contact george.drettakis@inria.fr
*/
#include <torch/extension.h>
torch::Tensor distCUDA2(const torch::Tensor& points);
================================================
FILE: utils/camera_utils.py
================================================
#
# Copyright (C) 2023, Inria
# GRAPHDECO research group, https://team.inria.fr/graphdeco
# All rights reserved.
#
# This software is free for non-commercial, research and evaluation use
# under the terms of the LICENSE.md file.
#
# For inquiries contact george.drettakis@inria.fr
#
from scene.cameras import Camera
import numpy as np
from utils.general_utils import PILtoTorch, PIL2toTorch
from utils.graphics_utils import fov2focal
import torch
WARNED = False
def loadCam(args, id, cam_info, resolution_scale):
orig_w, orig_h = cam_info.image.size
if args.resolution in [1, 2, 4, 8]:
resolution = round(orig_w/(resolution_scale * args.resolution)), round(orig_h/(resolution_scale * args.resolution))
else: # should be a type that converts to float
if args.resolution == -1:
if orig_w > 1600:
global WARNED
if not WARNED:
print("[ INFO ] Encountered quite large input images (>1.6K pixels width), rescaling to 1.6K.\n "
"If this is not desired, please explicitly specify '--resolution/-r' as 1")
WARNED = True
global_down = orig_w / 1600
else:
global_down = 1
else:
global_down = orig_w / args.resolution
scale = float(global_down) * float(resolution_scale)
resolution = (int(orig_w / scale), int(orig_h / scale))
resized_image_rgb = PILtoTorch(cam_info.image, resolution)
if cam_info.semantic2d is not None:
semantic2d = torch.from_numpy(cam_info.semantic2d).long()[None, ...]
else:
semantic2d = None
optical_image = cam_info.optical_image
mask = cam_info.mask
gt_image = resized_image_rgb[:3, ...]
return Camera(colmap_id=cam_info.uid, R=cam_info.R, T=cam_info.T, K=cam_info.K,
FoVx=cam_info.FovX, FoVy=cam_info.FovY,
image=gt_image, image_name=cam_info.image_name, uid=id, data_device=args.data_device,
cx_ratio=cam_info.cx_ratio, cy_ratio=cam_info.cy_ratio, semantic2d=semantic2d, mask=mask,
timestamp=cam_info.timestamp, optical_image=optical_image, dynamics=cam_info.dynamics)
def cameraList_from_camInfos(cam_infos, resolution_scale, args):
camera_list = []
for id, c in enumerate(cam_infos):
camera_list.append(loadCam(args, id, c, resolution_scale))
return camera_list
def camera_to_JSON(id, camera : Camera):
Rt = np.zeros((4, 4))
Rt[:3, :3] = camera.R.transpose()
Rt[:3, 3] = camera.T
Rt[3, 3] = 1.0
    C2W = np.linalg.inv(Rt)  # Rt is world-to-camera, so its inverse is camera-to-world
    pos = C2W[:3, 3]
    rot = C2W[:3, :3]
serializable_array_2d = [x.tolist() for x in rot]
camera_entry = {
'id' : id,
'img_name' : camera.image_name,
'width' : camera.width,
'height' : camera.height,
'position': pos.tolist(),
'rotation': serializable_array_2d,
'fy' : fov2focal(camera.FovY, camera.height),
'fx' : fov2focal(camera.FovX, camera.width),
}
return camera_entry
================================================
FILE: utils/cmap.py
================================================
import numpy as np
_color_map_errors = np.array([
[149, 54, 49], #0: log2(x) = -infinity
[180, 117, 69], #0.0625: log2(x) = -4
[209, 173, 116], #0.125: log2(x) = -3
[233, 217, 171], #0.25: log2(x) = -2
[248, 243, 224], #0.5: log2(x) = -1
[144, 224, 254], #1.0: log2(x) = 0
[97, 174, 253], #2.0: log2(x) = 1
[67, 109, 244], #4.0: log2(x) = 2
[39, 48, 215], #8.0: log2(x) = 3
[38, 0, 165], #16.0: log2(x) = 4
[38, 0, 165] #inf: log2(x) = inf
]).astype(float)
def color_error_image(errors, scale=1, mask=None, BGR=True):
"""
Color an input error map.
Arguments:
errors -- HxW numpy array of errors
[scale=1] -- scaling the error map (color change at unit error)
[mask=None] -- zero-pixels are masked white in the result
[BGR=True] -- toggle between BGR and RGB
Returns:
colored_errors -- HxWx3 numpy array visualizing the errors
"""
errors_flat = errors.flatten()
errors_color_indices = np.clip(np.log2(errors_flat / scale + 1e-5) + 5, 0, 9)
i0 = np.floor(errors_color_indices).astype(int)
f1 = errors_color_indices - i0.astype(float)
colored_errors_flat = _color_map_errors[i0, :] * (1-f1).reshape(-1,1) + _color_map_errors[i0+1, :] * f1.reshape(-1,1)
if mask is not None:
colored_errors_flat[mask.flatten() == 0] = 255
if not BGR:
colored_errors_flat = colored_errors_flat[:,[2,1,0]]
    return colored_errors_flat.reshape(errors.shape[0], errors.shape[1], 3).astype(int)
_color_map_depths = np.array([
[0, 0, 0], # 0.000
[0, 0, 255], # 0.114
[255, 0, 0], # 0.299
[255, 0, 255], # 0.413
[0, 255, 0], # 0.587
[0, 255, 255], # 0.701
[255, 255, 0], # 0.886
[255, 255, 255], # 1.000
[255, 255, 255], # 1.000
]).astype(float)
_color_map_bincenters = np.array([
0.0,
0.114,
0.299,
0.413,
0.587,
0.701,
0.886,
1.000,
2.000, # doesn't make a difference, just strictly higher than 1
])
def color_depth_map(depths, scale=None):
"""
Color an input depth map.
Arguments:
depths -- HxW numpy array of depths
    [scale=None] -- scaling the values (defaults to 50 if not given)
Returns:
colored_depths -- HxWx3 numpy array visualizing the depths
"""
    if scale is None:
        scale = 50  # fixed default; depths.max() / 1.5 is a possible auto-scale alternative
values = np.clip(depths.flatten() / scale, 0, 1)
# for each value, figure out where they fit in in the bincenters: what is the last bincenter smaller than this value?
lower_bin = ((values.reshape(-1, 1) >= _color_map_bincenters.reshape(1,-1)) * np.arange(0,9)).max(axis=1)
lower_bin_value = _color_map_bincenters[lower_bin]
higher_bin_value = _color_map_bincenters[lower_bin + 1]
alphas = (values - lower_bin_value) / (higher_bin_value - lower_bin_value)
colors = _color_map_depths[lower_bin] * (1-alphas).reshape(-1,1) + _color_map_depths[lower_bin + 1] * alphas.reshape(-1,1)
return colors.reshape(depths.shape[0], depths.shape[1], 3).astype(np.uint8)
================================================
FILE: utils/dynamic_utils.py
================================================
import numpy as np
import torch
from torch import optim
from torch import nn
from tqdm import tqdm
from matplotlib import pyplot as plt
import torch.nn.functional as F
from collections import defaultdict
import os
def rot2Euler(R):
sy = torch.sqrt(R[0,0] * R[0,0] + R[1,0] * R[1,0])
singular = sy < 1e-6
if not singular:
x = torch.atan2(R[2,1] , R[2,2])
y = torch.atan2(-R[2,0], sy)
z = torch.atan2(R[1,0], R[0,0])
else:
x = torch.atan2(-R[1,2], R[1,1])
y = torch.atan2(-R[2,0], sy)
        z = torch.zeros_like(x)
return torch.stack([x,y,z])
class unicycle(torch.nn.Module):
def __init__(self, train_timestamp, centers=None, heights=None, phis=None):
super(unicycle, self).__init__()
self.train_timestamp = train_timestamp
self.delta = torch.diff(self.train_timestamp)
        # centers/phis were previously dereferenced before the None check;
        # fall back to zeros when they are absent.
        if centers is None:
            self.input_a = torch.zeros_like(train_timestamp).float()
            self.input_b = torch.zeros_like(train_timestamp).float()
            self.a = nn.Parameter(torch.zeros_like(train_timestamp).float())
            self.b = nn.Parameter(torch.zeros_like(train_timestamp).float())
            self.v = nn.Parameter(torch.zeros_like(train_timestamp).float())
        else:
            self.input_a = centers[:, 0].clone()
            self.input_b = centers[:, 1].clone()
            self.a = nn.Parameter(centers[:, 0])
            self.b = nn.Parameter(centers[:, 1])
            diff_a = torch.diff(centers[:, 0]) / self.delta
            diff_b = torch.diff(centers[:, 1]) / self.delta
            v = torch.sqrt(diff_a ** 2 + diff_b ** 2)
            self.v = nn.Parameter(F.pad(v, (0, 1), 'constant', v[-1].item()))
        if phis is None:
            self.phi = nn.Parameter(torch.zeros_like(train_timestamp).float())
        else:
            self.phi = nn.Parameter(phis)
if heights is None:
self.h = nn.Parameter(torch.zeros_like(train_timestamp).float())
else:
self.h = nn.Parameter(heights)
def acc_omega(self):
acc = torch.diff(self.v) / self.delta
omega = torch.diff(self.phi) / self.delta
acc = F.pad(acc, (0, 1), 'constant', acc[-1].item())
omega = F.pad(omega, (0, 1), 'constant', omega[-1].item())
return acc, omega
def forward(self, timestamps):
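        # Constant-acceleration / constant-turn-rate rollout: find the last
        # training keyframe at or before each query timestamp, advance v and
        # phi linearly with the per-interval acc/omega, and integrate the
        # planar position along the resulting circular arc:
        #   a' = a + v * (sin(phi') - sin(phi)) / omega
        #   b' = b - v * (cos(phi') - cos(phi)) / omega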
idx = torch.searchsorted(self.train_timestamp, timestamps, side='left')
invalid = (idx == self.train_timestamp.shape[0])
idx[invalid] -= 1
idx[self.train_timestamp[idx] != timestamps] -= 1
idx[invalid] += 1
prev_timestamps = self.train_timestamp[idx]
delta_t = timestamps - prev_timestamps
prev_a, prev_b = self.a[idx], self.b[idx]
prev_v, prev_phi = self.v[idx], self.phi[idx]
acc, omega = self.acc_omega()
v = prev_v + acc[idx] * delta_t
phi = prev_phi + omega[idx] * delta_t
a = prev_a + prev_v * ((torch.sin(phi) - torch.sin(prev_phi)) / (omega[idx] + 1e-6))
b = prev_b - prev_v * ((torch.cos(phi) - torch.cos(prev_phi)) / (omega[idx] + 1e-6))
h = self.h[idx]
return a, b, v, phi, h
def capture(self):
return (
self.a,
self.b,
self.v,
self.phi,
self.h,
self.train_timestamp,
self.delta
)
def restore(self, model_args):
(
self.a,
self.b,
self.v,
self.phi,
self.h,
self.train_timestamp,
self.delta
) = model_args
def visualize(self, save_path, noise_centers=None, gt_centers=None):
a, b, _, phi, _ = self.forward(self.train_timestamp)
a = a.detach().cpu().numpy()
b = b.detach().cpu().numpy()
phi = phi.detach().cpu().numpy()
plt.scatter(a, b, marker='x', color='b')
plt.quiver(a, b, np.ones_like(a) * np.cos(phi), np.ones_like(b) * np.sin(phi), scale=20, width=0.005)
if noise_centers is not None:
noise_centers = noise_centers.detach().cpu().numpy()
plt.scatter(noise_centers[:, 0], noise_centers[:, 1], marker='o', color='gray')
if gt_centers is not None:
gt_centers = gt_centers.detach().cpu().numpy()
plt.scatter(gt_centers[:, 0], gt_centers[:, 1], marker='v', color='g')
plt.axis('equal')
plt.savefig(save_path)
plt.close()
def reg_loss(self):
reg = 0
acc, omega = self.acc_omega()
reg += torch.mean(torch.abs(torch.diff(acc))) * 1
reg += torch.mean(torch.abs(torch.diff(omega))) * 1
reg_a_motion = self.v[:-1] * ((torch.sin(self.phi[1:]) - torch.sin(self.phi[:-1])) / (omega[:-1] + 1e-6))
reg_b_motion = -self.v[:-1] * ((torch.cos(self.phi[1:]) - torch.cos(self.phi[:-1])) / (omega[:-1] + 1e-6))
reg_a = self.a[:-1] + reg_a_motion
reg_b = self.b[:-1] + reg_b_motion
reg += torch.mean((reg_a - self.a[1:])**2 + (reg_b - self.b[1:])**2) * 1
return reg
def pos_loss(self):
# a, b, _, _, _ = self.forward(self.train_timestamp)
return torch.mean((self.a - self.input_a) ** 2 + (self.b - self.input_b) ** 2) * 10
def create_unicycle_model(train_cams, model_path, opt_iter=0, data_type='kitti'):
unicycle_models = {}
if data_type == 'kitti':
cameras = [cam for cam in train_cams if 'cam_0' in cam.image_name]
elif data_type == 'waymo':
cameras = [cam for cam in train_cams if 'cam_1' in cam.image_name]
else:
raise NotImplementedError
all_centers, all_heights, all_phis, all_timestamps = defaultdict(list), defaultdict(list), defaultdict(list), defaultdict(list)
seq_timestamps = []
for cam in cameras:
t = cam.timestamp
seq_timestamps.append(t)
for track_id, b2w in cam.dynamics.items():
all_centers[track_id].append(b2w[[0, 2], 3])
all_heights[track_id].append(b2w[1, 3])
eulers = rot2Euler(b2w[:3, :3])
all_phis[track_id].append(eulers[1])
all_timestamps[track_id].append(t)
for track_id in all_centers.keys():
centers = torch.stack(all_centers[track_id], dim=0).cuda()
timestamps = torch.tensor(all_timestamps[track_id]).cuda()
heights = torch.tensor(all_heights[track_id]).cuda()
phis = torch.tensor(all_phis[track_id]).cuda() + torch.pi
model = unicycle(timestamps, centers.clone(), heights.clone(), phis.clone())
l = [
{'params': [model.a], 'lr': 1e-2, "name": "a"},
{'params': [model.b], 'lr': 1e-2, "name": "b"},
{'params': [model.v], 'lr': 1e-3, "name": "v"},
{'params': [model.phi], 'lr': 1e-4, "name": "phi"},
{'params': [model.h], 'lr': 0, "name": "h"}
]
optimizer = optim.Adam(l, lr=0.0)
t_range = tqdm(range(opt_iter), desc=f"Fitting {track_id}")
for iter in t_range:
loss = 0.2 * model.pos_loss() + model.reg_loss()
t_range.set_postfix({'loss': loss.item()})
optimizer.zero_grad()
loss.backward()
optimizer.step()
unicycle_models[track_id] = {'model': model,
'optimizer': optimizer,
'input_centers': centers}
os.makedirs(os.path.join(model_path, "unicycle"), exist_ok=True)
for track_id, unicycle_pkg in unicycle_models.items():
model = unicycle_pkg['model']
optimizer = unicycle_pkg['optimizer']
        # Optionally pass noise_centers=unicycle_pkg['input_centers'] (and/or
        # gt_centers) to overlay the raw inputs on the fitted trajectory.
        model.visualize(os.path.join(model_path, "unicycle", f"{track_id}_init.png"))
return unicycle_models
================================================
FILE: utils/general_utils.py
================================================
#
# Copyright (C) 2023, Inria
# GRAPHDECO research group, https://team.inria.fr/graphdeco
# All rights reserved.
#
# This software is free for non-commercial, research and evaluation use
# under the terms of the LICENSE.md file.
#
# For inquiries contact george.drettakis@inria.fr
#
import torch
import sys
from datetime import datetime
import numpy as np
import random
import os
import cv2
def inverse_sigmoid(x):
return torch.log(x/(1-x))
def PILtoTorch(pil_image, resolution):
resized_image_PIL = pil_image.resize(resolution)
resized_image = torch.from_numpy(np.array(resized_image_PIL)) / 255.0
if len(resized_image.shape) == 3:
return resized_image.permute(2, 0, 1)
else:
return resized_image.unsqueeze(dim=-1).permute(2, 0, 1)
def PIL2toTorch(pil_image, resolution):
resized_image_PIL = pil_image.resize(resolution)
resized_image = torch.from_numpy(np.array(resized_image_PIL)) / 255.0 * (2.0 ** 16 - 1.0)
return resized_image
def decode_op(optical_png):
    """Convert a 16-bit optical-flow .png (h, w, 3) into an (h, w, 2) float32 (flow_x, flow_y) array; read the file with PIL's Image.open."""
    optical_png = optical_png[..., [2, 1, 0]]  # BGR -> RGB
    h, w, _c = optical_png.shape
    assert optical_png.dtype == np.uint16 and _c == 3
    # Invalid-flow flag: B == 0 marks sky or otherwise invalid flow.
    invalid_points = np.where(optical_png[..., 2] == 0)
out_flow = torch.empty((h, w, 2))
decoded = 2.0 / (2**16 - 1.0) * optical_png.astype('f4') - 1
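    # Each 16-bit channel value u decodes to 2*u/(2^16 - 1) - 1 in [-1, 1];
    # multiplying by (w - 1) or (h - 1) turns that into a pixel displacement.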
out_flow[..., 0] = torch.tensor(decoded[:, :, 0] * (w - 1)) # (pixel) delta_x : R
out_flow[..., 1] = torch.tensor(decoded[:, :, 1] * (h - 1)) # delta_y : G
out_flow[invalid_points[0], invalid_points[1], :] = 0 # B=0 for invalid flow
return out_flow
def get_expon_lr_func(
lr_init, lr_final, lr_delay_steps=0, lr_delay_mult=1.0, max_steps=1000000
):
"""
Copied from Plenoxels
Continuous learning rate decay function. Adapted from JaxNeRF
The returned rate is lr_init when step=0 and lr_final when step=max_steps, and
is log-linearly interpolated elsewhere (equivalent to exponential decay).
If lr_delay_steps>0 then the learning rate will be scaled by some smooth
function of lr_delay_mult, such that the initial learning rate is
lr_init*lr_delay_mult at the beginning of optimization but will be eased back
to the normal learning rate when steps>lr_delay_steps.
:param max_steps: int, the number of steps during optimization.
:return HoF which takes step as input
"""
def helper(step):
if step < 0 or (lr_init == 0.0 and lr_final == 0.0):
# Disable this parameter
return 0.0
if lr_delay_steps > 0:
# A kind of reverse cosine decay.
delay_rate = lr_delay_mult + (1 - lr_delay_mult) * np.sin(
0.5 * np.pi * np.clip(step / lr_delay_steps, 0, 1)
)
else:
delay_rate = 1.0
t = np.clip(step / max_steps, 0, 1)
log_lerp = np.exp(np.log(lr_init) * (1 - t) + np.log(lr_final) * t)
return delay_rate * log_lerp
return helper
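# Worked example (illustrative numbers, not the project's defaults): with
# lr_init=1e-2, lr_final=1e-4 and max_steps=100, the helper is log-linear in
# the step, so helper(0)=1e-2, helper(50)=1e-3 and helper(100)=1e-4.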
def strip_lowerdiag(L):
uncertainty = torch.zeros((L.shape[0], 6), dtype=torch.float, device="cuda")
uncertainty[:, 0] = L[:, 0, 0]
uncertainty[:, 1] = L[:, 0, 1]
uncertainty[:, 2] = L[:, 0, 2]
uncertainty[:, 3] = L[:, 1, 1]
uncertainty[:, 4] = L[:, 1, 2]
uncertainty[:, 5] = L[:, 2, 2]
return uncertainty
def strip_symmetric(sym):
return strip_lowerdiag(sym)
def build_rotation(r):
norm = torch.sqrt(r[:,0]*r[:,0] + r[:,1]*r[:,1] + r[:,2]*r[:,2] + r[:,3]*r[:,3])
q = r / norm[:, None]
R = torch.zeros((q.size(0), 3, 3), device='cuda')
r = q[:, 0]
x = q[:, 1]
y = q[:, 2]
z = q[:, 3]
R[:, 0, 0] = 1 - 2 * (y*y + z*z)
R[:, 0, 1] = 2 * (x*y - r*z)
R[:, 0, 2] = 2 * (x*z + r*y)
R[:, 1, 0] = 2 * (x*y + r*z)
R[:, 1, 1] = 1 - 2 * (x*x + z*z)
R[:, 1, 2] = 2 * (y*z - r*x)
R[:, 2, 0] = 2 * (x*z - r*y)
R[:, 2, 1] = 2 * (y*z + r*x)
R[:, 2, 2] = 1 - 2 * (x*x + y*y)
return R
def build_scaling_rotation(s, r):
L = torch.zeros((s.shape[0], 3, 3), dtype=torch.float, device="cuda")
R = build_rotation(r)
L[:,0,0] = s[:,0]
L[:,1,1] = s[:,1]
L[:,2,2] = s[:,2]
L = R @ L
return L
DEFAULT_RANDOM_SEED = 0
def seedBasic(seed=DEFAULT_RANDOM_SEED):
random.seed(seed)
os.environ['PYTHONHASHSEED'] = str(seed)
np.random.seed(seed)
def seedTorch(seed=DEFAULT_RANDOM_SEED):
torch.manual_seed(seed)
torch.cuda.manual_seed(seed)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
# basic + torch
def seedEverything(seed=DEFAULT_RANDOM_SEED):
seedBasic(seed)
seedTorch(seed)
def safe_state(silent):
old_f = sys.stdout
class F:
def __init__(self, silent):
self.silent = silent
def write(self, x):
if not self.silent:
if x.endswith("\n"):
old_f.write(x.replace("\n", " [{}]\n".format(str(datetime.now().strftime("%d/%m %H:%M:%S")))))
else:
old_f.write(x)
def flush(self):
old_f.flush()
sys.stdout = F(silent)
random.seed(DEFAULT_RANDOM_SEED)
np.random.seed(DEFAULT_RANDOM_SEED)
torch.manual_seed(DEFAULT_RANDOM_SEED)
torch.cuda.set_device(torch.device("cuda:0"))
# sys.stdout = old_f
================================================
FILE: utils/graphics_utils.py
================================================
#
# Copyright (C) 2023, Inria
# GRAPHDECO research group, https://team.inria.fr/graphdeco
# All rights reserved.
#
# This software is free for non-commercial, research and evaluation use
# under the terms of the LICENSE.md file.
#
# For inquiries contact george.drettakis@inria.fr
#
import torch
import math
import numpy as np
from typing import NamedTuple
class BasicPointCloud(NamedTuple):
points : np.array
colors : np.array
normals : np.array
# feats3D : np.array
def geom_transform_points(points, transf_matrix):
P, _ = points.shape
ones = torch.ones(P, 1, dtype=points.dtype, device=points.device)
points_hom = torch.cat([points, ones], dim=1)
points_out = torch.matmul(points_hom, transf_matrix.unsqueeze(0))
denom = points_out[..., 3:] + 0.0000001
return (points_out[..., :3] / denom).squeeze(dim=0)
def getWorld2View(R, t):
Rt = np.zeros((4, 4))
Rt[:3, :3] = R.transpose()
Rt[:3, 3] = t
Rt[3, 3] = 1.0
return np.float32(Rt)
def getWorld2View2(R, t, translate=np.array([.0, .0, .0]), scale=1.0):
Rt = np.zeros((4, 4))
Rt[:3, :3] = R.transpose()
Rt[:3, 3] = t
Rt[3, 3] = 1.0
C2W = np.linalg.inv(Rt)
cam_center = C2W[:3, 3]
cam_center = (cam_center + translate) * scale
C2W[:3, 3] = cam_center
Rt = np.linalg.inv(C2W)
return np.float32(Rt)
def getProjectionMatrix(znear, zfar, fovX, fovY, cx_ratio, cy_ratio):
tanHalfFovY = math.tan((fovY / 2))
tanHalfFovX = math.tan((fovX / 2))
top = tanHalfFovY * znear
bottom = -top
right = tanHalfFovX * znear
left = -right
P = torch.zeros(4, 4)
z_sign = 1.0
P[0, 0] = 2.0 * znear / (right - left)
P[1, 1] = 2.0 * znear / (top - bottom)
P[0, 2] = (right + left) / (right - left) - 1 + cx_ratio
P[1, 2] = (top + bottom) / (top - bottom) - 1 + cy_ratio
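    # With left = -right and bottom = -top, the frustum terms above vanish, so
    # P[0, 2] = cx_ratio - 1 and P[1, 2] = cy_ratio - 1: this shifts the
    # principal point away from the image centre in NDC (cx_ratio/cy_ratio are
    # presumably 2*cx/width and 2*cy/height, so a centred camera gives 0 offset).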
P[3, 2] = z_sign
P[2, 2] = z_sign * (zfar + znear) / (zfar - znear)
P[2, 3] = -(2 * zfar * znear) / (zfar - znear)
# P[0, 0] = 2.0 * znear / (right - left)
# P[1, 1] = 2.0 * znear / (top - bottom)
# P[0, 2] = (right + left) / (right - left)
# P[1, 2] = (top + bottom) / (top - bottom)
# P[3, 2] = z_sign
# P[2, 2] = z_sign * zfar / (zfar - znear)
# P[2, 3] = -(zfar * znear) / (zfar - znear)
return P
def fov2focal(fov, pixels):
return pixels / (2 * math.tan(fov / 2))
def focal2fov(focal, pixels):
return 2*math.atan(pixels/(2*focal))
================================================
FILE: utils/image_utils.py
================================================
#
# Copyright (C) 2023, Inria
# GRAPHDECO research group, https://team.inria.fr/graphdeco
# All rights reserved.
#
# This software is free for non-commercial, research and evaluation use
# under the terms of the LICENSE.md file.
#
# For inquiries contact george.drettakis@inria.fr
#
import torch
def mse(img1, img2):
    return ((img1 - img2) ** 2).view(img1.shape[0], -1).mean(1, keepdim=True)
def psnr(img1, img2):
    mse = ((img1 - img2) ** 2).view(img1.shape[0], -1).mean(1, keepdim=True)
    return 20 * torch.log10(1.0 / torch.sqrt(mse))
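# For images in [0, 1], psnr = 20*log10(1/rms) = 10*log10(1/mse); e.g. a mean
# squared error of 1e-4 corresponds to 40 dB.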
================================================
FILE: utils/iou_utils.py
================================================
# 3D IoU calculation code for 3D object detection
# Kent 2018/12
import numpy as np
from scipy.spatial import ConvexHull
from numpy import *
def polygon_clip(subjectPolygon, clipPolygon):
""" Clip a polygon with another polygon.
Ref: https://rosettacode.org/wiki/Sutherland-Hodgman_polygon_clipping#Python
Args:
subjectPolygon: a list of (x,y) 2d points, any polygon.
clipPolygon: a list of (x,y) 2d points, has to be *convex*
Note:
**points have to be counter-clockwise ordered**
Return:
a list of (x,y) vertex point for the intersection polygon.
"""
def inside(p):
return(cp2[0]-cp1[0])*(p[1]-cp1[1]) > (cp2[1]-cp1[1])*(p[0]-cp1[0])
def computeIntersection():
dc = [ cp1[0] - cp2[0], cp1[1] - cp2[1] ]
dp = [ s[0] - e[0], s[1] - e[1] ]
n1 = cp1[0] * cp2[1] - cp1[1] * cp2[0]
n2 = s[0] * e[1] - s[1] * e[0]
n3 = 1.0 / (dc[0] * dp[1] - dc[1] * dp[0])
return [(n1*dp[0] - n2*dc[0]) * n3, (n1*dp[1] - n2*dc[1]) * n3]
outputList = subjectPolygon
cp1 = clipPolygon[-1]
for clipVertex in clipPolygon:
cp2 = clipVertex
inputList = outputList
outputList = []
s = inputList[-1]
for subjectVertex in inputList:
e = subjectVertex
if inside(e):
if not inside(s):
outputList.append(computeIntersection())
outputList.append(e)
elif inside(s):
outputList.append(computeIntersection())
s = e
cp1 = cp2
if len(outputList) == 0:
return None
return(outputList)
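# Usage sketch (hypothetical inputs): clipping two overlapping CCW unit squares
#   polygon_clip([(0, 0), (1, 0), (1, 1), (0, 1)],
#                [(0.5, 0.5), (1.5, 0.5), (1.5, 1.5), (0.5, 1.5)])
# yields the overlap square with corners (0.5, 0.5)-(1, 1), up to vertex order.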
def poly_area(x,y):
""" Ref: http://stackoverflow.com/questions/24467972/calculate-area-of-polygon-given-x-y-coordinates """
return 0.5*np.abs(np.dot(x,np.roll(y,1))-np.dot(y,np.roll(x,1)))
def convex_hull_intersection(p1, p2):
    """ Compute the intersection of two convex hulls.
    p1, p2 are lists of (x,y) tuples of hull vertices.
    Returns the intersection polygon as a list of (x,y) points and its area
    (ConvexHull.volume is the enclosed area for 2D inputs).
    """
inter_p = polygon_clip(p1,p2)
if inter_p is not None:
hull_inter = ConvexHull(inter_p)
return inter_p, hull_inter.volume
else:
return None, 0.0
def box3d_vol(corners):
''' corners: (8,3) no assumption on axis direction '''
a = np.sqrt(np.sum((corners[0,:] - corners[1,:])**2))
b = np.sqrt(np.sum((corners[1,:] - corners[2,:])**2))
c = np.sqrt(np.sum((corners[0,:] - corners[4,:])**2))
return a*b*c
def is_clockwise(p):
x = p[:,0]
y = p[:,1]
return np.dot(x,np.roll(y,1))-np.dot(y,np.roll(x,1)) > 0
def box3d_iou(corners1, corners2):
''' Compute 3D bounding box IoU.
Input:
corners1: numpy array (8,3), assume up direction is negative Y
corners2: numpy array (8,3), assume up direction is negative Y
    Output:
        iou: 3D bounding box IoU (the 3D branch below is commented out, so 0 is returned)
        iou_2d: bird's-eye-view 2D bounding box IoU
todo (kent): add more description on corner points' orders.
'''
# corner points are in counter clockwise order
rect1 = [(corners1[i,0], corners1[i,2]) for i in [4,5,1,0]]
rect2 = [(corners2[i,0], corners2[i,2]) for i in [4,5,1,0]]
area1 = poly_area(np.array(rect1)[:,0], np.array(rect1)[:,1])
area2 = poly_area(np.array(rect2)[:,0], np.array(rect2)[:,1])
inter, inter_area = convex_hull_intersection(rect1, rect2)
iou_2d = inter_area/(area1+area2-inter_area)
# if iou_2d < 0:
# print(inter_area, area1, area2)
# ymax = min(corners1[0,1], corners2[0,1])
# ymin = max(corners1[4,1], corners2[4,1])
# inter_vol = inter_area * max(0.0, ymax-ymin)
# vol1 = box3d_vol(corners1)
# vol2 = box3d_vol(corners2)
# iou = inter_vol / (vol1 + vol2 - inter_vol)
# return iou, iou_2d
return 0, iou_2d
# ----------------------------------
# Helper functions for evaluation
# ----------------------------------
def get_3d_box(box_size, heading_angle, center):
''' Calculate 3D bounding box corners from its parameterization.
Input:
        box_size: tuple of (length, width, height)
heading_angle: rad scalar, clockwise from pos x axis
center: tuple of (x,y,z)
Output:
        corners_3d: numpy array of shape (8,3) for the 3D box corners
'''
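    # Corner order before rotation: indices 0-3 sit at y = +h/2 and 4-7 at
    # y = -h/2; within each ring the (x, z) signs cycle (+,+), (+,-), (-,-), (-,+).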
def roty(t):
c = np.cos(t)
s = np.sin(t)
return np.array([[c, 0, s],
[0, 1, 0],
[-s, 0, c]])
R = roty(heading_angle)
l,w,h = box_size
    x_corners = [l/2, l/2, -l/2, -l/2, l/2, l/2, -l/2, -l/2]
    y_corners = [h/2, h/2, h/2, h/2, -h/2, -h/2, -h/2, -h/2]
    z_corners = [w/2, -w/2, -w/2, w/2, w/2, -w/2, -w/2, w/2]
    corners_3d = np.dot(R, np.vstack([x_corners, y_corners, z_corners]))
    corners_3d[0, :] = corners_3d[0, :] + center[0]
    corners_3d[1, :] = corners_3d[1, :] + center[1]
    corners_3d[2, :] = corners_3d[2, :] + center[2]
corners_3d = np.transpose(corners_3d)
return corners_3d
if __name__=='__main__':
print('------------------')
# get_3d_box(box_size, heading_angle, center)
corners_3d_ground = get_3d_box((1.497255,1.644981, 3.628938), -1.531692, (2.882992 ,1.698800 ,20.785644))
corners_3d_predict = get_3d_box((1.458242, 1.604773, 3.707947), -1.549553, (2.756923, 1.661275, 20.943280 ))
    (IOU_3d, IOU_2d) = box3d_iou(corners_3d_predict, corners_3d_ground)
    print(IOU_3d, IOU_2d)  # 3D IoU / 2D IoU of BEV (bird's-eye view); the 3D value is currently always 0
================================================
FILE: utils/loss_utils.py
================================================
#
# Copyright (C) 2023, Inria
# GRAPHDECO research group, https://team.inria.fr/graphdeco
# All rights reserved.
#
# This software is free for non-commercial, research and evaluation use
# under the terms of the LICENSE.md file.
#
# For inquiries contact george.drettakis@inria.fr
#
import torch
import torch.nn.functional as F
from torch.autograd import Variable
from math import exp
def l1_loss(network_output, gt, mask=None):
l1 = torch.abs((network_output - gt))
if mask is not None:
l1 = l1[:, mask]
return l1.mean()
def l2_loss(network_output, gt):
return ((network_output - gt) ** 2).mean()
def gaussian(window_size, sigma):
gauss = torch.Tensor([exp(-(x - window_size // 2) ** 2 / float(2 * sigma ** 2)) for x in range(window_size)])
return gauss / gauss.sum()
def create_window(window_size, channel):
_1D_window = gaussian(window_size, 1.5).unsqueeze(1)
_2D_window = _1D_window.mm(_1D_window.t()).float().unsqueeze(0).unsqueeze(0)
window = Variable(_2D_window.expand(channel, 1, window_size, window_size).contiguous())
return window
def ssim(img1, img2, window_size=11, size_average=True):
channel = img1.size(-3)
window = create_window(window_size, channel)
if img1.is_cuda:
window = window.cuda(img1.get_device())
window = window.type_as(img1)
return _ssim(img1, img2, window, window_size, channel, size_average)
def _ssim(img1, img2, window, window_size, channel, size_average=True):
mu1 = F.conv2d(img1, window, padding=window_size // 2, groups=channel)
mu2 = F.conv2d(img2, window, padding=window_size // 2, groups=channel)
mu1_sq = mu1.pow(2)
mu2_sq = mu2.pow(2)
mu1_mu2 = mu1 * mu2
sigma1_sq = F.conv2d(img1 * img1, window, padding=window_size // 2, groups=channel) - mu1_sq
sigma2_sq = F.conv2d(img2 * img2, window, padding=window_size // 2, groups=channel) - mu2_sq
sigma12 = F.conv2d(img1 * img2, window, padding=window_size // 2, groups=channel) - mu1_mu2
C1 = 0.01 ** 2
C2 = 0.03 ** 2
ssim_map = ((2 * mu1_mu2 + C1) * (2 * sigma12 + C2)) / ((mu1_sq + mu2_sq + C1) * (sigma1_sq + sigma2_sq + C2))
if size_average:
return ssim_map.mean()
else:
return ssim_map.mean(1).mean(1).mean(1)
def ssim_loss(img1, img2, window_size=11, size_average=True, mask=None):
channel = img1.size(-3)
window = create_window(window_size, channel)
if img1.is_cuda:
window = window.cuda(img1.get_device())
window = window.type_as(img1)
return _ssim_loss(img1, img2, window, window_size, channel, size_average, mask)
def _ssim_loss(img1, img2, window, window_size, channel, size_average=True, mask=None):
mu1 = F.conv2d(img1, window, padding=window_size // 2, groups=channel)
mu2 = F.conv2d(img2, window, padding=window_size // 2, groups=channel)
mu1_sq = mu1.pow(2)
mu2_sq = mu2.pow(2)
mu1_mu2 = mu1 * mu2
sigma1_sq = F.conv2d(img1 * img1, window, padding=window_size // 2, groups=channel) - mu1_sq
sigma2_sq = F.conv2d(img2 * img2, window, padding=window_size // 2, groups=channel) - mu2_sq
sigma12 = F.conv2d(img1 * img2, window, padding=window_size // 2, groups=channel) - mu1_mu2
C1 = 0.01 ** 2
C2 = 0.03 ** 2
ssim_map = ((2 * mu1_mu2 + C1) * (2 * sigma12 + C2)) / ((mu1_sq + mu2_sq + C1) * (sigma1_sq + sigma2_sq + C2))
ssim_map = 1 - ssim_map
if mask is not None:
ssim_map = ssim_map[:, mask]
if size_average:
return ssim_map.mean()
else:
return ssim_map.mean(1).mean(1).mean(1)
================================================
FILE: utils/nvseg_utils.py
================================================
import sys
sys.path.append("/data0/hyzhou/workspace/nv_seg")
from network import get_model
from config import cfg, torch_version_float
from datasets.cityscapes import Loader as dataset_cls
from runx.logx import logx
import cv2
import torch
from imageio.v2 import imread, imwrite
import os
import numpy as np
from glob import glob
from tqdm import tqdm
from torchvision.utils import save_image
def restore_net(net, checkpoint):
    assert 'state_dict' in checkpoint, "can't find state_dict in checkpoint"
forgiving_state_restore(net, checkpoint['state_dict'])
def forgiving_state_restore(net, loaded_dict):
"""
Handle partial loading when some tensors don't match up in size.
Because we want to use models that were trained off a different
number of classes.
"""
net_state_dict = net.state_dict()
new_loaded_dict = {}
for k in net_state_dict:
new_k = k
if new_k in loaded_dict and net_state_dict[k].size() == loaded_dict[new_k].size():
new_loaded_dict[k] = loaded_dict[new_k]
else:
logx.msg("Skipped loading parameter {}".format(k))
net_state_dict.update(new_loaded_dict)
net.load_state_dict(net_state_dict)
return net
def get_nvseg_model():
logx.initialize(logdir="./results",
global_rank=0)
cfg.immutable(False)
cfg.DATASET.NUM_CLASSES = dataset_cls.num_classes
cfg.DATASET.IGNORE_LABEL = dataset_cls.ignore_label
cfg.MODEL.MSCALE = True
cfg.MODEL.N_SCALES = [0.5,1.0,2.0]
cfg.MODEL.BNFUNC = torch.nn.BatchNorm2d
cfg.OPTIONS.TORCH_VERSION = torch_version_float()
cfg.DATASET_INST = dataset_cls('folder')
cfg.immutable(True)
colorize_mask_fn = cfg.DATASET_INST.colorize_mask
net = get_model(network='network.ocrnet.HRNet_Mscale',
num_classes=cfg.DATASET.NUM_CLASSES,
criterion=None)
snapshot = "ASSETS_PATH/seg_weights/cityscapes_trainval_ocr.HRNet_Mscale_nimble-chihuahua.pth".replace('ASSETS_PATH', cfg.ASSETS_PATH)
checkpoint = torch.load(snapshot, map_location=torch.device('cpu'))
renamed_ckpt = {'state_dict': {}}
for k, v in checkpoint['state_dict'].items():
renamed_ckpt['state_dict'][k.replace('module.', '')] = v
restore_net(net, renamed_ckpt)
net = net.eval().cuda()
return net
================================================
FILE: utils/semantic_utils.py
================================================
#!/usr/bin/python
#
# KITTI-360 labels
#
from collections import namedtuple
from PIL import Image
import numpy as np
#--------------------------------------------------------------------------------
# Definitions
#--------------------------------------------------------------------------------
# a label and all meta information
Label = namedtuple( 'Label' , [
'name' , # The identifier of this label, e.g. 'car', 'person', ... .
# We use them to uniquely name a class
'id' , # An integer ID that is associated with this label.
# The IDs are used to represent the label in ground truth images
# An ID of -1 means that this label does not have an ID and thus
# is ignored when creating ground truth images (e.g. license plate).
# Do not modify these IDs, since exactly these IDs are expected by the
# evaluation server.
'trainId' , # Feel free to modify these IDs as suitable for your method. Then create
# ground truth images with train IDs, using the tools provided in the
# 'preparation' folder. However, make sure to validate or submit results
# to our evaluation server using the regular IDs above!
# For trainIds, multiple labels might have the same ID. Then, these labels
# are mapped to the same class in the ground truth images. For the inverse
# mapping, we use the label that is defined first in the list below.
# For example, mapping all void-type classes to the same ID in training,
# might make sense for some approaches.
# Max value is 255!
'category' , # The name of the category that this label belongs to
'categoryId' , # The ID of this category. Used to create ground truth images
# on category level.
'hasInstances', # Whether this label distinguishes between single instances or not
'ignoreInEval', # Whether pixels having this class as ground truth label are ignored
# during evaluations or not
'color' , # The color of this label
] )
#--------------------------------------------------------------------------------
# A list of all labels
#--------------------------------------------------------------------------------
# Please adapt the train IDs as appropriate for your approach.
# Note that you might want to ignore labels with ID 255 during training.
# Further note that the current train IDs are only a suggestion. You can use whatever you like.
# Make sure to provide your results using the original IDs and not the training IDs.
# Note that many IDs are ignored in evaluation and thus you never need to predict these!
labels = [
# name id trainId category catId hasInstances ignoreInEval color
Label( 'unlabeled' , 0 , 255 , 'void' , 0 , False , True , ( 0, 0, 0) ),
Label( 'ego vehicle' , 1 , 255 , 'void' , 0 , False , True , ( 0, 0, 0) ),
Label( 'rectification border' , 2 , 255 , 'void' , 0 , False , True , ( 0, 0, 0) ),
Label( 'out of roi' , 3 , 255 , 'void' , 0 , False , True , ( 0, 0, 0) ),
Label( 'static' , 4 , 255 , 'void' , 0 , False , True , ( 0, 0, 0) ),
Label( 'dynamic' , 5 , 255 , 'void' , 0 , False , True , (111, 74, 0) ),
Label( 'ground' , 6 , 255 , 'void' , 0 , False , True , ( 81, 0, 81) ),
Label( 'road' , 7 , 0 , 'flat' , 1 , False , False , (128, 64,128) ),
Label( 'sidewalk' , 8 , 1 , 'flat' , 1 , False , False , (244, 35,232) ),
Label( 'parking' , 9 , 255 , 'flat' , 1 , False , True , (250,170,160) ),
Label( 'rail track' , 10 , 255 , 'flat' , 1 , False , True , (230,150,140) ),
Label( 'building' , 11 , 2 , 'construction' , 2 , False , False , ( 70, 70, 70) ),
Label( 'wall' , 12 , 3 , 'construction' , 2 , False , False , (102,102,156) ),
Label( 'fence' , 13 , 4 , 'construction' , 2 , False , False , (190,153,153) ),
Label( 'guard rail' , 14 , 255 , 'construction' , 2 , False , True , (180,165,180) ),
Label( 'bridge' , 15 , 255 , 'construction' , 2 , False , True , (150,100,100) ),
Label( 'tunnel' , 16 , 255 , 'construction' , 2 , False , True , (150,120, 90) ),
Label( 'pole' , 17 , 5 , 'object' , 3 , False , False , (153,153,153) ),
Label( 'polegroup' , 18 , 255 , 'object' , 3 , False , True , (153,153,153) ),
Label( 'traffic light' , 19 , 6 , 'object' , 3 , False , False , (250,170, 30) ),
Label( 'traffic sign' , 20 , 7 , 'object' , 3 , False , False , (220,220, 0) ),
Label( 'vegetation' , 21 , 8 , 'nature' , 4 , False , False , (107,142, 35) ),
Label( 'terrain' , 22 , 9 , 'nature' , 4 , False , False , (152,251,152) ),
Label( 'sky' , 23 , 10 , 'sky' , 5 , False , False , ( 70,130,180) ),
Label( 'person' , 24 , 11 , 'human' , 6 , True , False , (220, 20, 60) ),
Label( 'rider' , 25 , 12 , 'human' , 6 , True , False , (255, 0, 0) ),
Label( 'car' , 26 , 13 , 'vehicle' , 7 , True , False , ( 0, 0,142) ),
Label( 'truck' , 27 , 14 , 'vehicle' , 7 , True , False , ( 0, 0, 70) ),
Label( 'bus' , 28 , 15 , 'vehicle' , 7 , True , False , ( 0, 60,100) ),
Label( 'caravan' , 29 , 255 , 'vehicle' , 7 , True , True , ( 0, 0, 90) ),
Label( 'trailer' , 30 , 255 , 'vehicle' , 7 , True , True , ( 0, 0,110) ),
Label( 'train' , 31 , 16 , 'vehicle' , 7 , True , False , ( 0, 80,100) ),
Label( 'motorcycle' , 32 , 17 , 'vehicle' , 7 , True , False , ( 0, 0,230) ),
Label( 'bicycle' , 33 , 18 , 'vehicle' , 7 , True , False , (119, 11, 32) ),
Label( 'license plate' , -1 , -1 , 'vehicle' , 7 , False , True , ( 0, 0,142) ),
]
#--------------------------------------------------------------------------------
# Create dictionaries for a fast lookup
#--------------------------------------------------------------------------------
# Please refer to the main method below for example usages!
# name to label object
name2label = { label.name : label for label in labels }
# id to label object
id2label = { label.id : label for label in labels }
# trainId to label object
trainId2label = { label.trainId : label for label in reversed(labels) }
# label2trainid
label2trainid = { label.id : label.trainId for label in labels }
# trainId to name / color
trainId2name = { label.trainId : label.name for label in labels }
trainId2color = { label.trainId : label.color for label in labels }
# category to list of label objects
category2labels = {}
for label in labels:
category = label.category
if category in category2labels:
category2labels[category].append(label)
else:
category2labels[category] = [label]
#--------------------------------------------------------------------------------
# color mapping
#--------------------------------------------------------------------------------
palette = [128, 64, 128,
244, 35, 232,
70, 70, 70,
102, 102, 156,
190, 153, 153,
153, 153, 153,
250, 170, 30,
220, 220, 0,
107, 142, 35,
152, 251, 152,
70, 130, 180,
220, 20, 60,
255, 0, 0,
0, 0, 142,
0, 0, 70,
0, 60, 100,
0, 80, 100,
0, 0, 230,
119, 11, 32]
zero_pad = 256 * 3 - len(palette)
for i in range(zero_pad):
palette.append(0)
color_mapping = palette
def colorize(image_array):
new_mask = Image.fromarray(image_array.astype(np.uint8)).convert('P')
new_mask.putpalette(color_mapping)
return new_mask
#--------------------------------------------------------------------------------
# Assure single instance name
#--------------------------------------------------------------------------------
# returns the label name that describes a single instance (if possible)
# e.g. input | output
# ----------------------
# car | car
# cargroup | car
# foo | None
# foogroup | None
# skygroup | None
def assureSingleInstanceName( name ):
# if the name is known, it is not a group
if name in name2label:
return name
# test if the name actually denotes a group
if not name.endswith("group"):
return None
# remove group
name = name[:-len("group")]
# test if the new name exists
if not name in name2label:
return None
# test if the new name denotes a label that actually has instances
if not name2label[name].hasInstances:
return None
# all good then
return name
#--------------------------------------------------------------------------------
# Main for testing
#--------------------------------------------------------------------------------
# just a dummy main
if __name__ == "__main__":
# Print all the labels
print("List of KITTI-360 labels:")
print("")
print(" {:>21} | {:>3} | {:>7} | {:>14} | {:>10} | {:>12} | {:>12}".format( 'name', 'id', 'trainId', 'category', 'categoryId', 'hasInstances', 'ignoreInEval' ))
print(" " + ('-' * 98))
for label in labels:
# print(" {:>21} | {:>3} | {:>7} | {:>14} | {:>10} | {:>12} | {:>12}".format( label.name, label.id, label.trainId, label.category, label.categoryId, label.hasInstances, label.ignoreInEval ))
print(" \"{:}\"".format(label.name))
print("")
print("Example usages:")
# Map from name to label
name = 'car'
id = name2label[name].id
print("ID of label '{name}': {id}".format( name=name, id=id ))
# Map from ID to label
category = id2label[id].category
print("Category of label with ID '{id}': {category}".format( id=id, category=category ))
# Map from trainID to label
trainId = 0
name = trainId2label[trainId].name
print("Name of label with trainID '{id}': {name}".format( id=trainId, name=name ))
================================================
FILE: utils/sh_utils.py
================================================
# Copyright 2021 The PlenOctree Authors.
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are met:
#
# 1. Redistributions of source code must retain the above copyright notice,
# this list of conditions and the following disclaimer.
#
# 2. Redistributions in binary form must reproduce the above copyright notice,
# this list of conditions and the following disclaimer in the documentation
# and/or other materials provided with the distribution.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
# ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
# LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
# CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
# SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
# INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
# CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
# ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
# POSSIBILITY OF SUCH DAMAGE.
import torch
C0 = 0.28209479177387814
C1 = 0.4886025119029199
C2 = [
1.0925484305920792,
-1.0925484305920792,
0.31539156525252005,
-1.0925484305920792,
0.5462742152960396
]
C3 = [
-0.5900435899266435,
2.890611442640554,
-0.4570457994644658,
0.3731763325901154,
-0.4570457994644658,
1.445305721320277,
-0.5900435899266435
]
C4 = [
2.5033429417967046,
-1.7701307697799304,
0.9461746957575601,
-0.6690465435572892,
0.10578554691520431,
-0.6690465435572892,
0.47308734787878004,
-1.7701307697799304,
0.6258357354491761,
]
def eval_sh(deg, sh, dirs):
"""
Evaluate spherical harmonics at unit directions
using hardcoded SH polynomials.
Works with torch/np/jnp.
... Can be 0 or more batch dimensions.
Args:
        deg: int SH deg. Currently, 0-4 supported
sh: jnp.ndarray SH coeffs [..., C, (deg + 1) ** 2]
dirs: jnp.ndarray unit directions [..., 3]
Returns:
[..., C]
"""
assert deg <= 4 and deg >= 0
coeff = (deg + 1) ** 2
assert sh.shape[-1] >= coeff
result = C0 * sh[..., 0]
if deg > 0:
x, y, z = dirs[..., 0:1], dirs[..., 1:2], dirs[..., 2:3]
result = (result -
C1 * y * sh[..., 1] +
C1 * z * sh[..., 2] -
C1 * x * sh[..., 3])
if deg > 1:
xx, yy, zz = x * x, y * y, z * z
xy, yz, xz = x * y, y * z, x * z
result = (result +
C2[0] * xy * sh[..., 4] +
C2[1] * yz * sh[..., 5] +
C2[2] * (2.0 * zz - xx - yy) * sh[..., 6] +
C2[3] * xz * sh[..., 7] +
C2[4] * (xx - yy) * sh[..., 8])
if deg > 2:
result = (result +
C3[0] * y * (3 * xx - yy) * sh[..., 9] +
C3[1] * xy * z * sh[..., 10] +
C3[2] * y * (4 * zz - xx - yy)* sh[..., 11] +
C3[3] * z * (2 * zz - 3 * xx - 3 * yy) * sh[..., 12] +
C3[4] * x * (4 * zz - xx - yy) * sh[..., 13] +
C3[5] * z * (xx - yy) * sh[..., 14] +
C3[6] * x * (xx - 3 * yy) * sh[..., 15])
if deg > 3:
result = (result + C4[0] * xy * (xx - yy) * sh[..., 16] +
C4[1] * yz * (3 * xx - yy) * sh[..., 17] +
C4[2] * xy * (7 * zz - 1) * sh[..., 18] +
C4[3] * yz * (7 * zz - 3) * sh[..., 19] +
C4[4] * (zz * (35 * zz - 30) + 3) * sh[..., 20] +
C4[5] * xz * (7 * zz - 3) * sh[..., 21] +
C4[6] * (xx - yy) * (7 * zz - 1) * sh[..., 22] +
C4[7] * xz * (xx - 3 * yy) * sh[..., 23] +
C4[8] * (xx * (xx - 3 * yy) - yy * (3 * xx - yy)) * sh[..., 24])
return result
def RGB2SH(rgb):
return (rgb - 0.5) / C0
def SH2RGB(sh):
return sh * C0 + 0.5
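# Standalone sketch (not part of the original file) checking the degree-0
# convention: a color stored via RGB2SH as the DC coefficient evaluates back
# to rgb - 0.5, so renderers add a 0.5 offset after eval_sh. The +0.5 below
# mirrors that common Gaussian-splatting convention, not code from this file.
import torch
from utils.sh_utils import eval_sh, RGB2SH

rgb = torch.tensor([0.2, 0.5, 0.9])
sh = RGB2SH(rgb).reshape(3, 1)        # [C, (deg + 1) ** 2] with deg = 0
dirs = torch.tensor([0.0, 0.0, 1.0])  # ignored at degree 0
out = eval_sh(0, sh, dirs) + 0.5
assert torch.allclose(out, rgb)       # round trip recovers the input color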
================================================
FILE: utils/system_utils.py
================================================
#
# Copyright (C) 2023, Inria
# GRAPHDECO research group, https://team.inria.fr/graphdeco
# All rights reserved.
#
# This software is free for non-commercial, research and evaluation use
# under the terms of the LICENSE.md file.
#
# For inquiries contact george.drettakis@inria.fr
#
from errno import EEXIST
from os import makedirs, path
import os
def mkdir_p(folder_path):
    # Creates a directory, equivalent to using mkdir -p on the command line
try:
makedirs(folder_path)
except OSError as exc: # Python >2.5
if exc.errno == EEXIST and path.isdir(folder_path):
pass
else:
raise
def searchForMaxIteration(folder):
saved_iters = [int(fname.split("_")[-1]) for fname in os.listdir(folder)]
return max(saved_iters)
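# Standalone usage sketch (not part of the original file), assuming the
# iteration_<N> checkpoint-folder naming these helpers are built around.
# The temporary directory is only for illustration.
import os
import tempfile
from utils.system_utils import mkdir_p, searchForMaxIteration

root = tempfile.mkdtemp()
for it in (7000, 30000):
    mkdir_p(os.path.join(root, "iteration_{}".format(it)))
mkdir_p(os.path.join(root, "iteration_7000"))  # calling again is a harmless no-op

print(searchForMaxIteration(root))  # -> 30000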
SYMBOL INDEX (161 symbols across 26 files)
FILE: arguments/__init__.py
class GroupParams (line 16) | class GroupParams:
class ParamGroup (line 19) | class ParamGroup:
method __init__ (line 20) | def __init__(self, parser: ArgumentParser, name : str, fill_none = Fal...
method extract (line 40) | def extract(self, args):
class ModelParams (line 47) | class ModelParams(ParamGroup):
method __init__ (line 48) | def __init__(self, parser, sentinel=False):
method extract (line 59) | def extract(self, args):
class PipelineParams (line 64) | class PipelineParams(ParamGroup):
method __init__ (line 65) | def __init__(self, parser):
class OptimizationParams (line 71) | class OptimizationParams(ParamGroup):
method __init__ (line 72) | def __init__(self, parser):
function get_combined_args (line 91) | def get_combined_args(parser : ArgumentParser):
FILE: gaussian_renderer/__init__.py
function euler2matrix (line 19) | def euler2matrix(yaw):
function cat_bgfg (line 29) | def cat_bgfg(bg, fg, only_dynamic=False, only_xyz=False):
function cat_all_fg (line 47) | def cat_all_fg(all_fg, next_fg):
function proj_uv (line 58) | def proj_uv(xyz, cam):
function unicycle_b2w (line 69) | def unicycle_b2w(timestamp, model):
function render (line 81) | def render(viewpoint_camera, prev_viewpoint_camera, pc : GaussianModel, ...
FILE: lpipsPyTorch/__init__.py
function lpips (line 6) | def lpips(x: torch.Tensor,
FILE: lpipsPyTorch/modules/lpips.py
class LPIPS (line 8) | class LPIPS(nn.Module):
method __init__ (line 17) | def __init__(self, net_type: str = 'alex', version: str = '0.1'):
method forward (line 30) | def forward(self, x: torch.Tensor, y: torch.Tensor):
FILE: lpipsPyTorch/modules/networks.py
function get_network (line 12) | def get_network(net_type: str):
class LinLayers (line 23) | class LinLayers(nn.ModuleList):
method __init__ (line 24) | def __init__(self, n_channels_list: Sequence[int]):
class BaseNet (line 36) | class BaseNet(nn.Module):
method __init__ (line 37) | def __init__(self):
method set_requires_grad (line 46) | def set_requires_grad(self, state: bool):
method z_score (line 50) | def z_score(self, x: torch.Tensor):
method forward (line 53) | def forward(self, x: torch.Tensor):
class SqueezeNet (line 66) | class SqueezeNet(BaseNet):
method __init__ (line 67) | def __init__(self):
class AlexNet (line 77) | class AlexNet(BaseNet):
method __init__ (line 78) | def __init__(self):
class VGG16 (line 88) | class VGG16(BaseNet):
method __init__ (line 89) | def __init__(self):
FILE: lpipsPyTorch/modules/utils.py
function normalize_activation (line 6) | def normalize_activation(x, eps=1e-10):
function get_state_dict (line 11) | def get_state_dict(net_type: str = 'alex', version: str = '0.1'):
FILE: metrics.py
function readImages (line 25) | def readImages(renders_dir, gt_dir):
function evaluate (line 37) | def evaluate(model_paths, write):
FILE: render.py
function to4x4 (line 34) | def to4x4(R, T):
function apply_colormap (line 40) | def apply_colormap(image, cmap="viridis"):
function apply_depth_colormap (line 51) | def apply_depth_colormap(depth, near_plane=None, far_plane=None, cmap="t...
function render_set (line 61) | def render_set(model_path, name, iteration, views, scene, pipeline, back...
function render_sets (line 129) | def render_sets(dataset : ModelParams, iteration : int, pipeline : Pipel...
FILE: scene/__init__.py
class Scene (line 26) | class Scene:
method __init__ (line 30) | def __init__(self, args : ModelParams, gaussians : GaussianModel, load...
method save (line 121) | def save(self, iteration):
method getTrainCameras (line 128) | def getTrainCameras(self, scale=1.0):
method getTestCameras (line 131) | def getTestCameras(self, scale=1.0):
FILE: scene/cameras.py
class Camera (line 18) | class Camera(nn.Module):
method __init__ (line 19) | def __init__(self, colmap_id, R, T, K, FoVx, FoVy, image,
method get_rays (line 80) | def get_rays(self):
class MiniCam (line 91) | class MiniCam:
method __init__ (line 92) | def __init__(self, width, height, fovy, fovx, znear, zfar, world_view_...
FILE: scene/dataset_readers.py
class CameraInfo (line 29) | class CameraInfo(NamedTuple):
class SceneInfo (line 49) | class SceneInfo(NamedTuple):
function getNerfppNorm (line 57) | def getNerfppNorm(cam_info):
function fetchPly (line 81) | def fetchPly(path):
function storePly (line 97) | def storePly(path, xyz, rgb):
function readStudioCameras (line 114) | def readStudioCameras(path, white_background, data_type, ignore_dynamic):
function readStudioInfo (line 227) | def readStudioInfo(path, white_background, eval, data_type, ignore_dynam...
FILE: scene/gaussian_model.py
class CustomAdam (line 27) | class CustomAdam(torch.optim.Optimizer):
method __init__ (line 28) | def __init__(self, params, lr=0.001, betas=(0.9, 0.999), eps=1e-8):
method step (line 32) | def step(self, custom_lr=None, name=None):
class GaussianModel (line 72) | class GaussianModel:
method setup_functions (line 74) | def setup_functions(self):
method __init__ (line 92) | def __init__(self, sh_degree : int, feat_mutable=True, affine=False):
method capture (line 139) | def capture(self):
method restore (line 157) | def restore(self, model_args, training_args):
method get_scaling (line 179) | def get_scaling(self):
method get_rotation (line 183) | def get_rotation(self):
method get_xyz (line 188) | def get_xyz(self):
method get_features (line 192) | def get_features(self):
method get_3D_features (line 198) | def get_3D_features(self):
method get_opacity (line 202) | def get_opacity(self):
method get_covariance (line 205) | def get_covariance(self, scaling_modifier = 1):
method oneupSHdegree (line 208) | def oneupSHdegree(self):
method create_from_pcd (line 212) | def create_from_pcd(self, pcd : BasicPointCloud, spatial_lr_scale : fl...
method training_setup (line 246) | def training_setup(self, training_args):
method update_learning_rate (line 275) | def update_learning_rate(self, iteration):
method construct_list_of_attributes (line 283) | def construct_list_of_attributes(self):
method save_ply (line 299) | def save_ply(self, path):
method save_vis_ply (line 319) | def save_vis_ply(self, path):
method reset_opacity (line 328) | def reset_opacity(self):
method load_ply (line 333) | def load_ply(self, path):
method replace_tensor_to_optimizer (line 373) | def replace_tensor_to_optimizer(self, tensor, name):
method _prune_optimizer (line 388) | def _prune_optimizer(self, mask):
method prune_points (line 408) | def prune_points(self, mask):
method cat_tensors_to_optimizer (line 428) | def cat_tensors_to_optimizer(self, tensors_dict):
method densification_postfix (line 452) | def densification_postfix(self, new_xyz, new_features_dc, new_features...
method densify_and_split (line 477) | def densify_and_split(self, grads, grad_threshold, scene_extent, N=2):
method densify_and_clone (line 503) | def densify_and_clone(self, grads, grad_threshold, scene_extent):
method densify_and_prune (line 519) | def densify_and_prune(self, max_grad, min_opacity, extent, max_screen_...
method add_densification_stats (line 535) | def add_densification_stats(self, viewspace_point_tensor, update_filter):
method add_densification_stats_grad (line 539) | def add_densification_stats_grad(self, tensor_grad, update_filter):
FILE: submodules/simple-knn/ext.cpp
function PYBIND11_MODULE (line 15) | PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
FILE: submodules/simple-knn/simple_knn.h
class SimpleKNN (line 15) | class SimpleKNN
FILE: utils/camera_utils.py
function loadCam (line 20) | def loadCam(args, id, cam_info, resolution_scale):
function cameraList_from_camInfos (line 60) | def cameraList_from_camInfos(cam_infos, resolution_scale, args):
function camera_to_JSON (line 68) | def camera_to_JSON(id, camera : Camera):
FILE: utils/cmap.py
function color_error_image (line 17) | def color_error_image(errors, scale=1, mask=None, BGR=True):
function color_depth_map (line 68) | def color_depth_map(depths, scale=None):
FILE: utils/dynamic_utils.py
function rot2Euler (line 12) | def rot2Euler(R):
class unicycle (line 27) | class unicycle(torch.nn.Module):
method __init__ (line 29) | def __init__(self, train_timestamp, centers=None, heights=None, phis=N...
method acc_omega (line 55) | def acc_omega(self):
method forward (line 62) | def forward(self, timestamps):
method capture (line 81) | def capture(self):
method restore (line 92) | def restore(self, model_args):
method visualize (line 103) | def visualize(self, save_path, noise_centers=None, gt_centers=None):
method reg_loss (line 120) | def reg_loss(self):
method pos_loss (line 132) | def pos_loss(self):
function create_unicycle_model (line 137) | def create_unicycle_model(train_cams, model_path, opt_iter=0, data_type=...
FILE: utils/general_utils.py
function inverse_sigmoid (line 20) | def inverse_sigmoid(x):
function PILtoTorch (line 23) | def PILtoTorch(pil_image, resolution):
function PIL2toTorch (line 31) | def PIL2toTorch(pil_image, resolution):
function decode_op (line 36) | def decode_op(optical_png):
function get_expon_lr_func (line 51) | def get_expon_lr_func(
function strip_lowerdiag (line 86) | def strip_lowerdiag(L):
function strip_symmetric (line 97) | def strip_symmetric(sym):
function build_rotation (line 100) | def build_rotation(r):
function build_scaling_rotation (line 123) | def build_scaling_rotation(s, r):
function seedBasic (line 136) | def seedBasic(seed=DEFAULT_RANDOM_SEED):
function seedTorch (line 141) | def seedTorch(seed=DEFAULT_RANDOM_SEED):
function seedEverything (line 148) | def seedEverything(seed=DEFAULT_RANDOM_SEED):
function safe_state (line 152) | def safe_state(silent):
FILE: utils/graphics_utils.py
class BasicPointCloud (line 17) | class BasicPointCloud(NamedTuple):
function geom_transform_points (line 23) | def geom_transform_points(points, transf_matrix):
function getWorld2View (line 32) | def getWorld2View(R, t):
function getWorld2View2 (line 39) | def getWorld2View2(R, t, translate=np.array([.0, .0, .0]), scale=1.0):
function getProjectionMatrix (line 52) | def getProjectionMatrix(znear, zfar, fovX, fovY, cx_ratio, cy_ratio):
function fov2focal (line 82) | def fov2focal(fov, pixels):
function focal2fov (line 85) | def focal2fov(focal, pixels):
FILE: utils/image_utils.py
function mse (line 14) | def mse(img1, img2):
function psnr (line 17) | def psnr(img1, img2):
FILE: utils/iou_utils.py
function polygon_clip (line 8) | def polygon_clip(subjectPolygon, clipPolygon):
function poly_area (line 56) | def poly_area(x,y):
function convex_hull_intersection (line 60) | def convex_hull_intersection(p1, p2):
function box3d_vol (line 72) | def box3d_vol(corners):
function is_clockwise (line 79) | def is_clockwise(p):
function box3d_iou (line 84) | def box3d_iou(corners1, corners2):
function get_3d_box (line 122) | def get_3d_box(box_size, heading_angle, center):
FILE: utils/loss_utils.py
function l1_loss (line 17) | def l1_loss(network_output, gt, mask=None):
function l2_loss (line 23) | def l2_loss(network_output, gt):
function gaussian (line 26) | def gaussian(window_size, sigma):
function create_window (line 30) | def create_window(window_size, channel):
function ssim (line 36) | def ssim(img1, img2, window_size=11, size_average=True):
function _ssim (line 46) | def _ssim(img1, img2, window, window_size, channel, size_average=True):
function ssim_loss (line 68) | def ssim_loss(img1, img2, window_size=11, size_average=True, mask=None):
function _ssim_loss (line 78) | def _ssim_loss(img1, img2, window, window_size, channel, size_average=Tr...
FILE: utils/nvseg_utils.py
function restore_net (line 16) | def restore_net(net, checkpoint):
function forgiving_state_restore (line 21) | def forgiving_state_restore(net, loaded_dict):
function get_nvseg_model (line 40) | def get_nvseg_model():
FILE: utils/semantic_utils.py
function colorize (line 157) | def colorize(image_array):
function assureSingleInstanceName (line 174) | def assureSingleInstanceName( name ):
FILE: utils/sh_utils.py
function eval_sh (line 57) | def eval_sh(deg, sh, dirs):
function RGB2SH (line 114) | def RGB2SH(rgb):
function SH2RGB (line 117) | def SH2RGB(sh):
FILE: utils/system_utils.py
function mkdir_p (line 16) | def mkdir_p(folder_path):
function searchForMaxIteration (line 26) | def searchForMaxIteration(folder):