Repository: hyzhou404/HUGS Branch: main Commit: dbb17df8c2b9 Files: 41 Total size: 139.6 KB Directory structure: gitextract_gn_l3ctt/ ├── .gitignore ├── .gitmodules ├── LICENSE.md ├── README.md ├── arguments/ │ └── __init__.py ├── environment.yml ├── gaussian_renderer/ │ └── __init__.py ├── lpipsPyTorch/ │ ├── __init__.py │ └── modules/ │ ├── lpips.py │ ├── networks.py │ └── utils.py ├── metrics.py ├── render.py ├── requirements.txt ├── scene/ │ ├── __init__.py │ ├── cameras.py │ ├── dataset_readers.py │ └── gaussian_model.py ├── submodules/ │ └── simple-knn/ │ ├── ext.cpp │ ├── setup.py │ ├── simple_knn/ │ │ └── .gitkeep │ ├── simple_knn.cu │ ├── simple_knn.h │ ├── spatial.cu │ └── spatial.h └── utils/ ├── camera_utils.py ├── cmap.py ├── dynamic_utils.py ├── general_utils.py ├── graphics_utils.py ├── image_utils.py ├── iou_utils.py ├── loss_utils.py ├── nvseg_utils.py ├── semantic_utils.py ├── sh_utils.py ├── system_utils.py └── vehicle_template/ ├── benz_kitti.ply ├── benz_kitti360.ply ├── benz_pandaset.ply └── benz_waymo.ply ================================================ FILE CONTENTS ================================================ ================================================ FILE: .gitignore ================================================ *.pyc .vscode output build diff_rasterization/diff_rast.egg-info diff_rasterization/dist tensorboard_3d screenshots *.egg-info external shell .DS_Store ================================================ FILE: .gitmodules ================================================ [submodule "submodules/simple-knn"] path = submodules/simple-knn url = https://gitlab.inria.fr/bkerbl/simple-knn.git [submodule "submodules/hugs-rasterization"] path = submodules/hugs-rasterization url = https://github.com/hyzhou404/hugs-rasterization ================================================ FILE: LICENSE.md ================================================ HUGS License =========================== **Zhejiang University** hold all the ownership rights on the *Software* named **HUGS**. The *Software* is still being developed by the *Licensor*. *Licensor*'s goal is to allow the research community to use, test and evaluate the *Software*. ## 1. Definitions *Licensee* means any person or entity that uses the *Software* and distributes its *Work*. *Licensor* means the owners of the *Software*, i.e Zhejiang University *Software* means the original work of authorship made available under this License ie HUGS. *Work* means the *Software* and any additions to or derivative works of the *Software* that are made available under this License. ## 2. Purpose This license is intended to define the rights granted to the *Licensee* by Licensors under the *Software*. ## 3. Rights granted For the above reasons Licensors have decided to distribute the *Software*. Licensors grant non-exclusive rights to use the *Software* for research purposes to research users (both academic and industrial), free of charge, without right to sublicense.. The *Software* may be used "non-commercially", i.e., for research and/or evaluation purposes only. Subject to the terms and conditions of this License, you are granted a non-exclusive, royalty-free, license to reproduce, prepare derivative works of, publicly display, publicly perform and distribute its *Work* and any resulting derivative works in any form. ## 4. 
Limitations

**4.1 Redistribution.** You may reproduce or distribute the *Work* only if (a) you do so under this License, (b) you include a complete copy of this License with your distribution, and (c) you retain without modification any copyright, patent, trademark, or attribution notices that are present in the *Work*.

**4.2 Derivative Works.** You may specify that additional or different terms apply to the use, reproduction, and distribution of your derivative works of the *Work* ("Your Terms") only if (a) Your Terms provide that the use limitation in Section 3 applies to your derivative works, and (b) you identify the specific derivative works that are subject to Your Terms. Notwithstanding Your Terms, this License (including the redistribution requirements in Section 4.1) will continue to apply to the *Work* itself.

**4.3** Any other use without the prior consent of the Licensors is prohibited. Research users explicitly acknowledge having received from the Licensors all information allowing them to assess the adequacy of the *Software* to their needs and to undertake all necessary precautions for its execution and use.

**4.4** The *Software* is provided both as a compiled library file and as source code. When the *Software* is used for a publication or for other results obtained through its use, users are strongly encouraged to cite the corresponding publications as explained in the documentation of the *Software*.

## 5. Disclaimer

THE USER CANNOT USE, EXPLOIT OR DISTRIBUTE THE *SOFTWARE* FOR COMMERCIAL PURPOSES WITHOUT PRIOR AND EXPLICIT CONSENT OF THE LICENSORS. YOU MUST CONTACT Zhejiang University FOR ANY UNAUTHORIZED USE: yiyi.liao@zju.edu.cn. ANY SUCH ACTION WILL CONSTITUTE A FORGERY. THIS *SOFTWARE* IS PROVIDED "AS IS" WITHOUT ANY WARRANTIES OF ANY NATURE AND ANY EXPRESS OR IMPLIED WARRANTIES, WITH REGARD TO COMMERCIAL USE, PROFESSIONAL USE, LEGAL OR NOT, OR OTHER, OR COMMERCIALISATION OR ADAPTATION. UNLESS EXPLICITLY PROVIDED BY LAW, IN NO EVENT SHALL Zhejiang University OR THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING FROM, OUT OF OR IN CONNECTION WITH THE *SOFTWARE* OR THE USE OR OTHER DEALINGS IN THE *SOFTWARE*.

================================================
FILE: README.md
================================================
# HUGS: Holistic Urban 3D Scene Understanding via Gaussian Splatting

[Hongyu Zhou](https://github.com/hyzhou404), [Jiahao Shao](https://jhaoshao.github.io/), Lu Xu, Dongfeng Bai, [Weichao Qiu](https://weichaoqiu.com/), Bingbing Liu, [Yue Wang](https://ywang-zju.github.io/), [Andreas Geiger](https://www.cvlibs.net/), [Yiyi Liao](https://yiyiliao.github.io/)
| [Webpage](https://xdimlab.github.io/hugs_website/) | [Full Paper](https://openaccess.thecvf.com/content/CVPR2024/html/Zhou_HUGS_Holistic_Urban_3D_Scene_Understanding_via_Gaussian_Splatting_CVPR_2024_paper.html) | [Video](https://www.youtube.com/watch?v=DmPhL-8FeT4)

This repository contains the official authors' implementation associated with the paper "HUGS: Holistic Urban 3D Scene Understanding via Gaussian Splatting", which can be found [here](https://xdimlab.github.io/hugs_website/).

![image teaser](./assets/teaser.png)

Abstract: *Holistic understanding of urban scenes based on RGB images is a challenging yet important problem. It encompasses understanding both the geometry and appearance to enable novel view synthesis, parsing semantic labels, and tracking moving objects. Despite considerable progress, existing approaches often focus on specific aspects of this task and require additional inputs such as LiDAR scans or manually annotated 3D bounding boxes. In this paper, we introduce a novel pipeline that utilizes 3D Gaussian Splatting for holistic urban scene understanding. Our main idea involves the joint optimization of geometry, appearance, semantics, and motion using a combination of static and dynamic 3D Gaussians, where moving object poses are regularized via physical constraints. Our approach offers the ability to render new viewpoints in real-time, yielding 2D and 3D semantic information with high accuracy, and reconstruct dynamic scenes, even in scenarios where 3D bounding box detections are highly noisy. Experimental results on KITTI, KITTI-360, and Virtual KITTI 2 demonstrate the effectiveness of our approach.*

## Cloning the Repository

The repository contains submodules, thus please check it out with

```shell
# SSH
git clone git@github.com:hyzhou404/hugs.git --recursive
```

or

```shell
# HTTPS
git clone https://github.com/hyzhou404/hugs --recursive
```

## Prepare Environment

Create a conda environment:

```shell
conda create -n hugs python=3.10 -y
```

Please install [PyTorch](https://pytorch.org/), [tiny-cuda-nn](https://github.com/NVlabs/tiny-cuda-nn), [pytorch3d](https://github.com/facebookresearch/pytorch3d/tree/main) and [flow-vis-torch](https://github.com/ChristophReich1996/Optical-Flow-Visualization-PyTorch) by following the official instructions.

Install the submodules by running:

```shell
pip install submodules/simple-knn
pip install submodules/hugs-rasterization
```

Install the remaining packages by running:

```shell
pip install -r requirements.txt
```

## Data & Checkpoints

We have made available two sequences from KITTI, as indicated in our paper. Furthermore, three sequences from KITTI-360 and one sequence from Waymo have also been provided. Download the checkpoints from [here](https://huggingface.co/datasets/hyzhou404/hugs_release), then unzip each sequence:

```shell
unzip ${sequence}.zip
```

## Rendering

Render test views by running:

```shell
python render.py -m ${checkpoint_path} --data_type ${dataset_type} --iteration 30000 --affine
```

The variable **dataset_type** is a string whose value can be one of the following: **kitti**, **kitti360**, or **waymo**.

## Evaluation

```shell
python metrics.py -m ${checkpoint_path}
```

This reports SSIM, PSNR, and LPIPS averaged over the rendered test views (a minimal sketch of the computation is given after the Training section below).

## Training

This repository only includes the inference code of HUGS. The training code will be released in the future.
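For reference, the metrics above are plain averages of per-image SSIM, PSNR, and LPIPS over the render/ground-truth pairs saved by `render.py`. A minimal sketch of that computation, mirroring `metrics.py` in this repository (the helper `average_metrics` is illustrative and not part of the codebase):

```python
import torch
from utils.loss_utils import ssim
from utils.image_utils import psnr
from lpipsPyTorch import lpips

def average_metrics(renders, gts):
    # renders/gts: lists of (1, 3, H, W) float tensors in [0, 1] on CUDA,
    # as produced by readImages() in metrics.py.
    ssims, psnrs, lpipss = [], [], []
    for rendered, gt in zip(renders, gts):
        ssims.append(ssim(rendered, gt))
        psnrs.append(psnr(rendered, gt))
        lpipss.append(lpips(rendered, gt, net_type='alex'))
    return {name: torch.tensor(vals).mean().item()
            for name, vals in (('SSIM', ssims), ('PSNR', psnrs), ('LPIPS', lpipss))}
```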

## BibTeX

```bibtex
@InProceedings{Zhou_2024_CVPR,
    author    = {Zhou, Hongyu and Shao, Jiahao and Xu, Lu and Bai, Dongfeng and Qiu, Weichao and Liu, Bingbing and Wang, Yue and Geiger, Andreas and Liao, Yiyi},
    title     = {HUGS: Holistic Urban 3D Scene Understanding via Gaussian Splatting},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {21336-21345}
}
```
================================================ FILE: arguments/__init__.py ================================================ # # Copyright (C) 2023, Inria # GRAPHDECO research group, https://team.inria.fr/graphdeco # All rights reserved. # # This software is free for non-commercial, research and evaluation use # under the terms of the LICENSE.md file. # # For inquiries contact george.drettakis@inria.fr # from argparse import ArgumentParser, Namespace import sys import os class GroupParams: pass class ParamGroup: def __init__(self, parser: ArgumentParser, name : str, fill_none = False): group = parser.add_argument_group(name) for key, value in vars(self).items(): shorthand = False if key.startswith("_"): shorthand = True key = key[1:] t = type(value) value = value if not fill_none else None if shorthand: if t == bool: group.add_argument("--" + key, ("-" + key[0:1]), default=value, action="store_true") else: group.add_argument("--" + key, ("-" + key[0:1]), default=value, type=t) else: if t == bool: group.add_argument("--" + key, default=value, action="store_true") else: group.add_argument("--" + key, default=value, type=t) def extract(self, args): group = GroupParams() for arg in vars(args).items(): if arg[0] in vars(self) or ("_" + arg[0]) in vars(self): setattr(group, arg[0], arg[1]) return group class ModelParams(ParamGroup): def __init__(self, parser, sentinel=False): self.sh_degree = 3 self._source_path = "" self._model_path = "" self._images = "images" self._resolution = -1 self._white_background = False self.data_device = "cpu" self.eval = False super().__init__(parser, "Loading Parameters", sentinel) def extract(self, args): g = super().extract(args) g.source_path = os.path.abspath(g.source_path) return g class PipelineParams(ParamGroup): def __init__(self, parser): self.convert_SHs_python = False self.compute_cov3D_python = False self.debug = False super().__init__(parser, "Pipeline Parameters") class OptimizationParams(ParamGroup): def __init__(self, parser): self.iterations = 30_000 self.position_lr_init = 0.00016 self.position_lr_final = 0.0000016 self.position_lr_delay_mult = 0.01 self.position_lr_max_steps = 30_000 self.feature_lr = 0.0025 self.opacity_lr = 0.05 self.scaling_lr = 0.001 self.rotation_lr = 0.001 self.percent_dense = 0.001 self.lambda_dssim = 0.2 self.densification_interval = 100 self.opacity_reset_interval = 3000 self.densify_from_iter = 500 self.densify_until_iter = 15_000 self.densify_grad_threshold = 0.0002 super().__init__(parser, "Optimization Parameters") def get_combined_args(parser : ArgumentParser): cmdlne_string = sys.argv[1:] cfgfile_string = "Namespace()" args_cmdline = parser.parse_args(cmdlne_string) try: cfgfilepath = os.path.join(args_cmdline.model_path, "cfg_args") print("Looking for config file in", cfgfilepath) with open(cfgfilepath) as cfg_file: print("Config file found: {}".format(cfgfilepath)) cfgfile_string = cfg_file.read() except TypeError: print("Config file not found at") pass args_cfgfile = eval(cfgfile_string) merged_dict = vars(args_cfgfile).copy() for k,v in vars(args_cmdline).items(): if v != None: merged_dict[k] = v return Namespace(**merged_dict) ================================================ FILE: environment.yml ================================================ name: gaussian_splatting channels: - pytorch - conda-forge - defaults dependencies: - cudatoolkit=11.6 - plyfile=0.8.1 - python=3.7.13 - pip=22.3.1 - pytorch=1.12.1 - torchaudio=0.12.1 - torchvision=0.13.1 - tqdm - pip: - submodules/diff-gaussian-rasterization - 
submodules/simple-knn ================================================ FILE: gaussian_renderer/__init__.py ================================================ # # Copyright (C) 2023, Inria # GRAPHDECO research group, https://team.inria.fr/graphdeco # All rights reserved. # # This software is free for non-commercial, research and evaluation use # under the terms of the LICENSE.md file. # # For inquiries contact george.drettakis@inria.fr # import torch import math from diff_gaussian_rasterization import GaussianRasterizationSettings, GaussianRasterizer from scene.gaussian_model import GaussianModel from utils.sh_utils import eval_sh, RGB2SH from pytorch3d.transforms import quaternion_to_matrix, matrix_to_quaternion def euler2matrix(yaw): cos = torch.cos(-yaw) sin = torch.sin(-yaw) rot = torch.eye(3).float().cuda() rot[0,0] = cos rot[0,2] = sin rot[2,0] = -sin rot[2,2] = cos return rot def cat_bgfg(bg, fg, only_dynamic=False, only_xyz=False): if only_xyz: bg_feats = [bg.get_xyz] else: bg_feats = [bg.get_xyz, bg.get_opacity, bg.get_scaling, bg.get_rotation, bg.get_features, bg.get_3D_features] output = [] for fg_feat, bg_feat in zip(fg, bg_feats): if fg_feat is None: output.append(bg_feat) elif only_dynamic: output.append(fg_feat) else: output.append(torch.cat((bg_feat, fg_feat), dim=0)) return output def cat_all_fg(all_fg, next_fg): output = [] for feat, next_feat in zip(all_fg, next_fg): if feat is None: feat = next_feat else: feat = torch.cat((feat, next_feat), dim=0) output.append(feat) return output def proj_uv(xyz, cam): device = torch.device("cuda" if torch.cuda.is_available() else "cpu") intr = torch.as_tensor(cam.K[:3, :3]).float().to(device) # (3, 3) w2c = torch.tensor(cam.w2c).float().to(device)[:3, :] # (3, 4) c_xyz = (w2c[:3, :3] @ xyz.T).T + w2c[:3, 3] i_xyz = (intr @ c_xyz.mT).mT # (N, 3) uv = i_xyz[:, :2] / i_xyz[:, -1:].clip(1e-3) # (N, 2) return uv def unicycle_b2w(timestamp, model): # model = unicycle_models[track_id]['model'] pred = model(timestamp) if pred is None: return None pred_a, pred_b, pred_v, pred_phi, pred_h = pred # r = euler_angles_to_matrix(torch.tensor([0, pred_phi-torch.pi, 0]), 'XYZ') rt = torch.eye(4).float().cuda() rt[:3,:3] = euler2matrix(pred_phi) rt[1, 3], rt[0, 3], rt[2, 3] = pred_h, pred_a, pred_b return rt def render(viewpoint_camera, prev_viewpoint_camera, pc : GaussianModel, dynamic_gaussians : dict, unicycles : dict, pipe, bg_color : torch.Tensor, render_optical=False, scaling_modifier = 1.0, only_dynamic=False): """ Render the scene. Background tensor (bg_color) must be on GPU! 
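
    Args (summarized from this function's body):
        viewpoint_camera: camera to render from.
        prev_viewpoint_camera: previous-frame camera; used to compute per-Gaussian
            screen-space displacements when render_optical is True (may be None).
        pc: static background GaussianModel.
        dynamic_gaussians: dict mapping track ids to per-object GaussianModels.
        unicycles: dict of per-track unicycle motion models; if empty, object
            poses are taken from viewpoint_camera.dynamics instead.
        render_optical: if True, also rasterize optical flow between the two views.
        only_dynamic: if True, render only the dynamic objects, without the background.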
""" timestamp = viewpoint_camera.timestamp all_fg = [None, None, None, None, None, None] prev_all_fg = [None] if len(unicycles) == 0: track_dict = viewpoint_camera.dynamics if prev_viewpoint_camera is not None: prev_track_dict = prev_viewpoint_camera.dynamics else: track_dict, prev_track_dict = {}, {} for track_id, uni_model in unicycles.items(): B2W = unicycle_b2w(timestamp, uni_model['model']) track_dict[track_id] = B2W if prev_viewpoint_camera is not None: prev_B2W = unicycle_b2w(prev_viewpoint_camera.timestamp, uni_model['model']) prev_track_dict[track_id] = prev_B2W for track_id, B2W in track_dict.items(): w_dxyz = (B2W[:3, :3] @ dynamic_gaussians[track_id].get_xyz.T).T + B2W[:3, 3] drot = quaternion_to_matrix(dynamic_gaussians[track_id].get_rotation) w_drot = matrix_to_quaternion(B2W[:3, :3] @ drot) next_fg = [w_dxyz, dynamic_gaussians[track_id].get_opacity, dynamic_gaussians[track_id].get_scaling, w_drot, dynamic_gaussians[track_id].get_features, dynamic_gaussians[track_id].get_3D_features] # next_fg = get_next_fg(dynamic_gaussians[track_id], B2W) # w_dxyz = next_fg[0] all_fg = cat_all_fg(all_fg, next_fg) if render_optical and prev_viewpoint_camera is not None: if track_id in prev_track_dict: prev_B2W = prev_track_dict[track_id] prev_w_dxyz = torch.mm(prev_B2W[:3, :3], dynamic_gaussians[track_id].get_xyz.T).T + prev_B2W[:3, 3] prev_all_fg = cat_all_fg(prev_all_fg, [prev_w_dxyz]) else: prev_all_fg = cat_all_fg(prev_all_fg, [w_dxyz]) xyz, opacity, scales, rotations, shs, feats3D = cat_bgfg(pc, all_fg) if render_optical and prev_viewpoint_camera is not None: prev_xyz = cat_bgfg(pc, prev_all_fg, only_xyz=True)[0] uv = proj_uv(xyz, viewpoint_camera) prev_uv = proj_uv(prev_xyz, prev_viewpoint_camera) delta_uv = uv - prev_uv delta_uv = torch.cat([delta_uv, torch.ones_like(delta_uv[:, :1], device=delta_uv.device)], dim=-1) else: delta_uv = torch.zeros_like(xyz) # Create zero tensor. We will use it to make pytorch return gradients of the 2D (screen-space) means screenspace_points = torch.zeros_like(xyz, dtype=xyz.dtype, requires_grad=True, device="cuda") + 0 try: screenspace_points.retain_grad() except: pass # Set up rasterization configuration tanfovx = math.tan(viewpoint_camera.FoVx * 0.5) tanfovy = math.tan(viewpoint_camera.FoVy * 0.5) if pc.affine: cam_xyz, cam_dir = viewpoint_camera.c2w[:3, 3].cuda(), viewpoint_camera.c2w[:3, 2].cuda() o_enc = pc.pos_enc(cam_xyz[None, :] / 60) d_enc = pc.dir_enc(cam_dir[None, :]) appearance = pc.appearance_model(torch.cat([o_enc, d_enc], dim=1)) * 1e-1 affine_weight, affine_bias = appearance[:, :9].view(3, 3), appearance[:, -3:] affine_weight = affine_weight + torch.eye(3, device=appearance.device) # bg_img = pc.sky_model(enc).view(*rays_d.shape).permute(2, 0, 1).float() raster_settings = GaussianRasterizationSettings( image_height=int(viewpoint_camera.image_height), image_width=int(viewpoint_camera.image_width), tanfovx=tanfovx, tanfovy=tanfovy, bg=bg_color, scale_modifier=scaling_modifier, viewmatrix=viewpoint_camera.world_view_transform, projmatrix=viewpoint_camera.full_proj_transform, sh_degree=pc.active_sh_degree, campos=viewpoint_camera.camera_center, prefiltered=False, debug=pipe.debug ) rasterizer = GaussianRasterizer(raster_settings=raster_settings) means3D = xyz means2D = screenspace_points cov3D_precomp = None colors_precomp = None # Rasterize visible Gaussians to image, obtain their radii (on screen). 
rendered_image, radii, feats, depth, flow = rasterizer( means3D = means3D, means2D = means2D, shs = shs, colors_precomp = colors_precomp, opacities = opacity, scales = scales, rotations = rotations, cov3D_precomp = cov3D_precomp, feats3D = feats3D, delta = delta_uv) if pc.affine: colors = rendered_image.view(3, -1).permute(1, 0) # (H*W, 3) refined_image = (colors @ affine_weight + affine_bias).clip(0, 1).permute(1, 0).view(*rendered_image.shape) else: refined_image = rendered_image # Those Gaussians that were frustum culled or had a radius of 0 were not visible. # They will be excluded from value updates used in the splitting criteria. return {"render": refined_image, "feats": feats, "depth": depth, "opticalflow": flow, "viewspace_points": screenspace_points, "visibility_filter" : radii > 0, "radii": radii} ================================================ FILE: lpipsPyTorch/__init__.py ================================================ import torch from .modules.lpips import LPIPS def lpips(x: torch.Tensor, y: torch.Tensor, net_type: str = 'alex', version: str = '0.1'): r"""Function that measures Learned Perceptual Image Patch Similarity (LPIPS). Arguments: x, y (torch.Tensor): the input tensors to compare. net_type (str): the network type to compare the features: 'alex' | 'squeeze' | 'vgg'. Default: 'alex'. version (str): the version of LPIPS. Default: 0.1. """ device = x.device criterion = LPIPS(net_type, version).to(device) return criterion(x, y) ================================================ FILE: lpipsPyTorch/modules/lpips.py ================================================ import torch import torch.nn as nn from .networks import get_network, LinLayers from .utils import get_state_dict class LPIPS(nn.Module): r"""Creates a criterion that measures Learned Perceptual Image Patch Similarity (LPIPS). Arguments: net_type (str): the network type to compare the features: 'alex' | 'squeeze' | 'vgg'. Default: 'alex'. version (str): the version of LPIPS. Default: 0.1. 
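
    The forward pass compares deep features of the two inputs at several network
    layers and sums learned 1x1-conv projections of the squared feature
    differences (averaged spatially), returning a scalar distance per batch
    (lower means more perceptually similar).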
""" def __init__(self, net_type: str = 'alex', version: str = '0.1'): assert version in ['0.1'], 'v0.1 is only supported now' super(LPIPS, self).__init__() # pretrained network self.net = get_network(net_type) # linear layers self.lin = LinLayers(self.net.n_channels_list) self.lin.load_state_dict(get_state_dict(net_type, version)) def forward(self, x: torch.Tensor, y: torch.Tensor): feat_x, feat_y = self.net(x), self.net(y) diff = [(fx - fy) ** 2 for fx, fy in zip(feat_x, feat_y)] res = [l(d).mean((2, 3), True) for d, l in zip(diff, self.lin)] return torch.sum(torch.cat(res, 0), 0, True) ================================================ FILE: lpipsPyTorch/modules/networks.py ================================================ from typing import Sequence from itertools import chain import torch import torch.nn as nn from torchvision import models from .utils import normalize_activation def get_network(net_type: str): if net_type == 'alex': return AlexNet() elif net_type == 'squeeze': return SqueezeNet() elif net_type == 'vgg': return VGG16() else: raise NotImplementedError('choose net_type from [alex, squeeze, vgg].') class LinLayers(nn.ModuleList): def __init__(self, n_channels_list: Sequence[int]): super(LinLayers, self).__init__([ nn.Sequential( nn.Identity(), nn.Conv2d(nc, 1, 1, 1, 0, bias=False) ) for nc in n_channels_list ]) for param in self.parameters(): param.requires_grad = False class BaseNet(nn.Module): def __init__(self): super(BaseNet, self).__init__() # register buffer self.register_buffer( 'mean', torch.Tensor([-.030, -.088, -.188])[None, :, None, None]) self.register_buffer( 'std', torch.Tensor([.458, .448, .450])[None, :, None, None]) def set_requires_grad(self, state: bool): for param in chain(self.parameters(), self.buffers()): param.requires_grad = state def z_score(self, x: torch.Tensor): return (x - self.mean) / self.std def forward(self, x: torch.Tensor): x = self.z_score(x) output = [] for i, (_, layer) in enumerate(self.layers._modules.items(), 1): x = layer(x) if i in self.target_layers: output.append(normalize_activation(x)) if len(output) == len(self.target_layers): break return output class SqueezeNet(BaseNet): def __init__(self): super(SqueezeNet, self).__init__() self.layers = models.squeezenet1_1(True).features self.target_layers = [2, 5, 8, 10, 11, 12, 13] self.n_channels_list = [64, 128, 256, 384, 384, 512, 512] self.set_requires_grad(False) class AlexNet(BaseNet): def __init__(self): super(AlexNet, self).__init__() self.layers = models.alexnet(True).features self.target_layers = [2, 5, 8, 10, 12] self.n_channels_list = [64, 192, 384, 256, 256] self.set_requires_grad(False) class VGG16(BaseNet): def __init__(self): super(VGG16, self).__init__() self.layers = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).features self.target_layers = [4, 9, 16, 23, 30] self.n_channels_list = [64, 128, 256, 512, 512] self.set_requires_grad(False) ================================================ FILE: lpipsPyTorch/modules/utils.py ================================================ from collections import OrderedDict import torch def normalize_activation(x, eps=1e-10): norm_factor = torch.sqrt(torch.sum(x ** 2, dim=1, keepdim=True)) return x / (norm_factor + eps) def get_state_dict(net_type: str = 'alex', version: str = '0.1'): # build url url = 'https://raw.githubusercontent.com/richzhang/PerceptualSimilarity/' \ + f'master/lpips/weights/v{version}/{net_type}.pth' # download old_state_dict = torch.hub.load_state_dict_from_url( url, progress=True, map_location=None if 
torch.cuda.is_available() else torch.device('cpu') ) # rename keys new_state_dict = OrderedDict() for key, val in old_state_dict.items(): new_key = key new_key = new_key.replace('lin', '') new_key = new_key.replace('model.', '') new_state_dict[new_key] = val return new_state_dict ================================================ FILE: metrics.py ================================================ # # Copyright (C) 2023, Inria # GRAPHDECO research group, https://team.inria.fr/graphdeco # All rights reserved. # # This software is free for non-commercial, research and evaluation use # under the terms of the LICENSE.md file. # # For inquiries contact george.drettakis@inria.fr # from pathlib import Path import os from PIL import Image import torch import torchvision.transforms.functional as tf from utils.loss_utils import ssim from lpipsPyTorch import lpips import json from tqdm import tqdm from utils.image_utils import psnr from argparse import ArgumentParser from collections import OrderedDict def readImages(renders_dir, gt_dir): renders = [] gts = [] image_names = [] for fname in os.listdir(renders_dir): render = Image.open(renders_dir / fname) gt = Image.open(gt_dir / fname) renders.append(tf.to_tensor(render).unsqueeze(0)[:, :3, :, :].cuda()) gts.append(tf.to_tensor(gt).unsqueeze(0)[:, :3, :, :].cuda()) image_names.append(fname) return renders, gts, image_names def evaluate(model_paths, write): # import ipdb; ipdb.set_trace() full_dict = {} per_view_dict = {} full_dict_polytopeonly = {} per_view_dict_polytopeonly = {} print("") scene_dir = model_paths[0] print("Scene:", scene_dir) for splits in ['test', 'train']: full_dict[splits] = {} per_view_dict[splits] = {} dir_path = Path(scene_dir) / splits for method in os.listdir(dir_path): print("Method:", method) full_dict[splits][method] = {} per_view_dict[splits][method] = {} method_dir = dir_path / method gt_dir = method_dir/ "gt" renders_dir = method_dir / "renders" renders, gts, image_names = readImages(renders_dir, gt_dir) ssims = [] psnrs = [] lpipss = [] for idx in tqdm(range(len(renders)), desc="Metric evaluation progress"): ssims.append(ssim(renders[idx], gts[idx])) psnrs.append(psnr(renders[idx], gts[idx])) lpipss.append(lpips(renders[idx], gts[idx], net_type='alex')) print(" SSIM : {:>12.7f}".format(torch.tensor(ssims).mean(), ".5")) print(" PSNR : {:>12.7f}".format(torch.tensor(psnrs).mean(), ".5")) print(" LPIPS: {:>12.7f}".format(torch.tensor(lpipss).mean(), ".5")) print("") full_dict[splits][method].update({"SSIM": torch.tensor(ssims).mean().item(), "PSNR": torch.tensor(psnrs).mean().item(), "LPIPS": torch.tensor(lpipss).mean().item()}) per_view_dict[splits][method].update({ "SSIM": OrderedDict(sorted({name: ssim for ssim, name in zip(torch.tensor(ssims).tolist(), image_names)}.items())), "PSNR": OrderedDict(sorted({name: psnr for psnr, name in zip(torch.tensor(psnrs).tolist(), image_names)}.items())), "LPIPS": OrderedDict(sorted({name: lp for lp, name in zip(torch.tensor(lpipss).tolist(), image_names)}.items())) }) if write: with open(scene_dir + "/metric_results.json", 'w') as fp: json.dump(full_dict, fp, indent=True) with open(scene_dir + "/per_view.json", 'w') as fp: json.dump(per_view_dict, fp, indent=True) if __name__ == "__main__": device = torch.device("cuda:0") torch.cuda.set_device(device) # Set up command line argument parser parser = ArgumentParser(description="Training script parameters") parser.add_argument('--model_paths', '-m', required=True, nargs="+", type=str, default=[]) parser.add_argument('--write', 
action='store_false', default=True)
    args = parser.parse_args()
    evaluate(args.model_paths, args.write)

================================================
FILE: render.py
================================================
#
# Copyright (C) 2023, Inria
# GRAPHDECO research group, https://team.inria.fr/graphdeco
# All rights reserved.
#
# This software is free for non-commercial, research and evaluation use
# under the terms of the LICENSE.md file.
#
# For inquiries contact george.drettakis@inria.fr
#

import torch
from scene import Scene
import os
from tqdm import tqdm
from os import makedirs
from gaussian_renderer import render
import torchvision
from utils.general_utils import safe_state
from argparse import ArgumentParser
from arguments import ModelParams, PipelineParams, get_combined_args
from gaussian_renderer import GaussianModel
import numpy as np
from copy import deepcopy
from torchmetrics.functional import structural_similarity_index_measure as ssim
import matplotlib.pyplot as plt
from mpl_toolkits.axes_grid1 import make_axes_locatable
from matplotlib import cm
from utils.semantic_utils import colorize
import flow_vis_torch
from utils.cmap import color_depth_map
from imageio.v2 import imwrite

def to4x4(R, T):
    RT = np.eye(4, 4)
    RT[:3, :3] = R
    RT[:3, 3] = T
    return RT

def apply_colormap(image, cmap="viridis"):
    colormap = cm.get_cmap(cmap)
    colormap = torch.tensor(colormap.colors).to(image.device)  # type: ignore
    image_long = (image * 255).long()
    image_long_min = torch.min(image_long)
    image_long_max = torch.max(image_long)
    assert image_long_min >= 0, f"the min value is {image_long_min}"
    assert image_long_max <= 255, f"the max value is {image_long_max}"
    return colormap[image_long[0, ...]].permute(2, 0, 1)

def apply_depth_colormap(depth, near_plane=None, far_plane=None, cmap="turbo"):
    near_plane = near_plane or float(torch.min(depth))
    far_plane = far_plane or float(torch.max(depth))
    depth = (depth - near_plane) / (far_plane - near_plane + 1e-10)
    depth = torch.clip(depth, 0, 1)
    colored_image = apply_colormap(depth, cmap=cmap)
    return colored_image

def render_set(model_path, name, iteration, views, scene, pipeline, background, data_type):
    render_path = os.path.join(model_path, name, "ours_{}".format(iteration), "renders")
    semantic_path = os.path.join(model_path, name, "ours_{}".format(iteration), "semantic")
    optical_path = os.path.join(model_path, name, "ours_{}".format(iteration), "optical")
    gts_path = os.path.join(model_path, name, "ours_{}".format(iteration), "gt")
    error_path = os.path.join(model_path, name, "ours_{}".format(iteration), "error_map")
    depth_path = os.path.join(model_path, name, "ours_{}".format(iteration), "depth")

    makedirs(render_path, exist_ok=True)
    makedirs(semantic_path, exist_ok=True)
    makedirs(optical_path, exist_ok=True)
    makedirs(gts_path, exist_ok=True)
    makedirs(error_path, exist_ok=True)
    makedirs(depth_path, exist_ok=True)

    for idx, view in enumerate(tqdm(views, desc="Rendering progress")):
        # Dataset-specific index offset between a view and the previous frame
        # used for optical-flow rendering.
        if data_type == 'kitti':
            gap = 2
        elif data_type == 'kitti360':
            gap = 4
        elif data_type == 'waymo':
            gap = 1
        elif data_type == 'nuscenes' or data_type == 'pandaset':
            gap = 6
        if idx - gap < 0:
            prev_view = None
        else:
            prev_view = views[idx - gap]
        render_pkg = render(view, prev_view, scene.gaussians, scene.dynamic_gaussians,
                            scene.unicycles, pipeline, background, True)
        rendering = render_pkg['render'].detach().cpu()
        semantic = render_pkg['feats'].detach().cpu()
        semantic = torch.argmax(semantic, dim=0)
        semantic_rgb = colorize(semantic.detach().cpu().numpy())
        depth = render_pkg['depth']
        color_depth =
color_depth_map(depth[0].detach().cpu().numpy()) color_depth[semantic == 10] = np.array([255.0, 255.0, 255.0]) gt = view.original_image[0:3, :, :] # _, ssim_map = ssim(rendering[None, ...], gt[None, ...], return_full_image=True) # ssim_map = torch.mean(ssim_map[0], dim=0).clip(0, 1)[None, ...] # error_map = 1 - ssim_maps error_map = torch.mean((rendering - gt) ** 2, dim=0)[None, ...] fig = plt.figure(frameon=False) fig.set_size_inches(1.408, 0.376) ax = plt.Axes(fig, [0., 0., 1., 1.]) ax.set_axis_off() fig.add_axes(ax) ax.imshow((error_map.detach().cpu().numpy().transpose(1,2,0)), cmap='jet') plt.savefig(os.path.join(error_path, view.image_name + ".png"), dpi=1000) plt.close('all') torchvision.utils.save_image(rendering, os.path.join(render_path, view.image_name + ".png")) torchvision.utils.save_image(gt, os.path.join(gts_path, view.image_name + ".png")) semantic_rgb.save(os.path.join(semantic_path, view.image_name + ".png")) imwrite(os.path.join(depth_path, view.image_name + ".png"), color_depth) opticalflow = render_pkg["opticalflow"] opticalflow = opticalflow.permute(1,2,0) opticalflow = opticalflow[..., :2] pytorch_optic_rgb = flow_vis_torch.flow_to_color(opticalflow.permute(2, 0, 1)) # (2, h, w) torchvision.utils.save_image(pytorch_optic_rgb.float(), os.path.join(optical_path, view.image_name + ".png"), normalize=True) # torchvision.utils.save_image(error_map, os.path.join(error_path, '{0:05d}'.format(idx) + ".png")) def render_sets(dataset : ModelParams, iteration : int, pipeline : PipelineParams, skip_train : bool, skip_test : bool, data_type, affine, ignore_dynamic): with torch.no_grad(): gaussians = GaussianModel(dataset.sh_degree, affine=affine) scene = Scene(dataset, gaussians, load_iteration=iteration, shuffle=False, data_type=data_type, ignore_dynamic=ignore_dynamic) bg_color = [1,1,1] if dataset.white_background else [0, 0, 0] background = torch.tensor(bg_color, dtype=torch.float32, device="cuda") if not skip_train: render_set(dataset.model_path, "train", scene.loaded_iter, scene.getTrainCameras(), scene, pipeline, background, data_type) if not skip_test: render_set(dataset.model_path, "test", scene.loaded_iter, scene.getTestCameras(), scene, pipeline, background, data_type) if __name__ == "__main__": # Set up command line argument parser parser = ArgumentParser(description="Testing script parameters") model = ModelParams(parser, sentinel=True) pipeline = PipelineParams(parser) parser.add_argument("--iteration", default=-1, type=int) parser.add_argument("--data_type", default='kitti360', type=str) parser.add_argument("--affine", action="store_true") parser.add_argument("--ignore_dynamic", action="store_true") parser.add_argument("--skip_train", action="store_true") parser.add_argument("--skip_test", action="store_true") parser.add_argument("--quiet", action="store_true") args = get_combined_args(parser) print("Rendering " + args.model_path) args.source_path = os.path.join(args.model_path, 'data') # Initialize system state (RNG) # safe_state(args.quiet) render_sets(model.extract(args), args.iteration, pipeline.extract(args), args.skip_train, args.skip_test, args.data_type, args.affine, args.ignore_dynamic) ================================================ FILE: requirements.txt ================================================ config==0.5.1 datasets==2.19.2 # flow_vis_torch==0.1 imageio==2.34.1 matplotlib==3.9.0 network==0.1 numpy==1.26.4 open3d==0.18.0 opencv_python==4.10.0.82 Pillow==10.3.0 plyfile==1.0.3 # pytorch3d==0.7.4 runx==0.0.11 scipy==1.13.1 setuptools==69.5.1 # 
torch==2.3.1+cu118 torchmetrics==1.4.0.post0 # torchvision==0.18.1+cu118 tqdm==4.66.4 ================================================ FILE: scene/__init__.py ================================================ # # Copyright (C) 2023, Inria # GRAPHDECO research group, https://team.inria.fr/graphdeco # All rights reserved. # # This software is free for non-commercial, research and evaluation use # under the terms of the LICENSE.md file. # # For inquiries contact george.drettakis@inria.fr # import os import random import json from utils.system_utils import searchForMaxIteration from scene.dataset_readers import sceneLoadTypeCallbacks from scene.gaussian_model import GaussianModel from arguments import ModelParams from utils.camera_utils import cameraList_from_camInfos, camera_to_JSON import torch import open3d as o3d import numpy as np from utils.dynamic_utils import create_unicycle_model import shutil class Scene: gaussians : GaussianModel def __init__(self, args : ModelParams, gaussians : GaussianModel, load_iteration=None, shuffle=True, unicycle=False, uc_fit_iter=0, resolution_scales=[1.0], data_type='kitti360', ignore_dynamic=False): """b :param path: Path to colmap scene main folder. """ self.model_path = args.model_path self.loaded_iter = None self.gaussians = gaussians if load_iteration: if load_iteration == -1: self.loaded_iter = searchForMaxIteration(os.path.join(self.model_path, "ckpts")) else: self.loaded_iter = load_iteration print("Loading trained model at iteration {}".format(self.loaded_iter)) self.train_cameras = {} self.test_cameras = {} if os.path.exists(os.path.join(args.source_path, "sparse")): # scene_info = sceneLoadTypeCallbacks["Colmap"](args.source_path, args.images, args.eval) raise NotImplementedError elif os.path.exists(os.path.join(args.source_path, "transforms_train.json")): print("Found transforms_train.json file, assuming Blender data set!") # scene_info = sceneLoadTypeCallbacks["Blender"](args.source_path, args.white_background, args.eval) raise NotImplementedError elif os.path.exists(os.path.join(args.source_path, "meta_data.json")): print("Found meta_data.json file, assuming Studio data set!") scene_info = sceneLoadTypeCallbacks['Studio'](args.source_path, args.white_background, args.eval, data_type, ignore_dynamic) else: assert False, "Could not recognize scene type!" 
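        # One GaussianModel per dynamic track: each tracked vehicle gets its own set
        # of Gaussians (with immutable semantic features), initialized further below
        # from a vehicle template point cloud scaled to the tracked object's extent.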
self.dynamic_verts = scene_info.verts self.dynamic_gaussians = {} for track_id in scene_info.verts: self.dynamic_gaussians[track_id] = GaussianModel(args.sh_degree, feat_mutable=False) if unicycle: self.unicycles = create_unicycle_model(scene_info.train_cameras, self.model_path, uc_fit_iter, data_type) else: self.unicycles = {} if not self.loaded_iter: with open(scene_info.ply_path, 'rb') as src_file, open(os.path.join(self.model_path, "input.ply") , 'wb') as dest_file: dest_file.write(src_file.read()) json_cams = [] camlist = [] if scene_info.test_cameras: camlist.extend(scene_info.test_cameras) if scene_info.train_cameras: camlist.extend(scene_info.train_cameras) for id, cam in enumerate(camlist): json_cams.append(camera_to_JSON(id, cam)) with open(os.path.join(self.model_path, "cameras.json"), 'w') as file: json.dump(json_cams, file) shutil.copyfile(os.path.join(args.source_path, 'meta_data.json'), os.path.join(self.model_path, 'meta_data.json')) if shuffle: random.shuffle(scene_info.train_cameras) # Multi-res consistent random shuffling random.shuffle(scene_info.test_cameras) # Multi-res consistent random shuffling self.cameras_extent = scene_info.nerf_normalization["radius"] for resolution_scale in resolution_scales: print("Loading Training Cameras") self.train_cameras[resolution_scale] = cameraList_from_camInfos(scene_info.train_cameras, resolution_scale, args) print("Loading Test Cameras") self.test_cameras[resolution_scale] = cameraList_from_camInfos(scene_info.test_cameras, resolution_scale, args) if self.loaded_iter: (model_params, first_iter) = torch.load(os.path.join(self.model_path, "ckpts", f"chkpnt{self.loaded_iter}.pth")) gaussians.restore(model_params, None) for iid, dynamic_gaussian in self.dynamic_gaussians.items(): (model_params, first_iter) = torch.load(os.path.join(self.model_path, "ckpts", f"dynamic_{iid}_chkpnt{self.loaded_iter}.pth")) dynamic_gaussian.restore(model_params, None) for iid, unicycle_pkg in self.unicycles.items(): model_params = torch.load(os.path.join(self.model_path, "ckpts", f"unicycle_{iid}_chkpnt{self.loaded_iter}.pth")) unicycle_pkg['model'].restore(model_params) else: self.gaussians.create_from_pcd(scene_info.point_cloud, self.cameras_extent) for track_id in self.dynamic_gaussians.keys(): vertices = scene_info.verts[track_id] # init from template l, h, w = vertices[:, 0].max() - vertices[:, 0].min(), vertices[:, 1].max() - vertices[:, 1].min(), vertices[:, 2].max() - vertices[:, 2].min() pcd = o3d.io.read_point_cloud(f"utils/vehicle_template/benz_{data_type}.ply") points = np.array(pcd.points) * np.array([l, h, w]) pcd.points = o3d.utility.Vector3dVector(points) pcd.colors = o3d.utility.Vector3dVector(np.ones_like(points) * 0.5) self.dynamic_gaussians[track_id].create_from_pcd(pcd, self.cameras_extent) def save(self, iteration): # self.gaussians.save_ply(os.path.join(point_cloud_path, "point_cloud.ply")) point_cloud_vis_path = os.path.join(self.model_path, "point_cloud_vis/iteration_{}".format(iteration)) self.gaussians.save_vis_ply(os.path.join(point_cloud_vis_path, "point.ply")) for iid, dynamic_gaussian in self.dynamic_gaussians.items(): dynamic_gaussian.save_vis_ply(os.path.join(point_cloud_vis_path, f"dynamic_{iid}.ply")) def getTrainCameras(self, scale=1.0): return self.train_cameras[scale] def getTestCameras(self, scale=1.0): return self.test_cameras[scale] ================================================ FILE: scene/cameras.py ================================================ # # Copyright (C) 2023, Inria # GRAPHDECO research group, 
https://team.inria.fr/graphdeco # All rights reserved. # # This software is free for non-commercial, research and evaluation use # under the terms of the LICENSE.md file. # # For inquiries contact george.drettakis@inria.fr # import torch from torch import nn import numpy as np from utils.graphics_utils import getWorld2View2, getProjectionMatrix, fov2focal from utils.general_utils import decode_op class Camera(nn.Module): def __init__(self, colmap_id, R, T, K, FoVx, FoVy, image, image_name, uid, trans=np.array([0.0, 0.0, 0.0]), scale=1.0, data_device="cuda", cx_ratio=None, cy_ratio=None, semantic2d=None, mask=None, timestamp=-1, optical_image=None, dynamics={} ): super(Camera, self).__init__() self.uid = uid self.colmap_id = colmap_id self.R = R self.T = T self.K = K self.FoVx = FoVx self.FoVy = FoVy self.image_name = image_name self.cx_ratio = cx_ratio self.cy_ratio = cy_ratio self.timestamp = timestamp _, self.H, self.W = image.shape self.w2c = np.eye(4) self.w2c[:3, :3] = self.R.T self.w2c[:3, 3] = self.T self.c2w = torch.from_numpy(np.linalg.inv(self.w2c)).cuda() self.fx = fov2focal(self.FoVx, self.W) self.fy = fov2focal(self.FoVy, self.H) self.dynamics = dynamics try: self.data_device = torch.device(data_device) except Exception as e: print(e) print(f"[Warning] Custom device {data_device} failed, fallback to default cuda device" ) self.data_device = torch.device("cuda") self.original_image = image.clamp(0.0, 1.0).to(self.data_device) if semantic2d is not None: self.semantic2d = semantic2d.to(self.data_device) else: self.semantic2d = None if mask is not None: self.mask = torch.from_numpy(mask).bool().to(self.data_device) else: self.mask = None self.image_width = self.original_image.shape[2] self.image_height = self.original_image.shape[1] if optical_image is not None: self.optical_gt = torch.from_numpy(optical_image).to(self.data_device) else: self.optical_gt = None self.zfar = 100.0 self.znear = 0.01 self.trans = trans self.scale = scale self.world_view_transform = torch.tensor(getWorld2View2(R, T, trans, scale)).transpose(0, 1).cuda() self.projection_matrix = getProjectionMatrix(znear=self.znear, zfar=self.zfar, fovX=self.FoVx, fovY=self.FoVy, cx_ratio=cx_ratio, cy_ratio=cy_ratio).transpose(0,1).cuda() self.full_proj_transform = (self.world_view_transform.unsqueeze(0).bmm(self.projection_matrix.unsqueeze(0))).squeeze(0) self.camera_center = self.world_view_transform.inverse()[3, :3] def get_rays(self): i, j = torch.meshgrid(torch.linspace(0, self.W-1, self.W), torch.linspace(0, self.H-1, self.H)) # pytorch's meshgrid has indexing='ij' i = i.t() j = j.t() dirs = torch.stack([(i-self.cx_ratio)/self.fx, -(j-self.cy_ratio)/self.fy, -torch.ones_like(i)], -1) rays_d = torch.sum(dirs[..., np.newaxis, :] * self.c2w[:3,:3], -1).to(self.data_device) rays_o = self.c2w[:3,-1].expand(rays_d.shape).to(self.data_device) rays_d = torch.nn.functional.normalize(rays_d, dim=-1) return rays_o.permute(2,0,1), rays_d.permute(2,0,1) class MiniCam: def __init__(self, width, height, fovy, fovx, znear, zfar, world_view_transform, full_proj_transform): self.image_width = width self.image_height = height self.FoVy = fovy self.FoVx = fovx self.znear = znear self.zfar = zfar self.world_view_transform = world_view_transform self.full_proj_transform = full_proj_transform view_inv = torch.inverse(self.world_view_transform) self.camera_center = view_inv[3][:3] ================================================ FILE: scene/dataset_readers.py ================================================ # # Copyright (C) 2023, Inria # 
GRAPHDECO research group, https://team.inria.fr/graphdeco # All rights reserved. # # This software is free for non-commercial, research and evaluation use # under the terms of the LICENSE.md file. # # For inquiries contact george.drettakis@inria.fr # import os import sys from PIL import Image from typing import NamedTuple from utils.graphics_utils import getWorld2View2, focal2fov, fov2focal import numpy as np import json from pathlib import Path from plyfile import PlyData, PlyElement from utils.sh_utils import SH2RGB from scene.gaussian_model import BasicPointCloud import torch.nn.functional as F from imageio.v2 import imread import torch import random class CameraInfo(NamedTuple): uid: int R: np.array T: np.array K: np.array FovY: np.array FovX: np.array image: np.array image_path: str image_name: str width: int height: int cx_ratio: float cy_ratio: float semantic2d: np.array optical_image: np.array mask: np.array timestamp: int dynamics: dict class SceneInfo(NamedTuple): point_cloud: BasicPointCloud train_cameras: list test_cameras: list nerf_normalization: dict ply_path: str verts: dict def getNerfppNorm(cam_info): def get_center_and_diag(cam_centers): cam_centers = np.hstack(cam_centers) avg_cam_center = np.mean(cam_centers, axis=1, keepdims=True) center = avg_cam_center dist = np.linalg.norm(cam_centers - center, axis=0, keepdims=True) diagonal = np.max(dist) return center.flatten(), diagonal cam_centers = [] for cam in cam_info: W2C = getWorld2View2(cam.R, cam.T) C2W = np.linalg.inv(W2C) cam_centers.append(C2W[:3, 3:4]) # cam_centers in world coordinate center, diagonal = get_center_and_diag(cam_centers) # radius = diagonal * 1.1 + 30 radius = 10 translate = -center return {"translate": translate, "radius": radius} def fetchPly(path): plydata = PlyData.read(path) vertices = plydata['vertex'] positions = np.vstack([vertices['x'], vertices['y'], vertices['z']]).T if 'red' in vertices: colors = np.vstack([vertices['red'], vertices['green'], vertices['blue']]).T / 255.0 else: print('Create random colors') # shs = np.random.random((positions.shape[0], 3)) / 255.0 shs = np.ones((positions.shape[0], 3)) * 0.5 colors = SH2RGB(shs) # shs = np.ones((positions.shape[0], 3)) * 0.5 # colors = SH2RGB(shs) normals = np.zeros((positions.shape[0], 3)) return BasicPointCloud(points=positions, colors=colors, normals=normals) def storePly(path, xyz, rgb): # Define the dtype for the structured array dtype = [('x', 'f4'), ('y', 'f4'), ('z', 'f4'), ('nx', 'f4'), ('ny', 'f4'), ('nz', 'f4'), ('red', 'u1'), ('green', 'u1'), ('blue', 'u1')] normals = np.zeros_like(xyz) elements = np.empty(xyz.shape[0], dtype=dtype) attributes = np.concatenate((xyz, normals, rgb), axis=1) elements[:] = list(map(tuple, attributes)) # Create the PlyData object and write to file vertex_element = PlyElement.describe(elements, 'vertex') ply_data = PlyData([vertex_element]) ply_data.write(path) def readStudioCameras(path, white_background, data_type, ignore_dynamic): train_cam_infos, test_cam_infos = [], [] with open(os.path.join(path, 'meta_data.json')) as json_file: meta_data = json.load(json_file) verts = {} if 'verts' in meta_data and not ignore_dynamic: verts_list = meta_data['verts'] for k, v in verts_list.items(): verts[k] = np.array(v) frames = meta_data['frames'] for idx, frame in enumerate(frames): matrix = np.linalg.inv(np.array(frame['camtoworld'])) R = matrix[:3, :3] T = matrix[:3, 3] R = np.transpose(R) rgb_path = os.path.join(path, frame['rgb_path'].replace('./', '')) rgb_split = rgb_path.split('/') image_name = 
'_'.join([rgb_split[-2], rgb_split[-1][:-4]]) image = Image.open(rgb_path) semantic_2d = None semantic_pth = rgb_path.replace("images", "semantics").replace('.png', '.npy').replace('.jpg', '.npy') if os.path.exists(semantic_pth): semantic_2d = np.load(semantic_pth) semantic_2d[(semantic_2d == 14) | (semantic_2d == 15)] = 13 optical_path = rgb_path.replace("images", "flow").replace('.png', '_flow.npy').replace('.jpg', '_flow.npy') if os.path.exists(optical_path): optical_image = np.load(optical_path) else: optical_image = None mask = None mask_path = rgb_path.replace("images", "masks").replace('.png', '.npy').replace('.jpg', '.npy') if os.path.exists(mask_path): mask = np.load(mask_path) timestamp = frame.get('timestamp', -1) intrinsic = np.array(frame['intrinsics']) FovX = focal2fov(intrinsic[0, 0], image.size[0]) FovY = focal2fov(intrinsic[1, 1], image.size[1]) cx, cy = intrinsic[0, 2], intrinsic[1, 2] w, h = image.size dynamics = {} if 'dynamics' in frame and not ignore_dynamic: dynamics_list = frame['dynamics'] for iid in dynamics_list.keys(): dynamics[iid] = torch.tensor(dynamics_list[iid]).cuda() cam_info = CameraInfo(uid=idx, R=R, T=T, K=intrinsic, FovY=FovY, FovX=FovX, image=image, image_path=rgb_path, image_name=image_name, width=image.size[0], height=image.size[1], cx_ratio=2*cx/w, cy_ratio=2*cy/h, semantic2d=semantic_2d, optical_image=optical_image, mask=mask, timestamp=timestamp, dynamics=dynamics) # kitti360 if data_type == 'kitti360': # if 'cam_2' in cam_info.image_name or 'cam_3' in cam_info.image_name: # train_cam_infos.append(cam_info) # # continue if idx < 20: train_cam_infos.append(cam_info) elif idx % 8 < 4: train_cam_infos.append(cam_info) elif idx % 8 >= 4: test_cam_infos.append(cam_info) else: continue elif data_type == 'kitti': if idx < 10 or idx >= len(frames) - 4: train_cam_infos.append(cam_info) elif idx % 4 < 2: train_cam_infos.append(cam_info) elif idx % 4 == 2: test_cam_infos.append(cam_info) else: continue elif data_type == "nuscenes": if idx < 600 or idx >= 1200: continue elif idx % 30 >= 24: # print('test', cam_info.image_name) test_cam_infos.append(cam_info) else: # print('train', cam_info.image_name) train_cam_infos.append(cam_info) elif data_type == "waymo": if idx > 10 and idx % 10 >= 9: test_cam_infos.append(cam_info) else: train_cam_infos.append(cam_info) elif data_type == "pandaset": # if idx >= 360: # continue if idx > 30 and idx % 30 >= 24: test_cam_infos.append(cam_info) else: train_cam_infos.append(cam_info) else: raise NotImplementedError return train_cam_infos, test_cam_infos, verts def readStudioInfo(path, white_background, eval, data_type, ignore_dynamic): train_cam_infos, test_cam_infos, verts = readStudioCameras(path, white_background, data_type, ignore_dynamic) print(f'Loaded {len(train_cam_infos)} train cameras and {len(test_cam_infos)} test cameras') nerf_normalization = getNerfppNorm(train_cam_infos) ply_path = os.path.join(path, "points3d.ply") # ply_path = os.path.join(path, 'lidar', 'cat.ply') if not os.path.exists(ply_path): # Since this data set has no colmap data, we start with random points num_pts = 500_000 print(f"Generating random point cloud ({num_pts})...") # We create random points inside the bounds of the synthetic Blender scenes AABB = [[-20, -25, -20], [20, 5, 80]] xyz = np.random.uniform(AABB[0], AABB[1], (500000, 3)) # xyz = np.load(os.path.join(path, 'lidar_point.npy')) num_pts = xyz.shape[0] shs = np.ones((num_pts, 3)) * 0.5 pcd = BasicPointCloud(points=xyz, colors=SH2RGB(shs), normals=np.zeros((num_pts, 3))) 
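            # No SfM/COLMAP points are available for these scenes, so initialization
            # falls back to uniform random samples inside the fixed AABB above; the
            # random cloud is written to points3d.ply once and re-read on later runs.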
storePly(ply_path, xyz, SH2RGB(shs) * 255) try: pcd = fetchPly(ply_path) except Exception as e: print('When loading point clound, meet error:', e) exit(0) scene_info = SceneInfo(point_cloud=pcd, train_cameras=train_cam_infos, test_cameras=test_cam_infos, nerf_normalization=nerf_normalization, ply_path=ply_path, verts=verts) return scene_info sceneLoadTypeCallbacks = { "Studio": readStudioInfo, } ================================================ FILE: scene/gaussian_model.py ================================================ # # Copyright (C) 2023, Inria # GRAPHDECO research group, https://team.inria.fr/graphdeco # All rights reserved. # # This software is free for non-commercial, research and evaluation use # under the terms of the LICENSE.md file. # # For inquiries contact george.drettakis@inria.fr # import torch import numpy as np from utils.general_utils import inverse_sigmoid, get_expon_lr_func, build_rotation from torch import nn import os from utils.system_utils import mkdir_p from plyfile import PlyData, PlyElement from utils.sh_utils import RGB2SH, SH2RGB from simple_knn._C import distCUDA2 from utils.graphics_utils import BasicPointCloud from utils.general_utils import strip_symmetric, build_scaling_rotation import open3d as o3d import tinycudann as tcnn from math import sqrt class CustomAdam(torch.optim.Optimizer): def __init__(self, params, lr=0.001, betas=(0.9, 0.999), eps=1e-8): defaults = dict(lr=lr, betas=betas, eps=eps) super(CustomAdam, self).__init__(params, defaults) def step(self, custom_lr=None, name=None): for group in self.param_groups: for p in group['params']: if p.grad is None: continue grad = p.grad.data if grad.is_sparse: raise RuntimeError('Adam does not support sparse gradients') state = self.state[p] # State initialization if len(state) == 0: state['step'] = 0 # Exponential moving averages of gradient values state['exp_avg'] = torch.zeros_like(p.data) # Exponential moving averages of squared gradient values state['exp_avg_sq'] = torch.zeros_like(p.data) exp_avg, exp_avg_sq = state['exp_avg'], state['exp_avg_sq'] beta1, beta2 = group['betas'] # Add op to update moving averages state['step'] += 1 exp_avg.mul_(beta1).add_(grad, alpha=1.0 - beta1) exp_avg_sq.mul_(beta2).addcmul_(grad, grad, value=1.0 - beta2) denom = exp_avg_sq.sqrt().add_(group['eps']) bias_correction1 = 1.0 - beta1 ** state['step'] bias_correction2 = 1.0 - beta2 ** state['step'] if (custom_lr is not None) and (name is not None) and (group['name'] in name): step_size = custom_lr[:, None] * group['lr'] * (sqrt(bias_correction2) / bias_correction1) else: step_size = group['lr'] * (sqrt(bias_correction2) / bias_correction1) p.data -= step_size * exp_avg / denom class GaussianModel: def setup_functions(self): def build_covariance_from_scaling_rotation(scaling, scaling_modifier, rotation): L = build_scaling_rotation(scaling_modifier * scaling, rotation) actual_covariance = L @ L.transpose(1, 2) symm = strip_symmetric(actual_covariance) return symm self.scaling_activation = torch.exp self.scaling_inverse_activation = torch.log self.covariance_activation = build_covariance_from_scaling_rotation self.opacity_activation = torch.sigmoid self.inverse_opacity_activation = inverse_sigmoid self.rotation_activation = torch.nn.functional.normalize def __init__(self, sh_degree : int, feat_mutable=True, affine=False): self.active_sh_degree = 0 self.max_sh_degree = sh_degree self._xyz = torch.empty(0) self._features_dc = torch.empty(0) self._features_rest = torch.empty(0) self._feats3D = torch.empty(0) self._scaling 
= torch.empty(0) self._rotation = torch.empty(0) self._opacity = torch.empty(0) self.max_radii2D = torch.empty(0) self.xyz_gradient_accum = torch.empty(0) self.denom = torch.empty(0) self.optimizer = None self.percent_dense = 0 self.spatial_lr_scale = 0 self.feat_mutable = feat_mutable self.setup_functions() self.pos_enc = tcnn.Encoding( n_input_dims=3, encoding_config={"otype": "Frequency", "n_frequencies": 2}, ) self.dir_enc = tcnn.Encoding( n_input_dims=3, encoding_config={ "otype": "SphericalHarmonics", "degree": 3, }, ) self.affine = affine if affine: self.appearance_model = tcnn.Network( n_input_dims=self.pos_enc.n_output_dims + self.dir_enc.n_output_dims, n_output_dims=12, network_config={ "otype": "FullyFusedMLP", "activation": "ReLU", "output_activation": "None", "n_neurons": 32, "n_hidden_layers": 2, } ) else: self.appearance_model = None def capture(self): return ( self.active_sh_degree, self._xyz, self._features_dc, self._features_rest, self._feats3D, self._scaling, self._rotation, self._opacity, self.max_radii2D, self.xyz_gradient_accum, self.denom, self.optimizer.state_dict(), self.spatial_lr_scale, self.appearance_model, ) def restore(self, model_args, training_args): (self.active_sh_degree, self._xyz, self._features_dc, self._features_rest, self._feats3D, self._scaling, self._rotation, self._opacity, self.max_radii2D, xyz_gradient_accum, denom, opt_dict, self.spatial_lr_scale, self.appearance_model,) = model_args self.xyz_gradient_accum = xyz_gradient_accum self.denom = denom if training_args is not None: self.training_setup(training_args) self.optimizer.load_state_dict(opt_dict) @property def get_scaling(self): return self.scaling_activation(self._scaling) @property def get_rotation(self): return self.rotation_activation(self._rotation) # TODO add get_xyz for dynamic car @property def get_xyz(self): return self._xyz @property def get_features(self): features_dc = self._features_dc features_rest = self._features_rest return torch.cat((features_dc, features_rest), dim=1) @property def get_3D_features(self): return torch.softmax(self._feats3D, dim=-1) @property def get_opacity(self): return self.opacity_activation(self._opacity) def get_covariance(self, scaling_modifier = 1): return self.covariance_activation(self.get_scaling, scaling_modifier, self._rotation) def oneupSHdegree(self): if self.active_sh_degree < self.max_sh_degree: self.active_sh_degree += 1 def create_from_pcd(self, pcd : BasicPointCloud, spatial_lr_scale : float): # self.spatial_lr_scale = 1 self.spatial_lr_scale = spatial_lr_scale fused_point_cloud = torch.tensor(np.asarray(pcd.points)).float().cuda() fused_color = RGB2SH(torch.tensor(np.asarray(pcd.colors)).float().cuda()) features = torch.zeros((fused_color.shape[0], 3, (self.max_sh_degree + 1) ** 2)).float().cuda() features[:, :3, 0 ] = fused_color features[:, 3:, 1:] = 0.0 if self.feat_mutable: feats3D = torch.zeros(fused_color.shape[0], 20).float().cuda() self._feats3D = nn.Parameter(feats3D.requires_grad_(True)) else: feats3D = torch.zeros(fused_color.shape[0], 20).float().cuda() feats3D[:, 13] = 1 self._feats3D = nn.Parameter(feats3D.requires_grad_(True)) print("Number of points at initialisation : ", fused_point_cloud.shape[0]) dist2 = torch.clamp_min(distCUDA2(torch.from_numpy(np.asarray(pcd.points)).float().cuda()), 0.0000001) scales = torch.log(torch.sqrt(dist2))[...,None].repeat(1, 3) rots = torch.zeros((fused_point_cloud.shape[0], 4), device="cuda") rots[:, 0] = 1 opacities = inverse_sigmoid(0.1 * torch.ones((fused_point_cloud.shape[0], 1), 
dtype=torch.float, device="cuda")) self._xyz = nn.Parameter(fused_point_cloud.requires_grad_(True)) self._features_dc = nn.Parameter(features[:,:,0:1].transpose(1, 2).contiguous().requires_grad_(True)) self._features_rest = nn.Parameter(features[:,:,1:].transpose(1, 2).contiguous().requires_grad_(True)) self._scaling = nn.Parameter(scales.requires_grad_(True)) self._rotation = nn.Parameter(rots.requires_grad_(True)) self._opacity = nn.Parameter(opacities.requires_grad_(True)) self.max_radii2D = torch.zeros((self.get_xyz.shape[0]), device="cuda") def training_setup(self, training_args): self.percent_dense = training_args.percent_dense self.xyz_gradient_accum = torch.zeros((self.get_xyz.shape[0], 1), device="cuda") self.denom = torch.zeros((self.get_xyz.shape[0], 1), device="cuda") # self.spatial_lr_scale /= 3 l = [ {'params': [self._xyz], 'lr': training_args.position_lr_init*self.spatial_lr_scale, "name": "xyz"}, {'params': [self._features_dc], 'lr': training_args.feature_lr, "name": "f_dc"}, {'params': [self._features_rest], 'lr': training_args.feature_lr / 20.0, "name": "f_rest"}, {'params': [self._opacity], 'lr': training_args.opacity_lr, "name": "opacity"}, {'params': [self._scaling], 'lr': training_args.scaling_lr*self.spatial_lr_scale, "name": "scaling"}, {'params': [self._rotation], 'lr': training_args.rotation_lr, "name": "rotation"}, ] if self.affine: l.append({'params': [*self.appearance_model.parameters()], 'lr': 1e-3, "name": "appearance_model"}) if self.feat_mutable: l.append({'params': [self._feats3D], 'lr': 1e-2, "name": "feats3D"}) self.optimizer = torch.optim.Adam(l, lr=0.0, eps=1e-15) # self.optimizer = CustomAdam(l, lr=0.0, eps=1e-15) self.xyz_scheduler_args = get_expon_lr_func(lr_init=training_args.position_lr_init*self.spatial_lr_scale, lr_final=training_args.position_lr_final*self.spatial_lr_scale, lr_delay_mult=training_args.position_lr_delay_mult, max_steps=training_args.position_lr_max_steps) def update_learning_rate(self, iteration): ''' Learning rate scheduling per step ''' for param_group in self.optimizer.param_groups: if param_group["name"] == "xyz": lr = self.xyz_scheduler_args(iteration) param_group['lr'] = lr return lr def construct_list_of_attributes(self): l = ['x', 'y', 'z', 'nx', 'ny', 'nz'] # All channels except the 3 DC for i in range(self._features_dc.shape[1]*self._features_dc.shape[2]): l.append('f_dc_{}'.format(i)) for i in range(self._features_rest.shape[1]*self._features_rest.shape[2]): l.append('f_rest_{}'.format(i)) for i in range(self._feats3D.shape[1]): l.append('semantic_{}'.format(i)) l.append('opacity') for i in range(self._scaling.shape[1]): l.append('scale_{}'.format(i)) for i in range(self._rotation.shape[1]): l.append('rot_{}'.format(i)) return l def save_ply(self, path): mkdir_p(os.path.dirname(path)) xyz = self._xyz.detach().cpu().numpy() normals = np.zeros_like(xyz) f_dc = self._features_dc.detach().transpose(1, 2).flatten(start_dim=1).contiguous().cpu().numpy() f_rest = self._features_rest.detach().transpose(1, 2).flatten(start_dim=1).contiguous().cpu().numpy() feats3D = self._feats3D.detach().cpu().numpy() opacities = self._opacity.detach().cpu().numpy() scale = self._scaling.detach().cpu().numpy() rotation = self._rotation.detach().cpu().numpy() dtype_full = [(attribute, 'f4') for attribute in self.construct_list_of_attributes()] elements = np.empty(xyz.shape[0], dtype=dtype_full) attributes = np.concatenate((xyz, normals, f_dc, f_rest, feats3D, opacities, scale, rotation), axis=1) elements[:] = list(map(tuple, attributes)) el = 
PlyElement.describe(elements, 'vertex') PlyData([el]).write(path) def save_vis_ply(self, path): mkdir_p(os.path.dirname(path)) xyz = self.get_xyz.detach().cpu().numpy() pcd = o3d.geometry.PointCloud() pcd.points = o3d.utility.Vector3dVector(xyz) colors = SH2RGB(self._features_dc[:, 0, :].detach().cpu().numpy()).clip(0, 1) pcd.colors = o3d.utility.Vector3dVector(colors) o3d.io.write_point_cloud(path, pcd) def reset_opacity(self): opacities_new = inverse_sigmoid(torch.min(self.get_opacity, torch.ones_like(self.get_opacity)*0.01)) optimizable_tensors = self.replace_tensor_to_optimizer(opacities_new, "opacity") self._opacity = optimizable_tensors["opacity"] def load_ply(self, path): plydata = PlyData.read(path) xyz = np.stack((np.asarray(plydata.elements[0]["x"]), np.asarray(plydata.elements[0]["y"]), np.asarray(plydata.elements[0]["z"])), axis=1) opacities = np.asarray(plydata.elements[0]["opacity"])[..., np.newaxis] features_dc = np.zeros((xyz.shape[0], 3, 1)) features_dc[:, 0, 0] = np.asarray(plydata.elements[0]["f_dc_0"]) features_dc[:, 1, 0] = np.asarray(plydata.elements[0]["f_dc_1"]) features_dc[:, 2, 0] = np.asarray(plydata.elements[0]["f_dc_2"]) extra_f_names = [p.name for p in plydata.elements[0].properties if p.name.startswith("f_rest_")] assert len(extra_f_names)==3*(self.max_sh_degree + 1) ** 2 - 3 features_extra = np.zeros((xyz.shape[0], len(extra_f_names))) for idx, attr_name in enumerate(extra_f_names): features_extra[:, idx] = np.asarray(plydata.elements[0][attr_name]) # Reshape (P,F*SH_coeffs) to (P, F, SH_coeffs except DC) features_extra = features_extra.reshape((features_extra.shape[0], 3, (self.max_sh_degree + 1) ** 2 - 1)) scale_names = [p.name for p in plydata.elements[0].properties if p.name.startswith("scale_")] scales = np.zeros((xyz.shape[0], len(scale_names))) for idx, attr_name in enumerate(scale_names): scales[:, idx] = np.asarray(plydata.elements[0][attr_name]) rot_names = [p.name for p in plydata.elements[0].properties if p.name.startswith("rot")] rots = np.zeros((xyz.shape[0], len(rot_names))) for idx, attr_name in enumerate(rot_names): rots[:, idx] = np.asarray(plydata.elements[0][attr_name]) self._xyz = nn.Parameter(torch.tensor(xyz, dtype=torch.float, device="cuda").requires_grad_(True)) self._features_dc = nn.Parameter(torch.tensor(features_dc, dtype=torch.float, device="cuda").transpose(1, 2).contiguous().requires_grad_(True)) self._features_rest = nn.Parameter(torch.tensor(features_extra, dtype=torch.float, device="cuda").transpose(1, 2).contiguous().requires_grad_(True)) self._opacity = nn.Parameter(torch.tensor(opacities, dtype=torch.float, device="cuda").requires_grad_(True)) self._scaling = nn.Parameter(torch.tensor(scales, dtype=torch.float, device="cuda").requires_grad_(True)) self._rotation = nn.Parameter(torch.tensor(rots, dtype=torch.float, device="cuda").requires_grad_(True)) self.active_sh_degree = self.max_sh_degree def replace_tensor_to_optimizer(self, tensor, name): optimizable_tensors = {} for group in self.optimizer.param_groups: if group["name"] == name: stored_state = self.optimizer.state.get(group['params'][0], None) stored_state["exp_avg"] = torch.zeros_like(tensor) stored_state["exp_avg_sq"] = torch.zeros_like(tensor) del self.optimizer.state[group['params'][0]] group["params"][0] = nn.Parameter(tensor.requires_grad_(True)) self.optimizer.state[group['params'][0]] = stored_state optimizable_tensors[group["name"]] = group["params"][0] return optimizable_tensors def _prune_optimizer(self, mask): optimizable_tensors = {} for group in 
self.optimizer.param_groups: if group['name'] == 'appearance_model': continue stored_state = self.optimizer.state.get(group['params'][0], None) if stored_state is not None: stored_state["exp_avg"] = stored_state["exp_avg"][mask] stored_state["exp_avg_sq"] = stored_state["exp_avg_sq"][mask] del self.optimizer.state[group['params'][0]] group["params"][0] = nn.Parameter((group["params"][0][mask].requires_grad_(True))) self.optimizer.state[group['params'][0]] = stored_state optimizable_tensors[group["name"]] = group["params"][0] else: group["params"][0] = nn.Parameter(group["params"][0][mask].requires_grad_(True)) optimizable_tensors[group["name"]] = group["params"][0] return optimizable_tensors def prune_points(self, mask): valid_points_mask = ~mask optimizable_tensors = self._prune_optimizer(valid_points_mask) self._xyz = optimizable_tensors["xyz"] self._features_dc = optimizable_tensors["f_dc"] self._features_rest = optimizable_tensors["f_rest"] if self.feat_mutable: self._feats3D = optimizable_tensors["feats3D"] else: self._feats3D = self._feats3D[1, :].repeat((self._xyz.shape[0], 1)) self._opacity = optimizable_tensors["opacity"] self._scaling = optimizable_tensors["scaling"] self._rotation = optimizable_tensors["rotation"] self.xyz_gradient_accum = self.xyz_gradient_accum[valid_points_mask] self.denom = self.denom[valid_points_mask] self.max_radii2D = self.max_radii2D[valid_points_mask] def cat_tensors_to_optimizer(self, tensors_dict): optimizable_tensors = {} for group in self.optimizer.param_groups: if group['name'] not in tensors_dict: continue assert len(group["params"]) == 1 extension_tensor = tensors_dict[group["name"]] stored_state = self.optimizer.state.get(group["params"][0], None) if stored_state is not None: stored_state["exp_avg"] = torch.cat((stored_state["exp_avg"], torch.zeros_like(extension_tensor)), dim=0) stored_state["exp_avg_sq"] = torch.cat((stored_state["exp_avg_sq"], torch.zeros_like(extension_tensor)), dim=0) del self.optimizer.state[group["params"][0]] group["params"][0] = nn.Parameter(torch.cat((group["params"][0], extension_tensor), dim=0).requires_grad_(True)) self.optimizer.state[group["params"][0]] = stored_state optimizable_tensors[group["name"]] = group["params"][0] else: group["params"][0] = nn.Parameter(torch.cat((group["params"][0], extension_tensor), dim=0).requires_grad_(True)) optimizable_tensors[group["name"]] = group["params"][0] return optimizable_tensors def densification_postfix(self, new_xyz, new_features_dc, new_features_rest, new_feats3D, new_opacities, new_scaling, new_rotation): d = {"xyz": new_xyz, "f_dc": new_features_dc, "f_rest": new_features_rest, "feats3D": new_feats3D, "opacity": new_opacities, "scaling" : new_scaling, "rotation" : new_rotation} optimizable_tensors = self.cat_tensors_to_optimizer(d) self._xyz = optimizable_tensors["xyz"] self._features_dc = optimizable_tensors["f_dc"] if self.feat_mutable: self._feats3D = optimizable_tensors["feats3D"] else: self._feats3D = self._feats3D[1, :].repeat((self._xyz.shape[0], 1)) self._features_rest = optimizable_tensors["f_rest"] self._opacity = optimizable_tensors["opacity"] self._scaling = optimizable_tensors["scaling"] self._rotation = optimizable_tensors["rotation"] self.xyz_gradient_accum = torch.zeros((self.get_xyz.shape[0], 1), device="cuda") self.denom = torch.zeros((self.get_xyz.shape[0], 1), device="cuda") self.max_radii2D = torch.zeros((self.get_xyz.shape[0]), device="cuda") def densify_and_split(self, grads, grad_threshold, scene_extent, N=2): n_init_points = 
self.get_xyz.shape[0] # Extract points that satisfy the gradient condition padded_grad = torch.zeros((n_init_points), device="cuda") padded_grad[:grads.shape[0]] = grads.squeeze() selected_pts_mask = torch.where(padded_grad >= grad_threshold, True, False) selected_pts_mask = torch.logical_and(selected_pts_mask, torch.max(self.get_scaling, dim=1).values > self.percent_dense*scene_extent) stds = self.get_scaling[selected_pts_mask].repeat(N,1) means =torch.zeros((stds.size(0), 3),device="cuda") samples = torch.normal(mean=means, std=stds) rots = build_rotation(self._rotation[selected_pts_mask]).repeat(N,1,1) new_xyz = torch.bmm(rots, samples.unsqueeze(-1)).squeeze(-1) + self.get_xyz[selected_pts_mask].repeat(N, 1) new_scaling = self.scaling_inverse_activation(self.get_scaling[selected_pts_mask].repeat(N,1) / (0.8*N)) new_rotation = self._rotation[selected_pts_mask].repeat(N,1) new_features_dc = self._features_dc[selected_pts_mask].repeat(N,1,1) new_features_rest = self._features_rest[selected_pts_mask].repeat(N,1,1) new_feats3D = self._feats3D[selected_pts_mask].repeat(N,1) new_opacity = self._opacity[selected_pts_mask].repeat(N,1) self.densification_postfix(new_xyz, new_features_dc, new_features_rest, new_feats3D, new_opacity, new_scaling, new_rotation) prune_filter = torch.cat((selected_pts_mask, torch.zeros(N * selected_pts_mask.sum(), device="cuda", dtype=bool))) self.prune_points(prune_filter) def densify_and_clone(self, grads, grad_threshold, scene_extent): # Extract points that satisfy the gradient condition selected_pts_mask = torch.where(torch.norm(grads, dim=-1) >= grad_threshold, True, False) selected_pts_mask = torch.logical_and(selected_pts_mask, torch.max(self.get_scaling, dim=1).values <= self.percent_dense*scene_extent) new_xyz = self._xyz[selected_pts_mask] new_features_dc = self._features_dc[selected_pts_mask] new_features_rest = self._features_rest[selected_pts_mask] new_feats3D = self._feats3D[selected_pts_mask] new_opacities = self._opacity[selected_pts_mask] new_scaling = self._scaling[selected_pts_mask] new_rotation = self._rotation[selected_pts_mask] self.densification_postfix(new_xyz, new_features_dc, new_features_rest, new_feats3D, new_opacities, new_scaling, new_rotation) def densify_and_prune(self, max_grad, min_opacity, extent, max_screen_size): grads = self.xyz_gradient_accum / self.denom grads[grads.isnan()] = 0.0 self.densify_and_clone(grads, max_grad, extent) self.densify_and_split(grads, max_grad, extent) prune_mask = (self.get_opacity < min_opacity).squeeze() if max_screen_size: big_points_vs = self.max_radii2D > max_screen_size big_points_ws = self.get_scaling.max(dim=1).values > 0.1 * extent * 10 prune_mask = torch.logical_or(torch.logical_or(prune_mask, big_points_vs), big_points_ws) self.prune_points(prune_mask) torch.cuda.empty_cache() def add_densification_stats(self, viewspace_point_tensor, update_filter): self.xyz_gradient_accum[update_filter] += torch.norm(viewspace_point_tensor.grad[update_filter,:2], dim=-1, keepdim=True) self.denom[update_filter] += 1 def add_densification_stats_grad(self, tensor_grad, update_filter): self.xyz_gradient_accum[update_filter] += torch.norm(tensor_grad[update_filter,:2], dim=-1, keepdim=True) self.denom[update_filter] += 1 ================================================ FILE: submodules/simple-knn/ext.cpp ================================================ /* * Copyright (C) 2023, Inria * GRAPHDECO research group, https://team.inria.fr/graphdeco * All rights reserved. 
* * This software is free for non-commercial, research and evaluation use * under the terms of the LICENSE.md file. * * For inquiries contact george.drettakis@inria.fr */ #include <torch/extension.h> #include "spatial.h" PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) { m.def("distCUDA2", &distCUDA2); } ================================================ FILE: submodules/simple-knn/setup.py ================================================ # # Copyright (C) 2023, Inria # GRAPHDECO research group, https://team.inria.fr/graphdeco # All rights reserved. # # This software is free for non-commercial, research and evaluation use # under the terms of the LICENSE.md file. # # For inquiries contact george.drettakis@inria.fr # from setuptools import setup from torch.utils.cpp_extension import CUDAExtension, BuildExtension import os cxx_compiler_flags = [] if os.name == 'nt': cxx_compiler_flags.append("/wd4624") setup( name="simple_knn", ext_modules=[ CUDAExtension( name="simple_knn._C", sources=[ "spatial.cu", "simple_knn.cu", "ext.cpp"], extra_compile_args={"nvcc": [], "cxx": cxx_compiler_flags}) ], cmdclass={ 'build_ext': BuildExtension } ) ================================================ FILE: submodules/simple-knn/simple_knn/.gitkeep ================================================ ================================================ FILE: submodules/simple-knn/simple_knn.cu ================================================ /* * Copyright (C) 2023, Inria * GRAPHDECO research group, https://team.inria.fr/graphdeco * All rights reserved. * * This software is free for non-commercial, research and evaluation use * under the terms of the LICENSE.md file. * * For inquiries contact george.drettakis@inria.fr */ #define BOX_SIZE 1024 #include "cuda_runtime.h" #include "device_launch_parameters.h" #include "simple_knn.h" #include <cub/cub.cuh> #include <cub/device/device_radix_sort.cuh> #include <vector> #include <cuda_runtime_api.h> #include <thrust/device_vector.h> #include <thrust/sequence.h> #define __CUDACC__ #include <cooperative_groups.h> #include <cooperative_groups/reduce.h> namespace cg = cooperative_groups; struct CustomMin { __device__ __forceinline__ float3 operator()(const float3& a, const float3& b) const { return { min(a.x, b.x), min(a.y, b.y), min(a.z, b.z) }; } }; struct CustomMax { __device__ __forceinline__ float3 operator()(const float3& a, const float3& b) const { return { max(a.x, b.x), max(a.y, b.y), max(a.z, b.z) }; } }; __host__ __device__ uint32_t prepMorton(uint32_t x) { x = (x | (x << 16)) & 0x030000FF; x = (x | (x << 8)) & 0x0300F00F; x = (x | (x << 4)) & 0x030C30C3; x = (x | (x << 2)) & 0x09249249; return x; } __host__ __device__ uint32_t coord2Morton(float3 coord, float3 minn, float3 maxx) { uint32_t x = prepMorton(((coord.x - minn.x) / (maxx.x - minn.x)) * ((1 << 10) - 1)); uint32_t y = prepMorton(((coord.y - minn.y) / (maxx.y - minn.y)) * ((1 << 10) - 1)); uint32_t z = prepMorton(((coord.z - minn.z) / (maxx.z - minn.z)) * ((1 << 10) - 1)); return x | (y << 1) | (z << 2); } __global__ void coord2Morton(int P, const float3* points, float3 minn, float3 maxx, uint32_t* codes) { auto idx = cg::this_grid().thread_rank(); if (idx >= P) return; codes[idx] = coord2Morton(points[idx], minn, maxx); } struct MinMax { float3 minn; float3 maxx; }; __global__ void boxMinMax(uint32_t P, float3* points, uint32_t* indices, MinMax* boxes) { auto idx = cg::this_grid().thread_rank(); MinMax me; if (idx < P) { me.minn = points[indices[idx]]; me.maxx = points[indices[idx]]; } else { me.minn = { FLT_MAX, FLT_MAX, FLT_MAX }; me.maxx = { -FLT_MAX,-FLT_MAX,-FLT_MAX }; } __shared__ MinMax redResult[BOX_SIZE]; for (int off = BOX_SIZE / 2; off >= 1; off /= 2) { if (threadIdx.x < 2 * off) redResult[threadIdx.x]
= me; __syncthreads(); if (threadIdx.x < off) { MinMax other = redResult[threadIdx.x + off]; me.minn.x = min(me.minn.x, other.minn.x); me.minn.y = min(me.minn.y, other.minn.y); me.minn.z = min(me.minn.z, other.minn.z); me.maxx.x = max(me.maxx.x, other.maxx.x); me.maxx.y = max(me.maxx.y, other.maxx.y); me.maxx.z = max(me.maxx.z, other.maxx.z); } __syncthreads(); } if (threadIdx.x == 0) boxes[blockIdx.x] = me; } __device__ __host__ float distBoxPoint(const MinMax& box, const float3& p) { float3 diff = { 0, 0, 0 }; if (p.x < box.minn.x || p.x > box.maxx.x) diff.x = min(abs(p.x - box.minn.x), abs(p.x - box.maxx.x)); if (p.y < box.minn.y || p.y > box.maxx.y) diff.y = min(abs(p.y - box.minn.y), abs(p.y - box.maxx.y)); if (p.z < box.minn.z || p.z > box.maxx.z) diff.z = min(abs(p.z - box.minn.z), abs(p.z - box.maxx.z)); return diff.x * diff.x + diff.y * diff.y + diff.z * diff.z; } template<int K> __device__ void updateKBest(const float3& ref, const float3& point, float* knn) { float3 d = { point.x - ref.x, point.y - ref.y, point.z - ref.z }; float dist = d.x * d.x + d.y * d.y + d.z * d.z; for (int j = 0; j < K; j++) { if (knn[j] > dist) { float t = knn[j]; knn[j] = dist; dist = t; } } } __global__ void boxMeanDist(uint32_t P, float3* points, uint32_t* indices, MinMax* boxes, float* dists) { int idx = cg::this_grid().thread_rank(); if (idx >= P) return; float3 point = points[indices[idx]]; float best[3] = { FLT_MAX, FLT_MAX, FLT_MAX }; for (int i = max(0, idx - 3); i <= min(P - 1, idx + 3); i++) { if (i == idx) continue; updateKBest<3>(point, points[indices[i]], best); } float reject = best[2]; best[0] = FLT_MAX; best[1] = FLT_MAX; best[2] = FLT_MAX; for (int b = 0; b < (P + BOX_SIZE - 1) / BOX_SIZE; b++) { MinMax box = boxes[b]; float dist = distBoxPoint(box, point); if (dist > reject || dist > best[2]) continue; for (int i = b * BOX_SIZE; i < min(P, (b + 1) * BOX_SIZE); i++) { if (i == idx) continue; updateKBest<3>(point, points[indices[i]], best); } } dists[indices[idx]] = (best[0] + best[1] + best[2]) / 3.0f; } void SimpleKNN::knn(int P, float3* points, float* meanDists) { float3* result; cudaMalloc(&result, sizeof(float3)); size_t temp_storage_bytes; float3 init = { 0, 0, 0 }, minn, maxx; cub::DeviceReduce::Reduce(nullptr, temp_storage_bytes, points, result, P, CustomMin(), init); thrust::device_vector<char> temp_storage(temp_storage_bytes); cub::DeviceReduce::Reduce(temp_storage.data().get(), temp_storage_bytes, points, result, P, CustomMin(), init); cudaMemcpy(&minn, result, sizeof(float3), cudaMemcpyDeviceToHost); cub::DeviceReduce::Reduce(temp_storage.data().get(), temp_storage_bytes, points, result, P, CustomMax(), init); cudaMemcpy(&maxx, result, sizeof(float3), cudaMemcpyDeviceToHost); thrust::device_vector<uint32_t> morton(P); thrust::device_vector<uint32_t> morton_sorted(P); coord2Morton<<<(P + 255) / 256, 256>>>(P, points, minn, maxx, morton.data().get()); thrust::device_vector<uint32_t> indices(P); thrust::sequence(indices.begin(), indices.end()); thrust::device_vector<uint32_t> indices_sorted(P); cub::DeviceRadixSort::SortPairs(nullptr, temp_storage_bytes, morton.data().get(), morton_sorted.data().get(), indices.data().get(), indices_sorted.data().get(), P); temp_storage.resize(temp_storage_bytes); cub::DeviceRadixSort::SortPairs(temp_storage.data().get(), temp_storage_bytes, morton.data().get(), morton_sorted.data().get(), indices.data().get(), indices_sorted.data().get(), P); uint32_t num_boxes = (P + BOX_SIZE - 1) / BOX_SIZE; thrust::device_vector<MinMax> boxes(num_boxes); boxMinMax<<<num_boxes, BOX_SIZE>>>(P, points, indices_sorted.data().get(),
boxes.data().get()); boxMeanDist<<<(P + 255) / 256, 256>>>(P, points, indices_sorted.data().get(), boxes.data().get(), meanDists); cudaFree(result); } ================================================ FILE: submodules/simple-knn/simple_knn.h ================================================ /* * Copyright (C) 2023, Inria * GRAPHDECO research group, https://team.inria.fr/graphdeco * All rights reserved. * * This software is free for non-commercial, research and evaluation use * under the terms of the LICENSE.md file. * * For inquiries contact george.drettakis@inria.fr */ #ifndef SIMPLEKNN_H_INCLUDED #define SIMPLEKNN_H_INCLUDED class SimpleKNN { public: static void knn(int P, float3* points, float* meanDists); }; #endif ================================================ FILE: submodules/simple-knn/spatial.cu ================================================ /* * Copyright (C) 2023, Inria * GRAPHDECO research group, https://team.inria.fr/graphdeco * All rights reserved. * * This software is free for non-commercial, research and evaluation use * under the terms of the LICENSE.md file. * * For inquiries contact george.drettakis@inria.fr */ #include "spatial.h" #include "simple_knn.h" torch::Tensor distCUDA2(const torch::Tensor& points) { const int P = points.size(0); auto float_opts = points.options().dtype(torch::kFloat32); torch::Tensor means = torch::full({P}, 0.0, float_opts); SimpleKNN::knn(P, (float3*)points.contiguous().data<float>(), means.contiguous().data<float>()); return means; } ================================================ FILE: submodules/simple-knn/spatial.h ================================================ /* * Copyright (C) 2023, Inria * GRAPHDECO research group, https://team.inria.fr/graphdeco * All rights reserved. * * This software is free for non-commercial, research and evaluation use * under the terms of the LICENSE.md file. * * For inquiries contact george.drettakis@inria.fr */ #include <torch/extension.h> torch::Tensor distCUDA2(const torch::Tensor& points); ================================================ FILE: utils/camera_utils.py ================================================ # # Copyright (C) 2023, Inria # GRAPHDECO research group, https://team.inria.fr/graphdeco # All rights reserved. # # This software is free for non-commercial, research and evaluation use # under the terms of the LICENSE.md file. # # For inquiries contact george.drettakis@inria.fr # from scene.cameras import Camera import numpy as np from utils.general_utils import PILtoTorch, PIL2toTorch from utils.graphics_utils import fov2focal import torch WARNED = False def loadCam(args, id, cam_info, resolution_scale): orig_w, orig_h = cam_info.image.size if args.resolution in [1, 2, 4, 8]: resolution = round(orig_w/(resolution_scale * args.resolution)), round(orig_h/(resolution_scale * args.resolution)) else: # should be a type that converts to float if args.resolution == -1: if orig_w > 1600: global WARNED if not WARNED: print("[ INFO ] Encountered quite large input images (>1.6K pixels width), rescaling to 1.6K.\n " "If this is not desired, please explicitly specify '--resolution/-r' as 1") WARNED = True global_down = orig_w / 1600 else: global_down = 1 else: global_down = orig_w / args.resolution scale = float(global_down) * float(resolution_scale) resolution = (int(orig_w / scale), int(orig_h / scale)) resized_image_rgb = PILtoTorch(cam_info.image, resolution) if cam_info.semantic2d is not None: semantic2d = torch.from_numpy(cam_info.semantic2d).long()[None, ...]
else: semantic2d = None optical_image = cam_info.optical_image mask = cam_info.mask gt_image = resized_image_rgb[:3, ...] return Camera(colmap_id=cam_info.uid, R=cam_info.R, T=cam_info.T, K=cam_info.K, FoVx=cam_info.FovX, FoVy=cam_info.FovY, image=gt_image, image_name=cam_info.image_name, uid=id, data_device=args.data_device, cx_ratio=cam_info.cx_ratio, cy_ratio=cam_info.cy_ratio, semantic2d=semantic2d, mask=mask, timestamp=cam_info.timestamp, optical_image=optical_image, dynamics=cam_info.dynamics) def cameraList_from_camInfos(cam_infos, resolution_scale, args): camera_list = [] for id, c in enumerate(cam_infos): camera_list.append(loadCam(args, id, c, resolution_scale)) return camera_list def camera_to_JSON(id, camera : Camera): Rt = np.zeros((4, 4)) Rt[:3, :3] = camera.R.transpose() Rt[:3, 3] = camera.T Rt[3, 3] = 1.0 W2C = np.linalg.inv(Rt) pos = W2C[:3, 3] rot = W2C[:3, :3] serializable_array_2d = [x.tolist() for x in rot] camera_entry = { 'id' : id, 'img_name' : camera.image_name, 'width' : camera.width, 'height' : camera.height, 'position': pos.tolist(), 'rotation': serializable_array_2d, 'fy' : fov2focal(camera.FovY, camera.height), 'fx' : fov2focal(camera.FovX, camera.width), } return camera_entry ================================================ FILE: utils/cmap.py ================================================ import numpy as np _color_map_errors = np.array([ [149, 54, 49], #0: log2(x) = -infinity [180, 117, 69], #0.0625: log2(x) = -4 [209, 173, 116], #0.125: log2(x) = -3 [233, 217, 171], #0.25: log2(x) = -2 [248, 243, 224], #0.5: log2(x) = -1 [144, 224, 254], #1.0: log2(x) = 0 [97, 174, 253], #2.0: log2(x) = 1 [67, 109, 244], #4.0: log2(x) = 2 [39, 48, 215], #8.0: log2(x) = 3 [38, 0, 165], #16.0: log2(x) = 4 [38, 0, 165] #inf: log2(x) = inf ]).astype(float) def color_error_image(errors, scale=1, mask=None, BGR=True): """ Color an input error map. Arguments: errors -- HxW numpy array of errors [scale=1] -- scaling the error map (color change at unit error) [mask=None] -- zero-pixels are masked white in the result [BGR=True] -- toggle between BGR and RGB Returns: colored_errors -- HxWx3 numpy array visualizing the errors """ errors_flat = errors.flatten() errors_color_indices = np.clip(np.log2(errors_flat / scale + 1e-5) + 5, 0, 9) i0 = np.floor(errors_color_indices).astype(int) f1 = errors_color_indices - i0.astype(float) colored_errors_flat = _color_map_errors[i0, :] * (1-f1).reshape(-1,1) + _color_map_errors[i0+1, :] * f1.reshape(-1,1) if mask is not None: colored_errors_flat[mask.flatten() == 0] = 255 if not BGR: colored_errors_flat = colored_errors_flat[:,[2,1,0]] return colored_errors_flat.reshape(errors.shape[0], errors.shape[1], 3).astype(int) _color_map_depths = np.array([ [0, 0, 0], # 0.000 [0, 0, 255], # 0.114 [255, 0, 0], # 0.299 [255, 0, 255], # 0.413 [0, 255, 0], # 0.587 [0, 255, 255], # 0.701 [255, 255, 0], # 0.886 [255, 255, 255], # 1.000 [255, 255, 255], # 1.000 ]).astype(float) _color_map_bincenters = np.array([ 0.0, 0.114, 0.299, 0.413, 0.587, 0.701, 0.886, 1.000, 2.000, # doesn't make a difference, just strictly higher than 1 ]) def color_depth_map(depths, scale=None): """ Color an input depth map.
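Note: in this version the normalization scale is hard-coded to 50 inside the function body, so the `scale` argument is effectively ignored.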
Arguments: depths -- HxW numpy array of depths [scale=None] -- scaling the values (defaults to the maximum depth) Returns: colored_depths -- HxWx3 numpy array visualizing the depths """ # if scale is None: # scale = depths.max() / 1.5 scale = 50 values = np.clip(depths.flatten() / scale, 0, 1) # for each value, figure out where they fit in in the bincenters: what is the last bincenter smaller than this value? lower_bin = ((values.reshape(-1, 1) >= _color_map_bincenters.reshape(1,-1)) * np.arange(0,9)).max(axis=1) lower_bin_value = _color_map_bincenters[lower_bin] higher_bin_value = _color_map_bincenters[lower_bin + 1] alphas = (values - lower_bin_value) / (higher_bin_value - lower_bin_value) colors = _color_map_depths[lower_bin] * (1-alphas).reshape(-1,1) + _color_map_depths[lower_bin + 1] * alphas.reshape(-1,1) return colors.reshape(depths.shape[0], depths.shape[1], 3).astype(np.uint8) ================================================ FILE: utils/dynamic_utils.py ================================================ import numpy as np import torch from torch import optim from torch import nn from tqdm import tqdm from matplotlib import pyplot as plt import torch.nn.functional as F from collections import defaultdict import os def rot2Euler(R): sy = torch.sqrt(R[0,0] * R[0,0] + R[1,0] * R[1,0]) singular = sy < 1e-6 if not singular: x = torch.atan2(R[2,1] , R[2,2]) y = torch.atan2(-R[2,0], sy) z = torch.atan2(R[1,0], R[0,0]) else: x = torch.atan2(-R[1,2], R[1,1]) y = torch.atan2(-R[2,0], sy) z = 0 return torch.stack([x,y,z]) class unicycle(torch.nn.Module): def __init__(self, train_timestamp, centers=None, heights=None, phis=None): super(unicycle, self).__init__() self.train_timestamp = train_timestamp self.delta = torch.diff(self.train_timestamp) self.input_a = centers[:, 0].clone() self.input_b = centers[:, 1].clone() if centers is None: self.a = nn.Parameter(torch.zeros_like(train_timestamp).float()) self.b = nn.Parameter(torch.zeros_like(train_timestamp).float()) else: self.a = nn.Parameter(centers[:, 0]) self.b = nn.Parameter(centers[:, 1]) diff_a = torch.diff(centers[:, 0]) / self.delta diff_b = torch.diff(centers[:, 1]) / self.delta v = torch.sqrt(diff_a ** 2 + diff_b**2) self.v = nn.Parameter(F.pad(v, (0, 1), 'constant', v[-1].item())) self.phi = nn.Parameter(phis) if heights is None: self.h = nn.Parameter(torch.zeros_like(train_timestamp).float()) else: self.h = nn.Parameter(heights) def acc_omega(self): acc = torch.diff(self.v) / self.delta omega = torch.diff(self.phi) / self.delta acc = F.pad(acc, (0, 1), 'constant', acc[-1].item()) omega = F.pad(omega, (0, 1), 'constant', omega[-1].item()) return acc, omega def forward(self, timestamps): idx = torch.searchsorted(self.train_timestamp, timestamps, side='left') invalid = (idx == self.train_timestamp.shape[0]) idx[invalid] -= 1 idx[self.train_timestamp[idx] != timestamps] -= 1 idx[invalid] += 1 prev_timestamps = self.train_timestamp[idx] delta_t = timestamps - prev_timestamps prev_a, prev_b = self.a[idx], self.b[idx] prev_v, prev_phi = self.v[idx], self.phi[idx] acc, omega = self.acc_omega() v = prev_v + acc[idx] * delta_t phi = prev_phi + omega[idx] * delta_t a = prev_a + prev_v * ((torch.sin(phi) - torch.sin(prev_phi)) / (omega[idx] + 1e-6)) b = prev_b - prev_v * ((torch.cos(phi) - torch.cos(prev_phi)) / (omega[idx] + 1e-6)) h = self.h[idx] return a, b, v, phi, h def capture(self): return ( self.a, self.b, self.v, self.phi, self.h, self.train_timestamp, self.delta ) def restore(self, model_args): ( self.a, self.b, self.v, self.phi, 
self.h, self.train_timestamp, self.delta ) = model_args def visualize(self, save_path, noise_centers=None, gt_centers=None): a, b, _, phi, _ = self.forward(self.train_timestamp) a = a.detach().cpu().numpy() b = b.detach().cpu().numpy() phi = phi.detach().cpu().numpy() plt.scatter(a, b, marker='x', color='b') plt.quiver(a, b, np.ones_like(a) * np.cos(phi), np.ones_like(b) * np.sin(phi), scale=20, width=0.005) if noise_centers is not None: noise_centers = noise_centers.detach().cpu().numpy() plt.scatter(noise_centers[:, 0], noise_centers[:, 1], marker='o', color='gray') if gt_centers is not None: gt_centers = gt_centers.detach().cpu().numpy() plt.scatter(gt_centers[:, 0], gt_centers[:, 1], marker='v', color='g') plt.axis('equal') plt.savefig(save_path) plt.close() def reg_loss(self): reg = 0 acc, omega = self.acc_omega() reg += torch.mean(torch.abs(torch.diff(acc))) * 1 reg += torch.mean(torch.abs(torch.diff(omega))) * 1 reg_a_motion = self.v[:-1] * ((torch.sin(self.phi[1:]) - torch.sin(self.phi[:-1])) / (omega[:-1] + 1e-6)) reg_b_motion = -self.v[:-1] * ((torch.cos(self.phi[1:]) - torch.cos(self.phi[:-1])) / (omega[:-1] + 1e-6)) reg_a = self.a[:-1] + reg_a_motion reg_b = self.b[:-1] + reg_b_motion reg += torch.mean((reg_a - self.a[1:])**2 + (reg_b - self.b[1:])**2) * 1 return reg def pos_loss(self): # a, b, _, _, _ = self.forward(self.train_timestamp) return torch.mean((self.a - self.input_a) ** 2 + (self.b - self.input_b) ** 2) * 10 def create_unicycle_model(train_cams, model_path, opt_iter=0, data_type='kitti'): unicycle_models = {} if data_type == 'kitti': cameras = [cam for cam in train_cams if 'cam_0' in cam.image_name] elif data_type == 'waymo': cameras = [cam for cam in train_cams if 'cam_1' in cam.image_name] else: raise NotImplementedError all_centers, all_heights, all_phis, all_timestamps = defaultdict(list), defaultdict(list), defaultdict(list), defaultdict(list) seq_timestamps = [] for cam in cameras: t = cam.timestamp seq_timestamps.append(t) for track_id, b2w in cam.dynamics.items(): all_centers[track_id].append(b2w[[0, 2], 3]) all_heights[track_id].append(b2w[1, 3]) eulers = rot2Euler(b2w[:3, :3]) all_phis[track_id].append(eulers[1]) all_timestamps[track_id].append(t) for track_id in all_centers.keys(): centers = torch.stack(all_centers[track_id], dim=0).cuda() timestamps = torch.tensor(all_timestamps[track_id]).cuda() heights = torch.tensor(all_heights[track_id]).cuda() phis = torch.tensor(all_phis[track_id]).cuda() + torch.pi model = unicycle(timestamps, centers.clone(), heights.clone(), phis.clone()) l = [ {'params': [model.a], 'lr': 1e-2, "name": "a"}, {'params': [model.b], 'lr': 1e-2, "name": "b"}, {'params': [model.v], 'lr': 1e-3, "name": "v"}, {'params': [model.phi], 'lr': 1e-4, "name": "phi"}, {'params': [model.h], 'lr': 0, "name": "h"} ] optimizer = optim.Adam(l, lr=0.0) t_range = tqdm(range(opt_iter), desc=f"Fitting {track_id}") for iter in t_range: loss = 0.2 * model.pos_loss() + model.reg_loss() t_range.set_postfix({'loss': loss.item()}) optimizer.zero_grad() loss.backward() optimizer.step() unicycle_models[track_id] = {'model': model, 'optimizer': optimizer, 'input_centers': centers} os.makedirs(os.path.join(model_path, "unicycle"), exist_ok=True) for track_id, unicycle_pkg in unicycle_models.items(): model = unicycle_pkg['model'] optimizer = unicycle_pkg['optimizer'] model.visualize(os.path.join(model_path, "unicycle", f"{track_id}_init.png"), # noise_centers=unicycle_pkg['input_centers'] ) # gt_centers=gt_centers) return unicycle_models 
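The `unicycle` model above is normally built through `create_unicycle_model` from tracked object poses; as a minimal sketch (toy, hand-made keyframes on CPU, all values hypothetical), it can also be constructed and queried directly at timestamps between its keyframes:

import torch
from utils.dynamic_utils import unicycle

# Hypothetical keyframes: a vehicle driving a circular arc of radius 10 m
# with heading phi(t) = 0.1 * t, sampled at five timestamps.
ts = torch.linspace(0.0, 4.0, 5)
phis = 0.1 * ts
centers = 10.0 * torch.stack([torch.sin(phis), 1.0 - torch.cos(phis)], dim=1)
heights = torch.zeros_like(ts)

model = unicycle(ts, centers=centers.clone(), heights=heights, phis=phis.clone())

# Interpolate poses at unseen timestamps via the closed-form unicycle rollout.
query = torch.tensor([0.5, 2.25])
a, b, v, phi, h = model(query)  # planar position (a, b), speed v, heading phi, height h

During training, `create_unicycle_model` fits these parameters with Adam against the tracked centers (`pos_loss`) plus the smoothness terms in `reg_loss`.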
================================================ FILE: utils/general_utils.py ================================================ # # Copyright (C) 2023, Inria # GRAPHDECO research group, https://team.inria.fr/graphdeco # All rights reserved. # # This software is free for non-commercial, research and evaluation use # under the terms of the LICENSE.md file. # # For inquiries contact george.drettakis@inria.fr # import torch import sys from datetime import datetime import numpy as np import random import os import cv2 def inverse_sigmoid(x): return torch.log(x/(1-x)) def PILtoTorch(pil_image, resolution): resized_image_PIL = pil_image.resize(resolution) resized_image = torch.from_numpy(np.array(resized_image_PIL)) / 255.0 if len(resized_image.shape) == 3: return resized_image.permute(2, 0, 1) else: return resized_image.unsqueeze(dim=-1).permute(2, 0, 1) def PIL2toTorch(pil_image, resolution): resized_image_PIL = pil_image.resize(resolution) resized_image = torch.from_numpy(np.array(resized_image_PIL)) / 255.0 * (2.0 ** 16 - 1.0) return resized_image def decode_op(optical_png): # Decode an optical-flow .png, read via PIL as a 16-bit (h, w, 3) array, into an (h, w, 2) float tensor of (flow_x, flow_y). optical_png = optical_png[..., [2, 1, 0]] # bgr -> rgb h, w, _c = optical_png.shape assert optical_png.dtype == np.uint16 and _c == 3, "expected a 16-bit, 3-channel flow image" # invalid flow flag: b == 0 for sky or other invalid flow invalid_points = np.where(optical_png[..., 2] == 0) out_flow = torch.empty((h, w, 2)) decoded = 2.0 / (2**16 - 1.0) * optical_png.astype('f4') - 1 out_flow[..., 0] = torch.tensor(decoded[:, :, 0] * (w - 1)) # (pixel) delta_x : R out_flow[..., 1] = torch.tensor(decoded[:, :, 1] * (h - 1)) # delta_y : G out_flow[invalid_points[0], invalid_points[1], :] = 0 # B=0 for invalid flow return out_flow def get_expon_lr_func( lr_init, lr_final, lr_delay_steps=0, lr_delay_mult=1.0, max_steps=1000000 ): """ Copied from Plenoxels Continuous learning rate decay function. Adapted from JaxNeRF The returned rate is lr_init when step=0 and lr_final when step=max_steps, and is log-linearly interpolated elsewhere (equivalent to exponential decay). If lr_delay_steps>0 then the learning rate will be scaled by some smooth function of lr_delay_mult, such that the initial learning rate is lr_init*lr_delay_mult at the beginning of optimization but will be eased back to the normal learning rate when steps>lr_delay_steps. :param conf: config subtree 'lr' or similar :param max_steps: int, the number of steps during optimization. :return HoF which takes step as input """ def helper(step): if step < 0 or (lr_init == 0.0 and lr_final == 0.0): # Disable this parameter return 0.0 if lr_delay_steps > 0: # A kind of reverse cosine decay.
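# delay_rate ramps smoothly from lr_delay_mult at step 0 up to 1.0 once step
# reaches lr_delay_steps (a half-period sine ease-in); afterwards the
# log-linear interpolation below decays the rate from lr_init to lr_final
# over max_steps.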
delay_rate = lr_delay_mult + (1 - lr_delay_mult) * np.sin( 0.5 * np.pi * np.clip(step / lr_delay_steps, 0, 1) ) else: delay_rate = 1.0 t = np.clip(step / max_steps, 0, 1) log_lerp = np.exp(np.log(lr_init) * (1 - t) + np.log(lr_final) * t) return delay_rate * log_lerp return helper def strip_lowerdiag(L): uncertainty = torch.zeros((L.shape[0], 6), dtype=torch.float, device="cuda") uncertainty[:, 0] = L[:, 0, 0] uncertainty[:, 1] = L[:, 0, 1] uncertainty[:, 2] = L[:, 0, 2] uncertainty[:, 3] = L[:, 1, 1] uncertainty[:, 4] = L[:, 1, 2] uncertainty[:, 5] = L[:, 2, 2] return uncertainty def strip_symmetric(sym): return strip_lowerdiag(sym) def build_rotation(r): norm = torch.sqrt(r[:,0]*r[:,0] + r[:,1]*r[:,1] + r[:,2]*r[:,2] + r[:,3]*r[:,3]) q = r / norm[:, None] R = torch.zeros((q.size(0), 3, 3), device='cuda') r = q[:, 0] x = q[:, 1] y = q[:, 2] z = q[:, 3] R[:, 0, 0] = 1 - 2 * (y*y + z*z) R[:, 0, 1] = 2 * (x*y - r*z) R[:, 0, 2] = 2 * (x*z + r*y) R[:, 1, 0] = 2 * (x*y + r*z) R[:, 1, 1] = 1 - 2 * (x*x + z*z) R[:, 1, 2] = 2 * (y*z - r*x) R[:, 2, 0] = 2 * (x*z - r*y) R[:, 2, 1] = 2 * (y*z + r*x) R[:, 2, 2] = 1 - 2 * (x*x + y*y) return R def build_scaling_rotation(s, r): L = torch.zeros((s.shape[0], 3, 3), dtype=torch.float, device="cuda") R = build_rotation(r) L[:,0,0] = s[:,0] L[:,1,1] = s[:,1] L[:,2,2] = s[:,2] L = R @ L return L DEFAULT_RANDOM_SEED = 0 def seedBasic(seed=DEFAULT_RANDOM_SEED): random.seed(seed) os.environ['PYTHONHASHSEED'] = str(seed) np.random.seed(seed) def seedTorch(seed=DEFAULT_RANDOM_SEED): torch.manual_seed(seed) torch.cuda.manual_seed(seed) torch.backends.cudnn.deterministic = True torch.backends.cudnn.benchmark = False # basic + tensorflow + torch def seedEverything(seed=DEFAULT_RANDOM_SEED): seedBasic(seed) seedTorch(seed) def safe_state(silent): old_f = sys.stdout class F: def __init__(self, silent): self.silent = silent def write(self, x): if not self.silent: if x.endswith("\n"): old_f.write(x.replace("\n", " [{}]\n".format(str(datetime.now().strftime("%d/%m %H:%M:%S"))))) else: old_f.write(x) def flush(self): old_f.flush() sys.stdout = F(silent) random.seed(DEFAULT_RANDOM_SEED) np.random.seed(DEFAULT_RANDOM_SEED) torch.manual_seed(DEFAULT_RANDOM_SEED) torch.cuda.set_device(torch.device("cuda:0")) # sys.stdout = old_f ================================================ FILE: utils/graphics_utils.py ================================================ # # Copyright (C) 2023, Inria # GRAPHDECO research group, https://team.inria.fr/graphdeco # All rights reserved. # # This software is free for non-commercial, research and evaluation use # under the terms of the LICENSE.md file. 
# # For inquiries contact george.drettakis@inria.fr # import torch import math import numpy as np from typing import NamedTuple class BasicPointCloud(NamedTuple): points : np.array colors : np.array normals : np.array # feats3D : np.array def geom_transform_points(points, transf_matrix): P, _ = points.shape ones = torch.ones(P, 1, dtype=points.dtype, device=points.device) points_hom = torch.cat([points, ones], dim=1) points_out = torch.matmul(points_hom, transf_matrix.unsqueeze(0)) denom = points_out[..., 3:] + 0.0000001 return (points_out[..., :3] / denom).squeeze(dim=0) def getWorld2View(R, t): Rt = np.zeros((4, 4)) Rt[:3, :3] = R.transpose() Rt[:3, 3] = t Rt[3, 3] = 1.0 return np.float32(Rt) def getWorld2View2(R, t, translate=np.array([.0, .0, .0]), scale=1.0): Rt = np.zeros((4, 4)) Rt[:3, :3] = R.transpose() Rt[:3, 3] = t Rt[3, 3] = 1.0 C2W = np.linalg.inv(Rt) cam_center = C2W[:3, 3] cam_center = (cam_center + translate) * scale C2W[:3, 3] = cam_center Rt = np.linalg.inv(C2W) return np.float32(Rt) def getProjectionMatrix(znear, zfar, fovX, fovY, cx_ratio, cy_ratio): tanHalfFovY = math.tan((fovY / 2)) tanHalfFovX = math.tan((fovX / 2)) top = tanHalfFovY * znear bottom = -top right = tanHalfFovX * znear left = -right P = torch.zeros(4, 4) z_sign = 1.0 P[0, 0] = 2.0 * znear / (right - left) P[1, 1] = 2.0 * znear / (top - bottom) P[0, 2] = (right + left) / (right - left) - 1 + cx_ratio P[1, 2] = (top + bottom) / (top - bottom) - 1 + cy_ratio P[3, 2] = z_sign P[2, 2] = z_sign * (zfar + znear) / (zfar - znear) P[2, 3] = -(2 * zfar * znear) / (zfar - znear) # P[0, 0] = 2.0 * znear / (right - left) # P[1, 1] = 2.0 * znear / (top - bottom) # P[0, 2] = (right + left) / (right - left) # P[1, 2] = (top + bottom) / (top - bottom) # P[3, 2] = z_sign # P[2, 2] = z_sign * zfar / (zfar - znear) # P[2, 3] = -(zfar * znear) / (zfar - znear) return P def fov2focal(fov, pixels): return pixels / (2 * math.tan(fov / 2)) def focal2fov(focal, pixels): return 2*math.atan(pixels/(2*focal)) ================================================ FILE: utils/image_utils.py ================================================ # # Copyright (C) 2023, Inria # GRAPHDECO research group, https://team.inria.fr/graphdeco # All rights reserved. # # This software is free for non-commercial, research and evaluation use # under the terms of the LICENSE.md file. # # For inquiries contact george.drettakis@inria.fr # import torch def mse(img1, img2): return (((img1 - img2)) ** 2).view(img1.shape[0], -1).mean(1, keepdim=True) def psnr(img1, img2): mse = (((img1 - img2)) ** 2).view(img1.shape[0], -1).mean(1, keepdim=True) return 20 * torch.log10(1.0 / torch.sqrt(mse)) ================================================ FILE: utils/iou_utils.py ================================================ # 3D IoU caculate code for 3D object detection # Kent 2018/12 import numpy as np from scipy.spatial import ConvexHull from numpy import * def polygon_clip(subjectPolygon, clipPolygon): """ Clip a polygon with another polygon. Ref: https://rosettacode.org/wiki/Sutherland-Hodgman_polygon_clipping#Python Args: subjectPolygon: a list of (x,y) 2d points, any polygon. clipPolygon: a list of (x,y) 2d points, has to be *convex* Note: **points have to be counter-clockwise ordered** Return: a list of (x,y) vertex point for the intersection polygon. 
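None is returned when the subject polygon is clipped away entirely (no overlap).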
""" def inside(p): return(cp2[0]-cp1[0])*(p[1]-cp1[1]) > (cp2[1]-cp1[1])*(p[0]-cp1[0]) def computeIntersection(): dc = [ cp1[0] - cp2[0], cp1[1] - cp2[1] ] dp = [ s[0] - e[0], s[1] - e[1] ] n1 = cp1[0] * cp2[1] - cp1[1] * cp2[0] n2 = s[0] * e[1] - s[1] * e[0] n3 = 1.0 / (dc[0] * dp[1] - dc[1] * dp[0]) return [(n1*dp[0] - n2*dc[0]) * n3, (n1*dp[1] - n2*dc[1]) * n3] outputList = subjectPolygon cp1 = clipPolygon[-1] for clipVertex in clipPolygon: cp2 = clipVertex inputList = outputList outputList = [] s = inputList[-1] for subjectVertex in inputList: e = subjectVertex if inside(e): if not inside(s): outputList.append(computeIntersection()) outputList.append(e) elif inside(s): outputList.append(computeIntersection()) s = e cp1 = cp2 if len(outputList) == 0: return None return(outputList) def poly_area(x,y): """ Ref: http://stackoverflow.com/questions/24467972/calculate-area-of-polygon-given-x-y-coordinates """ return 0.5*np.abs(np.dot(x,np.roll(y,1))-np.dot(y,np.roll(x,1))) def convex_hull_intersection(p1, p2): """ Compute area of two convex hull's intersection area. p1,p2 are a list of (x,y) tuples of hull vertices. return a list of (x,y) for the intersection and its volume """ inter_p = polygon_clip(p1,p2) if inter_p is not None: hull_inter = ConvexHull(inter_p) return inter_p, hull_inter.volume else: return None, 0.0 def box3d_vol(corners): ''' corners: (8,3) no assumption on axis direction ''' a = np.sqrt(np.sum((corners[0,:] - corners[1,:])**2)) b = np.sqrt(np.sum((corners[1,:] - corners[2,:])**2)) c = np.sqrt(np.sum((corners[0,:] - corners[4,:])**2)) return a*b*c def is_clockwise(p): x = p[:,0] y = p[:,1] return np.dot(x,np.roll(y,1))-np.dot(y,np.roll(x,1)) > 0 def box3d_iou(corners1, corners2): ''' Compute 3D bounding box IoU. Input: corners1: numpy array (8,3), assume up direction is negative Y corners2: numpy array (8,3), assume up direction is negative Y Output: iou: 3D bounding box IoU iou_2d: bird's eye view 2D bounding box IoU todo (kent): add more description on corner points' orders. ''' # corner points are in counter clockwise order rect1 = [(corners1[i,0], corners1[i,2]) for i in [4,5,1,0]] rect2 = [(corners2[i,0], corners2[i,2]) for i in [4,5,1,0]] area1 = poly_area(np.array(rect1)[:,0], np.array(rect1)[:,1]) area2 = poly_area(np.array(rect2)[:,0], np.array(rect2)[:,1]) inter, inter_area = convex_hull_intersection(rect1, rect2) iou_2d = inter_area/(area1+area2-inter_area) # if iou_2d < 0: # print(inter_area, area1, area2) # ymax = min(corners1[0,1], corners2[0,1]) # ymin = max(corners1[4,1], corners2[4,1]) # inter_vol = inter_area * max(0.0, ymax-ymin) # vol1 = box3d_vol(corners1) # vol2 = box3d_vol(corners2) # iou = inter_vol / (vol1 + vol2 - inter_vol) # return iou, iou_2d return 0, iou_2d # ---------------------------------- # Helper functions for evaluation # ---------------------------------- def get_3d_box(box_size, heading_angle, center): ''' Calculate 3D bounding box corners from its parameterization. 
Input: box_size: tuple of (length,wide,height) heading_angle: rad scalar, clockwise from pos x axis center: tuple of (x,y,z) Output: corners_3d: numpy array of shape (8,3) for 3D box cornders ''' def roty(t): c = np.cos(t) s = np.sin(t) return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]]) R = roty(heading_angle) l,w,h = box_size x_corners = [l/2,l/2,-l/2,-l/2,l/2,l/2,-l/2,-l/2]; y_corners = [h/2,h/2,h/2,h/2,-h/2,-h/2,-h/2,-h/2]; z_corners = [w/2,-w/2,-w/2,w/2,w/2,-w/2,-w/2,w/2]; corners_3d = np.dot(R, np.vstack([x_corners,y_corners,z_corners])) corners_3d[0,:] = corners_3d[0,:] + center[0]; corners_3d[1,:] = corners_3d[1,:] + center[1]; corners_3d[2,:] = corners_3d[2,:] + center[2]; corners_3d = np.transpose(corners_3d) return corners_3d if __name__=='__main__': print('------------------') # get_3d_box(box_size, heading_angle, center) corners_3d_ground = get_3d_box((1.497255,1.644981, 3.628938), -1.531692, (2.882992 ,1.698800 ,20.785644)) corners_3d_predict = get_3d_box((1.458242, 1.604773, 3.707947), -1.549553, (2.756923, 1.661275, 20.943280 )) (IOU_3d,IOU_2d)=box3d_iou(corners_3d_predict,corners_3d_ground) print (IOU_3d,IOU_2d) #3d IoU/ 2d IoU of BEV(bird eye's view) ================================================ FILE: utils/loss_utils.py ================================================ # # Copyright (C) 2023, Inria # GRAPHDECO research group, https://team.inria.fr/graphdeco # All rights reserved. # # This software is free for non-commercial, research and evaluation use # under the terms of the LICENSE.md file. # # For inquiries contact george.drettakis@inria.fr # import torch import torch.nn.functional as F from torch.autograd import Variable from math import exp def l1_loss(network_output, gt, mask=None): l1 = torch.abs((network_output - gt)) if mask is not None: l1 = l1[:, mask] return l1.mean() def l2_loss(network_output, gt): return ((network_output - gt) ** 2).mean() def gaussian(window_size, sigma): gauss = torch.Tensor([exp(-(x - window_size // 2) ** 2 / float(2 * sigma ** 2)) for x in range(window_size)]) return gauss / gauss.sum() def create_window(window_size, channel): _1D_window = gaussian(window_size, 1.5).unsqueeze(1) _2D_window = _1D_window.mm(_1D_window.t()).float().unsqueeze(0).unsqueeze(0) window = Variable(_2D_window.expand(channel, 1, window_size, window_size).contiguous()) return window def ssim(img1, img2, window_size=11, size_average=True): channel = img1.size(-3) window = create_window(window_size, channel) if img1.is_cuda: window = window.cuda(img1.get_device()) window = window.type_as(img1) return _ssim(img1, img2, window, window_size, channel, size_average) def _ssim(img1, img2, window, window_size, channel, size_average=True): mu1 = F.conv2d(img1, window, padding=window_size // 2, groups=channel) mu2 = F.conv2d(img2, window, padding=window_size // 2, groups=channel) mu1_sq = mu1.pow(2) mu2_sq = mu2.pow(2) mu1_mu2 = mu1 * mu2 sigma1_sq = F.conv2d(img1 * img1, window, padding=window_size // 2, groups=channel) - mu1_sq sigma2_sq = F.conv2d(img2 * img2, window, padding=window_size // 2, groups=channel) - mu2_sq sigma12 = F.conv2d(img1 * img2, window, padding=window_size // 2, groups=channel) - mu1_mu2 C1 = 0.01 ** 2 C2 = 0.03 ** 2 ssim_map = ((2 * mu1_mu2 + C1) * (2 * sigma12 + C2)) / ((mu1_sq + mu2_sq + C1) * (sigma1_sq + sigma2_sq + C2)) if size_average: return ssim_map.mean() else: return ssim_map.mean(1).mean(1).mean(1) def ssim_loss(img1, img2, window_size=11, size_average=True, mask=None): channel = img1.size(-3) window = create_window(window_size, 
channel) if img1.is_cuda: window = window.cuda(img1.get_device()) window = window.type_as(img1) return _ssim_loss(img1, img2, window, window_size, channel, size_average, mask) def _ssim_loss(img1, img2, window, window_size, channel, size_average=True, mask=None): mu1 = F.conv2d(img1, window, padding=window_size // 2, groups=channel) mu2 = F.conv2d(img2, window, padding=window_size // 2, groups=channel) mu1_sq = mu1.pow(2) mu2_sq = mu2.pow(2) mu1_mu2 = mu1 * mu2 sigma1_sq = F.conv2d(img1 * img1, window, padding=window_size // 2, groups=channel) - mu1_sq sigma2_sq = F.conv2d(img2 * img2, window, padding=window_size // 2, groups=channel) - mu2_sq sigma12 = F.conv2d(img1 * img2, window, padding=window_size // 2, groups=channel) - mu1_mu2 C1 = 0.01 ** 2 C2 = 0.03 ** 2 ssim_map = ((2 * mu1_mu2 + C1) * (2 * sigma12 + C2)) / ((mu1_sq + mu2_sq + C1) * (sigma1_sq + sigma2_sq + C2)) ssim_map = 1 - ssim_map if mask is not None: ssim_map = ssim_map[:, mask] if size_average: return ssim_map.mean() else: return ssim_map.mean(1).mean(1).mean(1) ================================================ FILE: utils/nvseg_utils.py ================================================ import sys sys.path.append("/data0/hyzhou/workspace/nv_seg") from network import get_model from config import cfg, torch_version_float from datasets.cityscapes import Loader as dataset_cls from runx.logx import logx import cv2 import torch from imageio.v2 import imread, imwrite import os import numpy as np from glob import glob from tqdm import tqdm from torchvision.utils import save_image def restore_net(net, checkpoint): assert 'state_dict' in checkpoint, 'cant find state_dict in checkpoint' forgiving_state_restore(net, checkpoint['state_dict']) def forgiving_state_restore(net, loaded_dict): """ Handle partial loading when some tensors don't match up in size. Because we want to use models that were trained off a different number of classes. 
""" net_state_dict = net.state_dict() new_loaded_dict = {} for k in net_state_dict: new_k = k if new_k in loaded_dict and net_state_dict[k].size() == loaded_dict[new_k].size(): new_loaded_dict[k] = loaded_dict[new_k] else: logx.msg("Skipped loading parameter {}".format(k)) net_state_dict.update(new_loaded_dict) net.load_state_dict(net_state_dict) return net def get_nvseg_model(): logx.initialize(logdir="./results", global_rank=0) cfg.immutable(False) cfg.DATASET.NUM_CLASSES = dataset_cls.num_classes cfg.DATASET.IGNORE_LABEL = dataset_cls.ignore_label cfg.MODEL.MSCALE = True cfg.MODEL.N_SCALES = [0.5,1.0,2.0] cfg.MODEL.BNFUNC = torch.nn.BatchNorm2d cfg.OPTIONS.TORCH_VERSION = torch_version_float() cfg.DATASET_INST = dataset_cls('folder') cfg.immutable(True) colorize_mask_fn = cfg.DATASET_INST.colorize_mask net = get_model(network='network.ocrnet.HRNet_Mscale', num_classes=cfg.DATASET.NUM_CLASSES, criterion=None) snapshot = "ASSETS_PATH/seg_weights/cityscapes_trainval_ocr.HRNet_Mscale_nimble-chihuahua.pth".replace('ASSETS_PATH', cfg.ASSETS_PATH) checkpoint = torch.load(snapshot, map_location=torch.device('cpu')) renamed_ckpt = {'state_dict': {}} for k, v in checkpoint['state_dict'].items(): renamed_ckpt['state_dict'][k.replace('module.', '')] = v restore_net(net, renamed_ckpt) net = net.eval().cuda() return net ================================================ FILE: utils/semantic_utils.py ================================================ #!/usr/bin/python # # KITTI-360 labels # from collections import namedtuple from PIL import Image import numpy as np #-------------------------------------------------------------------------------- # Definitions #-------------------------------------------------------------------------------- # a label and all meta information Label = namedtuple( 'Label' , [ 'name' , # The identifier of this label, e.g. 'car', 'person', ... . # We use them to uniquely name a class 'id' , # An integer ID that is associated with this label. # The IDs are used to represent the label in ground truth images # An ID of -1 means that this label does not have an ID and thus # is ignored when creating ground truth images (e.g. license plate). # Do not modify these IDs, since exactly these IDs are expected by the # evaluation server. 'trainId' , # Feel free to modify these IDs as suitable for your method. Then create # ground truth images with train IDs, using the tools provided in the # 'preparation' folder. However, make sure to validate or submit results # to our evaluation server using the regular IDs above! # For trainIds, multiple labels might have the same ID. Then, these labels # are mapped to the same class in the ground truth images. For the inverse # mapping, we use the label that is defined first in the list below. # For example, mapping all void-type classes to the same ID in training, # might make sense for some approaches. # Max value is 255! 'category' , # The name of the category that this label belongs to 'categoryId' , # The ID of this category. Used to create ground truth images # on category level. 
'hasInstances', # Whether this label distinguishes between single instances or not 'ignoreInEval', # Whether pixels having this class as ground truth label are ignored # during evaluations or not 'color' , # The color of this label ] ) #-------------------------------------------------------------------------------- # A list of all labels #-------------------------------------------------------------------------------- # Please adapt the train IDs as appropriate for your approach. # Note that you might want to ignore labels with ID 255 during training. # Further note that the current train IDs are only a suggestion. You can use whatever you like. # Make sure to provide your results using the original IDs and not the training IDs. # Note that many IDs are ignored in evaluation and thus you never need to predict these! labels = [ # name id trainId category catId hasInstances ignoreInEval color Label( 'unlabeled' , 0 , 255 , 'void' , 0 , False , True , ( 0, 0, 0) ), Label( 'ego vehicle' , 1 , 255 , 'void' , 0 , False , True , ( 0, 0, 0) ), Label( 'rectification border' , 2 , 255 , 'void' , 0 , False , True , ( 0, 0, 0) ), Label( 'out of roi' , 3 , 255 , 'void' , 0 , False , True , ( 0, 0, 0) ), Label( 'static' , 4 , 255 , 'void' , 0 , False , True , ( 0, 0, 0) ), Label( 'dynamic' , 5 , 255 , 'void' , 0 , False , True , (111, 74, 0) ), Label( 'ground' , 6 , 255 , 'void' , 0 , False , True , ( 81, 0, 81) ), Label( 'road' , 7 , 0 , 'flat' , 1 , False , False , (128, 64,128) ), Label( 'sidewalk' , 8 , 1 , 'flat' , 1 , False , False , (244, 35,232) ), Label( 'parking' , 9 , 255 , 'flat' , 1 , False , True , (250,170,160) ), Label( 'rail track' , 10 , 255 , 'flat' , 1 , False , True , (230,150,140) ), Label( 'building' , 11 , 2 , 'construction' , 2 , False , False , ( 70, 70, 70) ), Label( 'wall' , 12 , 3 , 'construction' , 2 , False , False , (102,102,156) ), Label( 'fence' , 13 , 4 , 'construction' , 2 , False , False , (190,153,153) ), Label( 'guard rail' , 14 , 255 , 'construction' , 2 , False , True , (180,165,180) ), Label( 'bridge' , 15 , 255 , 'construction' , 2 , False , True , (150,100,100) ), Label( 'tunnel' , 16 , 255 , 'construction' , 2 , False , True , (150,120, 90) ), Label( 'pole' , 17 , 5 , 'object' , 3 , False , False , (153,153,153) ), Label( 'polegroup' , 18 , 255 , 'object' , 3 , False , True , (153,153,153) ), Label( 'traffic light' , 19 , 6 , 'object' , 3 , False , False , (250,170, 30) ), Label( 'traffic sign' , 20 , 7 , 'object' , 3 , False , False , (220,220, 0) ), Label( 'vegetation' , 21 , 8 , 'nature' , 4 , False , False , (107,142, 35) ), Label( 'terrain' , 22 , 9 , 'nature' , 4 , False , False , (152,251,152) ), Label( 'sky' , 23 , 10 , 'sky' , 5 , False , False , ( 70,130,180) ), Label( 'person' , 24 , 11 , 'human' , 6 , True , False , (220, 20, 60) ), Label( 'rider' , 25 , 12 , 'human' , 6 , True , False , (255, 0, 0) ), Label( 'car' , 26 , 13 , 'vehicle' , 7 , True , False , ( 0, 0,142) ), Label( 'truck' , 27 , 14 , 'vehicle' , 7 , True , False , ( 0, 0, 70) ), Label( 'bus' , 28 , 15 , 'vehicle' , 7 , True , False , ( 0, 60,100) ), Label( 'caravan' , 29 , 255 , 'vehicle' , 7 , True , True , ( 0, 0, 90) ), Label( 'trailer' , 30 , 255 , 'vehicle' , 7 , True , True , ( 0, 0,110) ), Label( 'train' , 31 , 16 , 'vehicle' , 7 , True , False , ( 0, 80,100) ), Label( 'motorcycle' , 32 , 17 , 'vehicle' , 7 , True , False , ( 0, 0,230) ), Label( 'bicycle' , 33 , 18 , 'vehicle' , 7 , True , False , (119, 11, 32) ), Label( 'license plate' , -1 , -1 , 'vehicle' , 7 , False , 
#--------------------------------------------------------------------------------
# Create dictionaries for a fast lookup
#--------------------------------------------------------------------------------

# Please refer to the main method below for example usages!

# name to label object
name2label = {label.name: label for label in labels}

# id to label object
id2label = {label.id: label for label in labels}

# trainId to label object (iterated in reverse so that, for duplicated
# trainIds, the label defined first in the list above wins)
trainId2label = {label.trainId: label for label in reversed(labels)}

# id to trainId
label2trainid = {label.id: label.trainId for label in labels}

# trainId to name / color
trainId2name = {label.trainId: label.name for label in labels}
trainId2color = {label.trainId: label.color for label in labels}

# category to list of label objects
category2labels = {}
for label in labels:
    category = label.category
    if category in category2labels:
        category2labels[category].append(label)
    else:
        category2labels[category] = [label]


#--------------------------------------------------------------------------------
# color mapping
#--------------------------------------------------------------------------------

palette = [128, 64, 128, 244, 35, 232, 70, 70, 70, 102, 102, 156,
           190, 153, 153, 153, 153, 153, 250, 170, 30, 220, 220, 0,
           107, 142, 35, 152, 251, 152, 70, 130, 180, 220, 20, 60,
           255, 0, 0, 0, 0, 142, 0, 0, 70, 0, 60, 100,
           0, 80, 100, 0, 0, 230, 119, 11, 32]

# Pad the palette to the 256 * 3 entries expected by PIL's 'P' mode.
zero_pad = 256 * 3 - len(palette)
for i in range(zero_pad):
    palette.append(0)
color_mapping = palette


def colorize(image_array):
    new_mask = Image.fromarray(image_array.astype(np.uint8)).convert('P')
    new_mask.putpalette(color_mapping)
    return new_mask
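# Minimal sketch of how the tables above are typically consumed: bake
# label2trainid into an 8-bit lookup table, then remap a whole id image in one
# vectorized indexing step (the variable names below are illustrative only).
if __name__ == "__main__":
    _id_to_train = np.full(256, 255, dtype=np.uint8)
    for _id, _train_id in label2trainid.items():
        if 0 <= _id < 256 and 0 <= _train_id < 256:
            _id_to_train[_id] = _train_id
    _id_map = np.random.randint(0, 34, size=(8, 8), dtype=np.uint8)  # fake GT ids
    _train_map = _id_to_train[_id_map]   # vectorized remapping
    _vis = colorize(_train_map)          # palettized PIL image for inspection
    print("remapped train ids:", np.unique(_train_map))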
#--------------------------------------------------------------------------------
# Assure single instance name
#--------------------------------------------------------------------------------

# returns the label name that describes a single instance (if possible)
# e.g. input     | output
#      ----------------------
#      car       | car
#      cargroup  | car
#      foo       | None
#      foogroup  | None
#      skygroup  | None
def assureSingleInstanceName(name):
    # if the name is known, it is not a group
    if name in name2label:
        return name
    # test if the name actually denotes a group
    if not name.endswith("group"):
        return None
    # remove group
    name = name[:-len("group")]
    # test if the new name exists
    if name not in name2label:
        return None
    # test if the new name denotes a label that actually has instances
    if not name2label[name].hasInstances:
        return None
    # all good, then return the base name
    return name


#--------------------------------------------------------------------------------
# Main for testing
#--------------------------------------------------------------------------------

# just a dummy main
if __name__ == "__main__":

    # Print all the labels
    print("List of KITTI-360 labels:")
    print("")
    print(" {:>21} | {:>3} | {:>7} | {:>14} | {:>10} | {:>12} | {:>12}".format(
        'name', 'id', 'trainId', 'category', 'categoryId', 'hasInstances', 'ignoreInEval'))
    print(" " + ('-' * 98))
    for label in labels:
        print(" {:>21} | {:>3} | {:>7} | {:>14} | {:>10} | {:>12} | {:>12}".format(
            label.name, label.id, label.trainId, label.category,
            label.categoryId, label.hasInstances, label.ignoreInEval))
    print("")

    print("Example usages:")

    # Map from name to label
    name = 'car'
    id = name2label[name].id
    print("ID of label '{name}': {id}".format(name=name, id=id))

    # Map from ID to label
    category = id2label[id].category
    print("Category of label with ID '{id}': {category}".format(id=id, category=category))

    # Map from trainID to label
    trainId = 0
    name = trainId2label[trainId].name
    print("Name of label with trainID '{id}': {name}".format(id=trainId, name=name))
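The group-name resolution above follows the Cityscapes convention: a '...group' annotation collapses onto its base class whenever that class has instances. A minimal check of the behaviour documented in the comment table, assuming the module imports as utils.semantic_utils per the repository layout:

from utils.semantic_utils import assureSingleInstanceName

assert assureSingleInstanceName('car') == 'car'        # known name, returned as-is
assert assureSingleInstanceName('cargroup') == 'car'   # group collapses to base class
assert assureSingleInstanceName('skygroup') is None    # 'sky' has no instances
assert assureSingleInstanceName('foogroup') is None    # unknown base name
assert assureSingleInstanceName('foo') is None         # unknown, not a group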
================================================
FILE: utils/sh_utils.py
================================================
#  Copyright 2021 The PlenOctree Authors.
#  Redistribution and use in source and binary forms, with or without
#  modification, are permitted provided that the following conditions are met:
#
#  1. Redistributions of source code must retain the above copyright notice,
#  this list of conditions and the following disclaimer.
#
#  2. Redistributions in binary form must reproduce the above copyright notice,
#  this list of conditions and the following disclaimer in the documentation
#  and/or other materials provided with the distribution.
#
#  THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
#  AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
#  IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
#  ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
#  LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
#  CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
#  SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
#  INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
#  CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
#  ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
#  POSSIBILITY OF SUCH DAMAGE.

import torch

C0 = 0.28209479177387814
C1 = 0.4886025119029199
C2 = [
    1.0925484305920792,
    -1.0925484305920792,
    0.31539156525252005,
    -1.0925484305920792,
    0.5462742152960396
]
C3 = [
    -0.5900435899266435,
    2.890611442640554,
    -0.4570457994644658,
    0.3731763325901154,
    -0.4570457994644658,
    1.445305721320277,
    -0.5900435899266435
]
C4 = [
    2.5033429417967046,
    -1.7701307697799304,
    0.9461746957575601,
    -0.6690465435572892,
    0.10578554691520431,
    -0.6690465435572892,
    0.47308734787878004,
    -1.7701307697799304,
    0.6258357354491761,
]


def eval_sh(deg, sh, dirs):
    """
    Evaluate spherical harmonics at unit directions
    using hardcoded SH polynomials.
    Works with torch/np/jnp.
    '...' denotes zero or more batch dimensions.
    Args:
        deg: int SH deg. Currently, 0-4 supported
        sh: jnp.ndarray SH coeffs [..., C, (deg + 1) ** 2]
        dirs: jnp.ndarray unit directions [..., 3]
    Returns:
        [..., C]
    """
    assert deg <= 4 and deg >= 0
    coeff = (deg + 1) ** 2
    assert sh.shape[-1] >= coeff

    result = C0 * sh[..., 0]
    if deg > 0:
        x, y, z = dirs[..., 0:1], dirs[..., 1:2], dirs[..., 2:3]
        result = (result -
                  C1 * y * sh[..., 1] +
                  C1 * z * sh[..., 2] -
                  C1 * x * sh[..., 3])

        if deg > 1:
            xx, yy, zz = x * x, y * y, z * z
            xy, yz, xz = x * y, y * z, x * z
            result = (result +
                      C2[0] * xy * sh[..., 4] +
                      C2[1] * yz * sh[..., 5] +
                      C2[2] * (2.0 * zz - xx - yy) * sh[..., 6] +
                      C2[3] * xz * sh[..., 7] +
                      C2[4] * (xx - yy) * sh[..., 8])

            if deg > 2:
                result = (result +
                          C3[0] * y * (3 * xx - yy) * sh[..., 9] +
                          C3[1] * xy * z * sh[..., 10] +
                          C3[2] * y * (4 * zz - xx - yy) * sh[..., 11] +
                          C3[3] * z * (2 * zz - 3 * xx - 3 * yy) * sh[..., 12] +
                          C3[4] * x * (4 * zz - xx - yy) * sh[..., 13] +
                          C3[5] * z * (xx - yy) * sh[..., 14] +
                          C3[6] * x * (xx - 3 * yy) * sh[..., 15])

                if deg > 3:
                    result = (result +
                              C4[0] * xy * (xx - yy) * sh[..., 16] +
                              C4[1] * yz * (3 * xx - yy) * sh[..., 17] +
                              C4[2] * xy * (7 * zz - 1) * sh[..., 18] +
                              C4[3] * yz * (7 * zz - 3) * sh[..., 19] +
                              C4[4] * (zz * (35 * zz - 30) + 3) * sh[..., 20] +
                              C4[5] * xz * (7 * zz - 3) * sh[..., 21] +
                              C4[6] * (xx - yy) * (7 * zz - 1) * sh[..., 22] +
                              C4[7] * xz * (xx - 3 * yy) * sh[..., 23] +
                              C4[8] * (xx * (xx - 3 * yy) - yy * (3 * xx - yy)) * sh[..., 24])
    return result


def RGB2SH(rgb):
    return (rgb - 0.5) / C0


def SH2RGB(sh):
    return sh * C0 + 0.5


================================================
FILE: utils/system_utils.py
================================================
#
# Copyright (C) 2023, Inria
# GRAPHDECO research group, https://team.inria.fr/graphdeco
# All rights reserved.
#
# This software is free for non-commercial, research and evaluation use
# under the terms of the LICENSE.md file.
#
# For inquiries contact george.drettakis@inria.fr
#

from errno import EEXIST
from os import makedirs, path
import os


def mkdir_p(folder_path):
    # Creates a directory. Equivalent to using mkdir -p on the command line.
    try:
        makedirs(folder_path)
    except OSError as exc:  # Python >2.5
        if exc.errno == EEXIST and path.isdir(folder_path):
            pass
        else:
            raise


def searchForMaxIteration(folder):
    # Expects every entry in `folder` to be named like '<prefix>_<iteration>'.
    saved_iters = [int(fname.split("_")[-1]) for fname in os.listdir(folder)]
    return max(saved_iters)
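A quick way to sanity-check eval_sh against RGB2SH/SH2RGB: at degree 0 the only basis function is the constant C0, so evaluating the degree-0 coefficients of a colour must reproduce that colour (up to the 0.5 offset) for any direction. A minimal sketch, assuming the module imports as utils.sh_utils per the repository layout:

import torch
from utils.sh_utils import eval_sh, RGB2SH, SH2RGB

rgb = torch.rand(5, 3)   # 5 example colors
sh0 = RGB2SH(rgb)        # degree-0 coefficients, shape [5, 3]

# eval_sh expects coeffs as [..., C, (deg + 1) ** 2]; for deg=0 that is one
# coefficient per channel, and any unit direction gives the same value.
dirs = torch.nn.functional.normalize(torch.randn(5, 3), dim=-1)
out = eval_sh(0, sh0.unsqueeze(-1), dirs)   # [5, 3], equals C0 * sh0

assert torch.allclose(out + 0.5, rgb, atol=1e-6)
assert torch.allclose(SH2RGB(sh0), rgb, atol=1e-6)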