Repository: kxhit/vMAP
Branch: master
Commit: ac4aefe9fd90
Files: 31
Total size: 158.6 KB
Directory structure:
gitextract_se2unowv/
├── .gitignore
├── LICENSE
├── README.md
├── cfg.py
├── configs/
│ ├── Replica/
│ │ ├── config_replica_room0_iMAP.json
│ │ └── config_replica_room0_vMAP.json
│ └── ScanNet/
│ ├── config_scannet0000_iMAP.json
│ ├── config_scannet0000_vMAP.json
│ ├── config_scannet0024_iMAP.json
│ └── config_scannet0024_vMAP.json
├── data_generation/
│ ├── README.md
│ ├── extract_inst_obj.py
│ ├── habitat_renderer.py
│ ├── replica_render_config_vMAP.yaml
│ ├── settings.py
│ └── transformation.py
├── dataset.py
├── embedding.py
├── environment.yml
├── image_transforms.py
├── loss.py
├── metric/
│ ├── eval_3D_obj.py
│ ├── eval_3D_scene.py
│ └── metrics.py
├── model.py
├── render_rays.py
├── train.py
├── trainer.py
├── utils.py
├── vis.py
└── vmap.py
================================================
FILE CONTENTS
================================================
================================================
FILE: .gitignore
================================================
/.anaconda3/
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
# C extensions
*.so
# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST
# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
.hypothesis/
.pytest_cache/
# Translations
*.mo
*.pot
# Django stuff:
*.log
local_settings.py
db.sqlite3
# Flask stuff:
instance/
.webassets-cache
# Scrapy stuff:
.scrapy
# Sphinx documentation
docs/_build/
# PyBuilder
target/
# Jupyter Notebook
.ipynb_checkpoints
# IPython
profile_default/
ipython_config.py
# pyenv
.python-version
# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock
# celery beat schedule file
celerybeat-schedule
# SageMath parsed files
*.sage.py
# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/
# Spyder project settings
.spyderproject
.spyproject
# Rope project settings
.ropeproject
# mkdocs documentation
/site
# mypy
.mypy_cache/
.dmypy.json
dmypy.json
# Pyre type checker
.pyre/
HierarchicalPriors.sublime-project
HierarchicalPriors.sublime-workspace
scripts/res1/
scripts/results/
scripts/traj.txt
vids_ims
res_datasets/
results/
ScenePriors/train/examples/experiments/
*.pkl
*.idea
================================================
FILE: LICENSE
================================================
vMAP SOFTWARE
LICENCE AGREEMENT
WE (Imperial College of Science, Technology and Medicine, (“Imperial College
London”)) ARE WILLING TO LICENSE THIS SOFTWARE TO YOU (a licensee “You”) ONLY
ON THE CONDITION THAT YOU ACCEPT ALL OF THE TERMS CONTAINED IN THE FOLLOWING
AGREEMENT. PLEASE READ THE AGREEMENT CAREFULLY BEFORE DOWNLOADING THE SOFTWARE.
BY EXERCISING THE OPTION TO DOWNLOAD THE SOFTWARE YOU AGREE TO BE BOUND BY THE
TERMS OF THE AGREEMENT.
SOFTWARE LICENCE AGREEMENT (EXCLUDING BSD COMPONENTS)
1.This Agreement pertains to a worldwide, non-exclusive, temporary, fully
paid-up, royalty free, non-transferable, non-sub- licensable licence (the
“Licence”) to use the elastic fusion source code, including any modification,
part or derivative (the “Software”).
Ownership and Licence. Your rights to use and download the Software onto your
computer, and all other copies that You are authorised to make, are specified
in this Agreement. However, we (or our licensors) retain all rights, including
but not limited to all copyright and other intellectual property rights
anywhere in the world, in the Software not expressly granted to You in this
Agreement.
2. Permitted use of the Licence:
(a) You may download and install the Software onto one computer or server for
use in accordance with Clause 2(b) of this Agreement provided that You ensure
that the Software is not accessible by other users unless they have themselves
accepted the terms of this licence agreement.
(b) You may use the Software solely for non-commercial, internal or academic
research purposes and only in accordance with the terms of this Agreement. You
may not use the Software for commercial purposes, including but not limited to
(1) integration of all or part of the source code or the Software into a
product for sale or licence by or on behalf of You to third parties or (2) use
of the Software or any derivative of it for research to develop software
products for sale or licence to a third party or (3) use of the Software or any
derivative of it for research to develop non-software products for sale or
licence to a third party, or (4) use of the Software to provide any service to
an external organisation for which payment is received.
Should You wish to use the Software for commercial purposes, You shall
email researchcontracts.engineering@imperial.ac.uk .
(c) Right to Copy. You may copy the Software for back-up and archival purposes,
provided that each copy is kept in your possession and provided You reproduce
our copyright notice (set out in Schedule 1) on each copy.
(d) Transfer and sub-licensing. You may not rent, lend, or lease the Software
and You may not transmit, transfer or sub-license this licence to use the
Software or any of your rights or obligations under this Agreement to another
party.
(e) Identity of Licensee. The licence granted herein is personal to You. You
shall not permit any third party to access, modify or otherwise use the
Software nor shall You access modify or otherwise use the Software on behalf of
any third party. If You wish to obtain a licence for mutiple users or a site
licence for the Software please contact us
at researchcontracts.engineering@imperial.ac.uk .
(f) Publications and presentations. You may make public, results or data
obtained from, dependent on or arising from research carried out using the
Software, provided that any such presentation or publication identifies the
Software as the source of the results or the data, including the Copyright
Notice given in each element of the Software, and stating that the Software has
been made available for use by You under licence from Imperial College London
and You provide a copy of any such publication to Imperial College London.
3. Prohibited Uses. You may not, without written permission from us
at researchcontracts.engineering@imperial.ac.uk :
(a) Use, copy, modify, merge, or transfer copies of the Software or any
documentation provided by us which relates to the Software except as provided
in this Agreement;
(b) Use any back-up or archival copies of the Software (or allow anyone else to
use such copies) for any purpose other than to replace the original copy in the
event it is destroyed or becomes defective; or
(c) Disassemble, decompile or "unlock", reverse translate, or in any manner
decode the Software for any reason.
4. Warranty Disclaimer
(a) Disclaimer. The Software has been developed for research purposes only. You
acknowledge that we are providing the Software to You under this licence
agreement free of charge and on condition that the disclaimer set out below
shall apply. We do not represent or warrant that the Software as to: (i) the
quality, accuracy or reliability of the Software; (ii) the suitability of the
Software for any particular use or for use under any specific conditions; and
(iii) whether use of the Software will infringe third-party rights.
You acknowledge that You have reviewed and evaluated the Software to determine
that it meets your needs and that You assume all responsibility and liability
for determining the suitability of the Software as fit for your particular
purposes and requirements. Subject to Clause 4(b), we exclude and expressly
disclaim all express and implied representations, warranties, conditions and
terms not stated herein (including the implied conditions or warranties of
satisfactory quality, merchantable quality, merchantability and fitness for
purpose).
(b) Savings. Some jurisdictions may imply warranties, conditions or terms or
impose obligations upon us which cannot, in whole or in part, be excluded,
restricted or modified or otherwise do not allow the exclusion of implied
warranties, conditions or terms, in which case the above warranty disclaimer
and exclusion will only apply to You to the extent permitted in the relevant
jurisdiction and does not in any event exclude any implied warranties,
conditions or terms which may not under applicable law be excluded.
(c) Imperial College London disclaims all responsibility for the use which is
made of the Software and any liability for the outcomes arising from using the
Software.
5. Limitation of Liability
(a) You acknowledge that we are providing the Software to You under this
licence agreement free of charge and on condition that the limitation of
liability set out below shall apply. Accordingly, subject to Clause 5(b), we
exclude all liability whether in contract, tort, negligence or otherwise, in
respect of the Software and/or any related documentation provided to You by us
including, but not limited to, liability for loss or corruption of data, loss
of contracts, loss of income, loss of profits, loss of cover and any
consequential or indirect loss or damage of any kind arising out of or in
connection with this licence agreement, however caused. This exclusion shall
apply even if we have been advised of the possibility of such loss or damage.
(b) You agree to indemnify Imperial College London and hold it harmless from
and against any and all claims, damages and liabilities asserted by third
parties (including claims for negligence) which arise directly or indirectly
from the use of the Software or any derivative of it or the sale of any
products based on the Software. You undertake to make no liability claim
against any employee, student, agent or appointee of Imperial College London,
in connection with this Licence or the Software.
(c) Nothing in this Agreement shall have the effect of excluding or limiting
our statutory liability.
(d) Some jurisdictions do not allow these limitations or exclusions either
wholly or in part, and, to that extent, they may not apply to you. Nothing in
this licence agreement will affect your statutory rights or other relevant
statutory provisions which cannot be excluded, restricted or modified, and its
terms and conditions must be read and construed subject to any such statutory
rights and/or provisions.
6. Confidentiality. You agree not to disclose any confidential information
provided to You by us pursuant to this Agreement to any third party without our
prior written consent. The obligations in this Clause 6 shall survive the
termination of this Agreement for any reason.
7. Termination.
(a) We may terminate this licence agreement and your right to use the Software
at any time with immediate effect upon written notice to You.
(b) This licence agreement and your right to use the Software automatically
terminate if You:
(i) fail to comply with any provisions of this Agreement; or
(ii) destroy the copies of the Software in your possession, or voluntarily
return the Software to us.
(c) Upon termination You will destroy all copies of the Software.
(d) Otherwise, the restrictions on your rights to use the Software will expire
10 (ten) years after first use of the Software under this licence agreement.
8. Miscellaneous Provisions.
(a) This Agreement will be governed by and construed in accordance with the
substantive laws of England and Wales whose courts shall have exclusive
jurisdiction over all disputes which may arise between us.
(b) This is the entire agreement between us relating to the Software, and
supersedes any prior purchase order, communications, advertising or
representations concerning the Software.
(c) No change or modification of this Agreement will be valid unless it is in
writing, and is signed by us.
(d) The unenforceability or invalidity of any part of this Agreement will not
affect the enforceability or validity of the remaining parts.
BSD Elements of the Software
For BSD elements of the Software, the following terms shall apply:
Copyright as indicated in the header of the individual element of the Software.
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.
3. Neither the name of the copyright holder nor the names of its contributors
may be used to endorse or promote products derived from this software without
specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
SCHEDULE 1
The Software
vMAP is an object-level mapping system with each object represented by a small MLP, that can be efficiently vectorised trained.
It is based on the techniques described in the following publication:
• Xin Kong, Shikun Liu, Marwan Taher, Andrew J. Davison. vMAP: Vectorised Object Mapping for Neural Field SLAM. ArXiv Preprint, 2023
_________________________
Acknowledgments
If you use the software, you should reference the following paper in any
publication:
• Xin Kong, Shikun Liu, Marwan Taher, Andrew J. Davison. vMAP: Vectorised Object Mapping for Neural Field SLAM. ArXiv Preprint, 2023
================================================
FILE: README.md
================================================
[comment]: <> (# vMAP: Vectorised Object Mapping for Neural Field SLAM)
<!-- PROJECT LOGO -->
<p align="center">
<h1 align="center">vMAP: Vectorised Object Mapping for Neural Field SLAM</h1>
<p align="center">
<a href="https://kxhit.github.io"><strong>Xin Kong</strong></a>
·
<a href="https://shikun.io"><strong>Shikun Liu</strong></a>
·
<a href="https://marwan99.github.io/"><strong>Marwan Taher</strong></a>
·
<a href="https://www.doc.ic.ac.uk/~ajd/"><strong>Andrew Davison</strong></a>
</p>
[comment]: <> ( <h2 align="center">PAPER</h2>)
<h3 align="center"><a href="https://arxiv.org/abs/2302.01838">Paper</a> | <a href="https://youtu.be/_H_jNnhUAsE">Video</a> | <a href="https://kxhit.github.io/vMAP">Project Page</a></h3>
<div align="center"></div>
<p align="center">
<a href="">
<img src="./media/teaser.png" alt="Logo" width="80%">
</a>
</p>
<p align="center">
vMAP builds an object-level map from a real-time RGB-D input stream. Each object is represented by a separate MLP neural field model, all optimised in parallel via vectorised training.
</p>
<br>
We provide the implementation of the following neural-field SLAM frameworks:
- **vMAP** [Official Implementation]
- **iMAP** [Simplified and Improved Re-Implementation, with depth guided sampling]
## Install
First, let's start with a virtual environment with the required dependencies.
```bash
conda env create -f environment.yml
```
## Dataset
Please download the following datasets to reproduce our results.
* [Replica Demo](https://huggingface.co/datasets/kxic/vMAP/resolve/main/demo_replica_room_0.zip) - Replica Room 0 only for faster experimentation.
* [Replica](https://huggingface.co/datasets/kxic/vMAP/resolve/main/vmap.zip) - All Pre-generated Replica sequences. For Replica data generation, please refer to directory `data_generation`.
* [ScanNet](https://github.com/ScanNet/ScanNet) - Official ScanNet sequences.
Each dataset contains a sequence of RGB-D images, as well as their corresponding camera poses, and object instance labels.
To extract data from ScanNet .sens files, run
```bash
conda activate py2
python2 reader.py --filename ~/data/ScanNet/scannet/scans/scene0024_00/scene0024_00.sens --output_path ~/data/ScanNet/objnerf/ --export_depth_images --export_color_images --export_poses --export_intrinsics
```
## Config
Then update the config files in `configs/.json` with your dataset paths, as well as other training hyper-parameters.
```json
"dataset": {
"path": "path/to/ims/folder/",
}
```
## Running vMAP / iMAP
The following commands will run vMAP / iMAP in a single-thread setting.
#### vMAP
```bash
python ./train.py --config ./configs/Replica/config_replica_room0_vMAP.json --logdir ./logs/vMAP/room0 --save_ckpt True
```
#### iMAP
```bash
python ./train.py --config ./configs/Replica/config_replica_room0_iMAP.json --logdir ./logs/iMAP/room0 --save_ckpt True
```
[comment]: <> (#### Multi thread demo)
[comment]: <> (```bash)
[comment]: <> (./parallel_train.py --config "config_file.json" --logdir ./logs)
[comment]: <> (```)
## Evaluation
To evaluate the quality of reconstructed scenes, we provide two different methods,
#### 3D Scene-level Evaluation
The same metrics following the original iMAP, to compare with GT scene meshes by **Accuracy**, **Completion** and **Completion Ratio**.
```bash
python ./metric/eval_3D_scene.py
```
#### 3D Object-level Evaluation
We also provide the object-level metrics by computing the same metrics but averaging across all objects in a scene.
```bash
python ./metric/eval_3D_obj.py
```
[comment]: <> (### Novel View Synthesis)
[comment]: <> (##### 2D Novel View Eval)
[comment]: <> (We rendered a new trajectory in each scene and randomly choose novel view pose from it, evaluating 2D rendering performance)
[comment]: <> (```bash)
[comment]: <> (./metric/eval_2D_view.py)
[comment]: <> (```)
## Results
We provide raw results, including 3D meshes, 2D novel view rendering, and evaluated metrics of vMAP and iMAP* for easier comparison.
* [Replica](https://huggingface.co/datasets/kxic/vMAP/resolve/main/vMAP_Replica_Results.zip)
## Acknowledgement
We would like thank the following open-source repositories that we have build upon for the implementation of this work: [NICE-SLAM](https://github.com/cvg/nice-slam), and [functorch](https://github.com/pytorch/functorch).
## Citation
If you found this code/work to be useful in your own research, please considering citing the following:
```bibtex
@article{kong2023vmap,
title={vMAP: Vectorised Object Mapping for Neural Field SLAM},
author={Kong, Xin and Liu, Shikun and Taher, Marwan and Davison, Andrew J},
journal={arXiv preprint arXiv:2302.01838},
year={2023}
}
```
```bibtex
@inproceedings{sucar2021imap,
title={iMAP: Implicit mapping and positioning in real-time},
author={Sucar, Edgar and Liu, Shikun and Ortiz, Joseph and Davison, Andrew J},
booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
pages={6229--6238},
year={2021}
}
```
================================================
FILE: cfg.py
================================================
import json
import numpy as np
import os
import utils
class Config:
def __init__(self, config_file):
# setting params
with open(config_file) as json_file:
config = json.load(json_file)
# training strategy
self.do_bg = bool(config["trainer"]["do_bg"])
self.training_device = config["trainer"]["train_device"]
self.data_device = config["trainer"]["data_device"]
self.max_n_models = config["trainer"]["n_models"]
self.live_mode = bool(config["dataset"]["live"])
self.keep_live_time = config["dataset"]["keep_alive"]
self.imap_mode = config["trainer"]["imap_mode"]
self.training_strategy = config["trainer"]["training_strategy"] # "forloop" "vmap"
self.obj_id = -1
# dataset setting
self.dataset_format = config["dataset"]["format"]
self.dataset_dir = config["dataset"]["path"]
self.depth_scale = 1 / config["trainer"]["scale"]
# camera setting
self.max_depth = config["render"]["depth_range"][1]
self.min_depth = config["render"]["depth_range"][0]
self.mh = config["camera"]["mh"]
self.mw = config["camera"]["mw"]
self.height = config["camera"]["h"]
self.width = config["camera"]["w"]
self.H = self.height - 2 * self.mh
self.W = self.width - 2 * self.mw
if "fx" in config["camera"]:
self.fx = config["camera"]["fx"]
self.fy = config["camera"]["fy"]
self.cx = config["camera"]["cx"] - self.mw
self.cy = config["camera"]["cy"] - self.mh
else: # for scannet
intrinsic = utils.load_matrix_from_txt(os.path.join(self.dataset_dir, "intrinsic/intrinsic_depth.txt"))
self.fx = intrinsic[0, 0]
self.fy = intrinsic[1, 1]
self.cx = intrinsic[0, 2] - self.mw
self.cy = intrinsic[1, 2] - self.mh
if "distortion" in config["camera"]:
self.distortion_array = np.array(config["camera"]["distortion"])
elif "k1" in config["camera"]:
k1 = config["camera"]["k1"]
k2 = config["camera"]["k2"]
k3 = config["camera"]["k3"]
k4 = config["camera"]["k4"]
k5 = config["camera"]["k5"]
k6 = config["camera"]["k6"]
p1 = config["camera"]["p1"]
p2 = config["camera"]["p2"]
self.distortion_array = np.array([k1, k2, p1, p2, k3, k4, k5, k6])
else:
self.distortion_array = None
# training setting
self.win_size = config["model"]["window_size"]
self.n_iter_per_frame = config["render"]["iters_per_frame"]
self.n_per_optim = config["render"]["n_per_optim"]
self.n_samples_per_frame = self.n_per_optim // self.win_size
self.win_size_bg = config["model"]["window_size_bg"]
self.n_per_optim_bg = config["render"]["n_per_optim_bg"]
self.n_samples_per_frame_bg = self.n_per_optim_bg // self.win_size_bg
self.keyframe_buffer_size = config["model"]["keyframe_buffer_size"]
self.keyframe_step = config["model"]["keyframe_step"]
self.keyframe_step_bg = config["model"]["keyframe_step_bg"]
self.obj_scale = config["model"]["obj_scale"]
self.bg_scale = config["model"]["bg_scale"]
self.hidden_feature_size = config["model"]["hidden_feature_size"]
self.hidden_feature_size_bg = config["model"]["hidden_feature_size_bg"]
self.n_bins_cam2surface = config["render"]["n_bins_cam2surface"]
self.n_bins_cam2surface_bg = config["render"]["n_bins_cam2surface_bg"]
self.n_bins = config["render"]["n_bins"]
self.n_unidir_funcs = config["model"]["n_unidir_funcs"]
self.surface_eps = config["model"]["surface_eps"]
self.stop_eps = config["model"]["other_eps"]
# optimizer setting
self.learning_rate = config["optimizer"]["args"]["lr"]
self.weight_decay = config["optimizer"]["args"]["weight_decay"]
# vis setting
self.vis_device = config["vis"]["vis_device"]
self.n_vis_iter = config["vis"]["n_vis_iter"]
self.live_voxel_size = config["vis"]["live_voxel_size"]
self.grid_dim = config["vis"]["grid_dim"]
================================================
FILE: configs/Replica/config_replica_room0_iMAP.json
================================================
{
"dataset": {
"live": 0,
"path": "/home/xin/data/Replica/vmap/room_0/imap/00",
"format": "Replica",
"keep_alive": 20
},
"optimizer": {
"args":{
"lr": 0.001,
"weight_decay": 0.013,
"pose_lr": 0.001
}
},
"trainer": {
"imap_mode": 1,
"do_bg": 0,
"n_models": 1,
"train_device": "cuda:0",
"data_device": "cuda:0",
"training_strategy": "vmap",
"epochs": 1000000,
"scale": 1000.0
},
"render": {
"depth_range": [0.0, 8.0],
"n_bins": 9,
"n_bins_cam2surface": 5,
"n_bins_cam2surface_bg": 5,
"iters_per_frame": 20,
"n_per_optim": 4800,
"n_per_optim_bg": 1200
},
"model": {
"n_unidir_funcs": 5,
"obj_scale": 5.0,
"bg_scale": 5.0,
"color_scaling": 5.0,
"opacity_scaling": 10.0,
"gt_scene": 1,
"surface_eps": 0.1,
"other_eps": 0.05,
"keyframe_buffer_size": 20,
"keyframe_step": 50,
"keyframe_step_bg": 50,
"window_size": 5,
"window_size_bg": 10,
"hidden_layers_block": 1,
"hidden_feature_size": 256,
"hidden_feature_size_bg": 128
},
"camera": {
"w": 1200,
"h": 680,
"fx": 600.0,
"fy": 600.0,
"cx": 599.5,
"cy": 339.5,
"mw": 0,
"mh": 0
},
"vis": {
"vis_device": "cuda:0",
"n_vis_iter": 500,
"n_bins_fine_vis": 10,
"im_vis_reduce": 10,
"grid_dim": 256,
"live_vis": 1,
"live_voxel_size": 0.005
}
}
================================================
FILE: configs/Replica/config_replica_room0_vMAP.json
================================================
{
"dataset": {
"live": 0,
"path": "/home/xin/data/Replica/vmap/room_0/imap/00",
"format": "Replica",
"keep_alive": 20
},
"optimizer": {
"args":{
"lr": 0.001,
"weight_decay": 0.013,
"pose_lr": 0.001
}
},
"trainer": {
"imap_mode": 0,
"do_bg": 1,
"n_models": 100,
"train_device": "cuda:0",
"data_device": "cuda:0",
"training_strategy": "vmap",
"epochs": 1000000,
"scale": 1000.0
},
"render": {
"depth_range": [0.0, 8.0],
"n_bins": 9,
"n_bins_cam2surface": 1,
"n_bins_cam2surface_bg": 5,
"iters_per_frame": 20,
"n_per_optim": 120,
"n_per_optim_bg": 1200
},
"model": {
"n_unidir_funcs": 5,
"obj_scale": 2.0,
"bg_scale": 5.0,
"color_scaling": 5.0,
"opacity_scaling": 10.0,
"gt_scene": 1,
"surface_eps": 0.1,
"other_eps": 0.05,
"keyframe_buffer_size": 20,
"keyframe_step": 25,
"keyframe_step_bg": 50,
"window_size": 5,
"window_size_bg": 10,
"hidden_layers_block": 1,
"hidden_feature_size": 32,
"hidden_feature_size_bg": 128
},
"camera": {
"w": 1200,
"h": 680,
"fx": 600.0,
"fy": 600.0,
"cx": 599.5,
"cy": 339.5,
"mw": 0,
"mh": 0
},
"vis": {
"vis_device": "cuda:0",
"n_vis_iter": 500,
"n_bins_fine_vis": 10,
"im_vis_reduce": 10,
"grid_dim": 256,
"live_vis": 1,
"live_voxel_size": 0.005
}
}
================================================
FILE: configs/ScanNet/config_scannet0000_iMAP.json
================================================
{
"dataset": {
"live": 0,
"path": "/home/xin/data/ScanNet/NICESLAM/scene0000_00",
"format": "ScanNet",
"keep_alive": 20
},
"optimizer": {
"args":{
"lr": 0.001,
"weight_decay": 0.013,
"pose_lr": 0.001
}
},
"trainer": {
"imap_mode": 1,
"do_bg": 0,
"n_models": 1,
"train_device": "cuda:0",
"data_device": "cuda:0",
"training_strategy": "vmap",
"epochs": 1000000,
"scale": 1000.0
},
"render": {
"depth_range": [0.0, 6.0],
"n_bins": 9,
"n_bins_cam2surface": 5,
"n_bins_cam2surface_bg": 5,
"iters_per_frame": 20,
"n_per_optim": 2400,
"n_per_optim_bg": 1200
},
"model": {
"n_unidir_funcs": 5,
"obj_scale": 3.0,
"bg_scale": 15.0,
"color_scaling": 5.0,
"opacity_scaling": 10.0,
"gt_scene": 1,
"surface_eps": 0.1,
"other_eps": 0.05,
"keyframe_buffer_size": 20,
"keyframe_step": 50,
"keyframe_step_bg": 50,
"window_size": 5,
"window_size_bg": 10,
"hidden_layers_block": 1,
"hidden_feature_size": 256,
"hidden_feature_size_bg": 128
},
"camera": {
"w": 640,
"h": 480,
"mw": 10,
"mh": 10
},
"vis": {
"vis_device": "cuda:0",
"n_vis_iter": 500,
"n_bins_fine_vis": 10,
"im_vis_reduce": 10,
"grid_dim": 256,
"live_vis": 1,
"live_voxel_size": 0.005
}
}
================================================
FILE: configs/ScanNet/config_scannet0000_vMAP.json
================================================
{
"dataset": {
"live": 0,
"path": "/home/xin/data/ScanNet/NICESLAM/scene0000_00",
"format": "ScanNet",
"keep_alive": 20
},
"optimizer": {
"args":{
"lr": 0.001,
"weight_decay": 0.013,
"pose_lr": 0.001
}
},
"trainer": {
"imap_mode": 0,
"do_bg": 1,
"n_models": 100,
"train_device": "cuda:0",
"data_device": "cuda:0",
"training_strategy": "vmap",
"epochs": 1000000,
"scale": 1000.0
},
"render": {
"depth_range": [0.0, 6.0],
"n_bins": 9,
"n_bins_cam2surface": 1,
"n_bins_cam2surface_bg": 5,
"iters_per_frame": 20,
"n_per_optim": 120,
"n_per_optim_bg": 1200
},
"model": {
"n_unidir_funcs": 5,
"obj_scale": 3.0,
"bg_scale": 10.0,
"color_scaling": 5.0,
"opacity_scaling": 10.0,
"gt_scene": 1,
"surface_eps": 0.1,
"other_eps": 0.05,
"keyframe_buffer_size": 20,
"keyframe_step": 25,
"keyframe_step_bg": 50,
"window_size": 5,
"window_size_bg": 10,
"hidden_layers_block": 1,
"hidden_feature_size": 32,
"hidden_feature_size_bg": 128
},
"camera": {
"w": 640,
"h": 480,
"mw": 10,
"mh": 10
},
"vis": {
"vis_device": "cuda:0",
"n_vis_iter": 10000000,
"n_bins_fine_vis": 10,
"im_vis_reduce": 10,
"grid_dim": 256,
"live_vis": 1,
"live_voxel_size": 0.005
}
}
================================================
FILE: configs/ScanNet/config_scannet0024_iMAP.json
================================================
{
"dataset": {
"live": 0,
"path": "/home/xin/data/ScanNet/obj-imap/scene0024_00",
"format": "ScanNet",
"keep_alive": 20
},
"optimizer": {
"args":{
"lr": 0.001,
"weight_decay": 0.013,
"pose_lr": 0.001
}
},
"trainer": {
"imap_mode": 1,
"do_bg": 0,
"n_models": 1,
"train_device": "cuda:0",
"data_device": "cuda:0",
"training_strategy": "vmap",
"epochs": 1000000,
"scale": 1000.0
},
"render": {
"depth_range": [0.0, 6.0],
"n_bins": 9,
"n_bins_cam2surface": 5,
"n_bins_cam2surface_bg": 5,
"iters_per_frame": 20,
"n_per_optim": 2400,
"n_per_optim_bg": 1200
},
"model": {
"n_unidir_funcs": 5,
"obj_scale": 2.0,
"bg_scale": 5.0,
"color_scaling": 5.0,
"opacity_scaling": 10.0,
"gt_scene": 1,
"surface_eps": 0.1,
"other_eps": 0.05,
"keyframe_buffer_size": 20,
"keyframe_step": 50,
"keyframe_step_bg": 50,
"window_size": 5,
"window_size_bg": 10,
"hidden_layers_block": 1,
"hidden_feature_size": 256,
"hidden_feature_size_bg": 128
},
"camera": {
"w": 640,
"h": 480,
"mw": 10,
"mh": 10
},
"vis": {
"vis_device": "cuda:0",
"n_vis_iter": 500,
"n_bins_fine_vis": 10,
"im_vis_reduce": 10,
"grid_dim": 256,
"live_vis": 1,
"live_voxel_size": 0.005
}
}
================================================
FILE: configs/ScanNet/config_scannet0024_vMAP.json
================================================
{
"dataset": {
"live": 0,
"path": "/home/xin/data/ScanNet/obj-imap/scene0024_00",
"format": "ScanNet",
"keep_alive": 20
},
"optimizer": {
"args":{
"lr": 0.001,
"weight_decay": 0.013,
"pose_lr": 0.001
}
},
"trainer": {
"imap_mode": 0,
"do_bg": 1,
"n_models": 100,
"train_device": "cuda:0",
"data_device": "cuda:0",
"training_strategy": "vmap",
"epochs": 1000000,
"scale": 1000.0
},
"render": {
"depth_range": [0.0, 6.0],
"n_bins": 9,
"n_bins_cam2surface": 1,
"n_bins_cam2surface_bg": 5,
"iters_per_frame": 20,
"n_per_optim": 120,
"n_per_optim_bg": 1200
},
"model": {
"n_unidir_funcs": 5,
"obj_scale": 3.0,
"bg_scale": 10.0,
"color_scaling": 5.0,
"opacity_scaling": 10.0,
"gt_scene": 1,
"surface_eps": 0.1,
"other_eps": 0.05,
"keyframe_buffer_size": 20,
"keyframe_step": 25,
"keyframe_step_bg": 50,
"window_size": 5,
"window_size_bg": 10,
"hidden_layers_block": 1,
"hidden_feature_size": 32,
"hidden_feature_size_bg": 128
},
"camera": {
"w": 640,
"h": 480,
"mw": 10,
"mh": 10
},
"vis": {
"vis_device": "cuda:0",
"n_vis_iter": 10000000,
"n_bins_fine_vis": 10,
"im_vis_reduce": 10,
"grid_dim": 256,
"live_vis": 1,
"live_voxel_size": 0.005
}
}
================================================
FILE: data_generation/README.md
================================================
## Replica Data Generation
### Download Replica Dataset
Download 3D models and info files from [Replica](https://github.com/facebookresearch/Replica-Dataset)
### 3D Object Mesh Extraction
Change the input path in `./data_generation/extract_inst_obj.py` and run
```angular2html
python ./data_generation/extract_inst_obj.py
```
### Camera Trajectory Generation
Please refer to [Semantic-NeRF](https://github.com/Harry-Zhi/semantic_nerf/issues/25#issuecomment-1340595427) for more details. The random trajectory generation only works for single room scene. For multiple rooms scene, collision checking is needed. Welcome contributions.
### Rendering 2D Images
Given camera trajectory t_wc (change pose_file in configs), we use [Habitat-Sim](https://github.com/facebookresearch/habitat-sim) to render RGB, Depth, Semantic and Instance images.
#### Install Habitat-Sim 0.2.1
We recommend to use conda to install habitat-sim 0.2.1.
```angular2html
conda create -n habitat python=3.8.12 cmake=3.14.0
conda activate habitat
conda install habitat-sim=0.2.1 withbullet -c conda-forge -c aihabitat
conda install numba=0.54.1
```
#### Run rendering with configs
```angular2html
python ./data_generation/habitat_renderer.py --config ./data_generation/replica_render_config_vMAP.yaml
```
Note that to get HDR img, use mesh.ply not semantic_mesh.ply. Change path in configs. And copy rgb folder to replace previous high exposure rgb.
```angular2html
python ./data_generation/habitat_renderer.py --config ./data_generation/replica_render_config_vMAP.yaml
```
================================================
FILE: data_generation/extract_inst_obj.py
================================================
# reference https://github.com/facebookresearch/Replica-Dataset/issues/17#issuecomment-538757418
from plyfile import *
import numpy as np
import trimesh
# path_in = 'path/to/mesh_semantic.ply'
path_in = '/home/xin/data/vmap/room_0_debug/habitat/mesh_semantic.ply'
print("Reading input...")
mesh = trimesh.load(path_in)
# mesh.show()
file_in = PlyData.read(path_in)
vertices_in = file_in.elements[0]
faces_in = file_in.elements[1]
print("Filtering data...")
objects = {}
sub_mesh_indices = {}
for i, f in enumerate(faces_in):
object_id = f[1]
if not object_id in objects:
objects[object_id] = []
sub_mesh_indices[object_id] = []
objects[object_id].append((f[0],))
sub_mesh_indices[object_id].append(i)
sub_mesh_indices[object_id].append(i+faces_in.data.shape[0])
print("Writing data...")
for object_id, faces in objects.items():
path_out = path_in + f"_{object_id}.ply"
# print("sub_mesh_indices[object_id] ", sub_mesh_indices[object_id])
obj_mesh = mesh.submesh([sub_mesh_indices[object_id]], append=True)
in_n = len(sub_mesh_indices[object_id])
out_n = obj_mesh.faces.shape[0]
# print("obj id ", object_id)
# print("in_n ", in_n)
# print("out_n ", out_n)
# print("faces ", len(faces))
# assert in_n == out_n
obj_mesh.export(path_out)
# faces_out = PlyElement.describe(np.array(faces, dtype=[('vertex_indices', 'O')]), 'face')
# print("faces out ", len(PlyData([vertices_in, faces_out]).elements[1].data))
# PlyData([vertices_in, faces_out]).write(path_out+"_cmp.ply")
================================================
FILE: data_generation/habitat_renderer.py
================================================
#!/usr/bin/env python3
import os, sys, argparse
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
import cv2
import logging
import habitat_sim as hs
import numpy as np
import quaternion
import yaml
import json
from typing import Any, Dict, List, Tuple, Union
from imgviz import label_colormap
from PIL import Image
import matplotlib.pyplot as plt
import transformation
import imgviz
from datetime import datetime
import time
from settings import make_cfg
# Custom type definitions
Config = Dict[str, Any]
Observation = hs.sensor.Observation
Sim = hs.Simulator
def init_habitat(config) :
"""Initialize the Habitat simulator with sensors and scene file"""
_cfg = make_cfg(config)
sim = Sim(_cfg)
sim_cfg = hs.SimulatorConfiguration()
sim_cfg.gpu_device_id = 0
# Note: all sensors must have the same resolution
camera_resolution = [config["height"], config["width"]]
sensors = {
"color_sensor": {
"sensor_type": hs.SensorType.COLOR,
"resolution": camera_resolution,
"position": [0.0, config["sensor_height"], 0.0],
},
"depth_sensor": {
"sensor_type": hs.SensorType.DEPTH,
"resolution": camera_resolution,
"position": [0.0, config["sensor_height"], 0.0],
},
"semantic_sensor": {
"sensor_type": hs.SensorType.SEMANTIC,
"resolution": camera_resolution,
"position": [0.0, config["sensor_height"], 0.0],
},
}
sensor_specs = []
for sensor_uuid, sensor_params in sensors.items():
if config[sensor_uuid]:
sensor_spec = hs.SensorSpec()
sensor_spec.uuid = sensor_uuid
sensor_spec.sensor_type = sensor_params["sensor_type"]
sensor_spec.resolution = sensor_params["resolution"]
sensor_spec.position = sensor_params["position"]
sensor_specs.append(sensor_spec)
# Here you can specify the amount of displacement in a forward action and the turn angle
agent_cfg = hs.agent.AgentConfiguration()
agent_cfg.sensor_specifications = sensor_specs
agent_cfg.action_space = {
"move_forward": hs.agent.ActionSpec(
"move_forward", hs.agent.ActuationSpec(amount=0.25)
),
"turn_left": hs.agent.ActionSpec(
"turn_left", hs.agent.ActuationSpec(amount=30.0)
),
"turn_right": hs.agent.ActionSpec(
"turn_right", hs.agent.ActuationSpec(amount=30.0)
),
}
hs_cfg = hs.Configuration(sim_cfg, [agent_cfg])
# sim = Sim(hs_cfg)
if config["enable_semantics"]: # extract instance to class mapping function
assert os.path.exists(config["instance2class_mapping"])
with open(config["instance2class_mapping"], "r") as f:
annotations = json.load(f)
instance_id_to_semantic_label_id = np.array(annotations["id_to_label"])
num_classes = len(annotations["classes"])
label_colour_map = label_colormap()
config["instance2semantic"] = instance_id_to_semantic_label_id
config["classes"] = annotations["classes"]
config["objects"] = annotations["objects"]
config["num_classes"] = num_classes
config["label_colour_map"] = label_colormap()
config["instance_colour_map"] = label_colormap(500)
# add camera intrinsic
# hfov = float(agent_cfg.sensor_specifications[0].parameters['hfov']) * np.pi / 180.
# https://aihabitat.org/docs/habitat-api/view-transform-warp.html
# config['K'] = K
# config['K'] = np.array([[fx, 0.0, 0.0], [0.0, fx, 0.0], [0.0, 0.0, 1.0]],
# dtype=np.float64)
# hfov = float(agent_cfg.sensor_specifications[0].parameters['hfov'])
# fx = 1.0 / np.tan(hfov / 2.0)
# config['K'] = np.array([[fx, 0.0, 0.0], [0.0, fx, 0.0], [0.0, 0.0, 1.0]],
# dtype=np.float64)
# Get the intrinsic camera parameters
logging.info('Habitat simulator initialized')
return sim, hs_cfg, config
def save_renders(save_path, observation, enable_semantic, suffix=""):
save_path_rgb = os.path.join(save_path, "rgb")
save_path_depth = os.path.join(save_path, "depth")
save_path_sem_class = os.path.join(save_path, "semantic_class")
save_path_sem_instance = os.path.join(save_path, "semantic_instance")
if not os.path.exists(save_path_rgb):
os.makedirs(save_path_rgb)
if not os.path.exists(save_path_depth):
os.makedirs(save_path_depth)
if not os.path.exists(save_path_sem_class):
os.makedirs(save_path_sem_class)
if not os.path.exists(save_path_sem_instance):
os.makedirs(save_path_sem_instance)
cv2.imwrite(os.path.join(save_path_rgb, "rgb{}.png".format(suffix)), observation["color_sensor"][:,:,::-1]) # change from RGB to BGR for opencv write
cv2.imwrite(os.path.join(save_path_depth, "depth{}.png".format(suffix)), observation["depth_sensor_mm"])
if enable_semantic:
cv2.imwrite(os.path.join(save_path_sem_class, "semantic_class{}.png".format(suffix)), observation["semantic_class"])
cv2.imwrite(os.path.join(save_path_sem_class, "vis_sem_class{}.png".format(suffix)), observation["vis_sem_class"][:,:,::-1])
cv2.imwrite(os.path.join(save_path_sem_instance, "semantic_instance{}.png".format(suffix)), observation["semantic_instance"])
cv2.imwrite(os.path.join(save_path_sem_instance, "vis_sem_instance{}.png".format(suffix)), observation["vis_sem_instance"][:,:,::-1])
def render(sim, config):
"""Return the sensor observations and ground truth pose"""
observation = sim.get_sensor_observations()
# process rgb imagem change from RGBA to RGB
observation['color_sensor'] = observation['color_sensor'][..., 0:3]
rgb_img = observation['color_sensor']
# process depth
depth_mm = (observation['depth_sensor'].copy()*1000).astype(np.uint16) # change meters to mm
observation['depth_sensor_mm'] = depth_mm
# process semantics
if config['enable_semantics']:
# Assuming the scene has no more than 65534 objects
observation['semantic_instance'] = np.clip(observation['semantic_sensor'].astype(np.uint16), 0, 65535)
# observation['semantic_instance'][observation['semantic_instance']==12]=0 # mask out certain instance
# Convert instance IDs to class IDs
# observation['semantic_classes'] = np.zeros(observation['semantic'].shape, dtype=np.uint8)
# TODO make this conversion more efficient
semantic_class = config["instance2semantic"][observation['semantic_instance']]
semantic_class[semantic_class < 0] = 0
vis_sem_class = config["label_colour_map"][semantic_class]
vis_sem_instance = config["instance_colour_map"][observation['semantic_instance']] # may cause error when having more than 255 instances in the scene
observation['semantic_class'] = semantic_class.astype(np.uint8)
observation["vis_sem_class"] = vis_sem_class.astype(np.uint8)
observation["vis_sem_instance"] = vis_sem_instance.astype(np.uint8)
# del observation["semantic_sensor"]
# Get the camera ground truth pose (T_HC) in the habitat frame from the
# position and orientation
t_HC = sim.get_agent(0).get_state().position
q_HC = sim.get_agent(0).get_state().rotation
T_HC = transformation.combine_pose(t_HC, q_HC)
observation['T_HC'] = T_HC
observation['T_WC'] = transformation.Thc_to_Twc(T_HC)
return observation
def set_agent_position(sim, pose):
# Move the agent
R = pose[:3, :3]
orientation_quat = quaternion.from_rotation_matrix(R)
t = pose[:3, 3]
position = t
orientation = [orientation_quat.x, orientation_quat.y, orientation_quat.z, orientation_quat.w]
agent = sim.get_agent(0)
agent_state = hs.agent.AgentState(position, orientation)
# agent.set_state(agent_state, reset_sensors=False)
agent.set_state(agent_state)
def main():
parser = argparse.ArgumentParser(description='Render Colour, Depth, Semantic, Instance labeling from Habitat-Simultation.')
parser.add_argument('--config_file', type=str,
default="./data_generation/replica_render_config_vMAP.yaml",
help='the path to custom config file.')
args = parser.parse_args()
"""Initialize the config dict and Habitat simulator"""
# Read YAML file
with open(args.config_file, 'r') as f:
config = yaml.safe_load(f)
config["save_path"] = os.path.join(config["save_path"])
if not os.path.exists(config["save_path"]):
os.makedirs(config["save_path"])
T_wc = np.loadtxt(config["pose_file"]).reshape(-1, 4, 4)
Ts_cam2world = T_wc
print("-----Initialise and Set Habitat-Sim-----")
sim, hs_cfg, config = init_habitat(config)
# Set agent state
sim.initialize_agent(config["default_agent"])
"""Set agent state"""
print("-----Render Images from Habitat-Sim-----")
with open(os.path.join(config["save_path"], 'render_config.yaml'), 'w') as outfile:
yaml.dump(config, outfile, default_flow_style=False)
start_time = time.time()
total_render_num = Ts_cam2world.shape[0]
for i in range(total_render_num):
if i % 100 == 0 :
print("Rendering Process: {}/{}".format(i, total_render_num))
set_agent_position(sim, transformation.Twc_to_Thc(Ts_cam2world[i]))
# replica mode
observation = render(sim, config)
save_renders(config["save_path"], observation, config["enable_semantics"], suffix="_{}".format(i))
end_time = time.time()
print("-----Finish Habitat Rendering, Showing Trajectories.-----")
print("Average rendering time per image is {} seconds.".format((end_time-start_time)/Ts_cam2world.shape[0]))
if __name__ == "__main__":
main()
================================================
FILE: data_generation/replica_render_config_vMAP.yaml
================================================
# Agent settings
default_agent: 0
gpu_id: 0
width: 1200 #1280
height: 680 #960
sensor_height: 0
color_sensor: true # RGB sensor
semantic_sensor: true # Semantic sensor
depth_sensor: true # Depth sensor
enable_semantics: true
# room_0
scene_file: "/home/xin/data/vmap/room_0/habitat/mesh_semantic.ply"
instance2class_mapping: "/home/xin/data/vmap/room_0/habitat/info_semantic.json"
save_path: "/home/xin/data/vmap/room_0/vmap/00/"
pose_file: "/home/xin/data/vmap/room_0/vmap/00/traj_w_c.txt"
## HDR texture
## issue https://github.com/facebookresearch/Replica-Dataset/issues/41#issuecomment-566251467
#scene_file: "/home/xin/data/vmap/room_0/mesh.ply"
#instance2class_mapping: "/home/xin/data/vmap/room_0/habitat/info_semantic.json"
#save_path: "/home/xin/data/vmap/room_0/vmap/00/"
#pose_file: "/home/xin/data/vmap/room_0/vmap/00/traj_w_c.txt"
================================================
FILE: data_generation/settings.py
================================================
# Copyright (c) Facebook, Inc. and its affiliates.
# This source code is licensed under the MIT license found in the
# LICENSE file in the root directory of this source tree.
import habitat_sim
import habitat_sim.agent
default_sim_settings = {
# settings shared by example.py and benchmark.py
"max_frames": 1000,
"width": 640,
"height": 480,
"default_agent": 0,
"sensor_height": 1.5,
"hfov": 90,
"color_sensor": True, # RGB sensor (default: ON)
"semantic_sensor": False, # semantic sensor (default: OFF)
"depth_sensor": False, # depth sensor (default: OFF)
"ortho_rgba_sensor": False, # Orthographic RGB sensor (default: OFF)
"ortho_depth_sensor": False, # Orthographic depth sensor (default: OFF)
"ortho_semantic_sensor": False, # Orthographic semantic sensor (default: OFF)
"fisheye_rgba_sensor": False,
"fisheye_depth_sensor": False,
"fisheye_semantic_sensor": False,
"equirect_rgba_sensor": False,
"equirect_depth_sensor": False,
"equirect_semantic_sensor": False,
"seed": 1,
"silent": False, # do not print log info (default: OFF)
# settings exclusive to example.py
"save_png": False, # save the pngs to disk (default: OFF)
"print_semantic_scene": False,
"print_semantic_mask_stats": False,
"compute_shortest_path": False,
"compute_action_shortest_path": False,
"scene": "data/scene_datasets/habitat-test-scenes/skokloster-castle.glb",
"test_scene_data_url": "http://dl.fbaipublicfiles.com/habitat/habitat-test-scenes.zip",
"goal_position": [5.047, 0.199, 11.145],
"enable_physics": False,
"enable_gfx_replay_save": False,
"physics_config_file": "./data/default.physics_config.json",
"num_objects": 10,
"test_object_index": 0,
"frustum_culling": True,
}
# build SimulatorConfiguration
def make_cfg(settings):
sim_cfg = habitat_sim.SimulatorConfiguration()
if "frustum_culling" in settings:
sim_cfg.frustum_culling = settings["frustum_culling"]
else:
sim_cfg.frustum_culling = False
if "enable_physics" in settings:
sim_cfg.enable_physics = settings["enable_physics"]
if "physics_config_file" in settings:
sim_cfg.physics_config_file = settings["physics_config_file"]
# if not settings["silent"]:
# print("sim_cfg.physics_config_file = " + sim_cfg.physics_config_file)
if "scene_light_setup" in settings:
sim_cfg.scene_light_setup = settings["scene_light_setup"]
sim_cfg.gpu_device_id = 0
if not hasattr(sim_cfg, "scene_id"):
raise RuntimeError(
"Error: Please upgrade habitat-sim. SimulatorConfig API version mismatch"
)
sim_cfg.scene_id = settings["scene_file"]
# define default sensor parameters (see src/esp/Sensor/Sensor.h)
sensor_specs = []
def create_camera_spec(**kw_args):
camera_sensor_spec = habitat_sim.CameraSensorSpec()
camera_sensor_spec.sensor_type = habitat_sim.SensorType.COLOR
camera_sensor_spec.resolution = [settings["height"], settings["width"]]
camera_sensor_spec.position = [0, settings["sensor_height"], 0]
for k in kw_args:
setattr(camera_sensor_spec, k, kw_args[k])
return camera_sensor_spec
if settings["color_sensor"]:
color_sensor_spec = create_camera_spec(
uuid="color_sensor",
# hfov=settings["hfov"],
sensor_type=habitat_sim.SensorType.COLOR,
sensor_subtype=habitat_sim.SensorSubType.PINHOLE,
)
sensor_specs.append(color_sensor_spec)
if settings["depth_sensor"]:
depth_sensor_spec = create_camera_spec(
uuid="depth_sensor",
# hfov=settings["hfov"],
sensor_type=habitat_sim.SensorType.DEPTH,
channels=1,
sensor_subtype=habitat_sim.SensorSubType.PINHOLE,
)
sensor_specs.append(depth_sensor_spec)
if settings["semantic_sensor"]:
semantic_sensor_spec = create_camera_spec(
uuid="semantic_sensor",
# hfov=settings["hfov"],
sensor_type=habitat_sim.SensorType.SEMANTIC,
channels=1,
sensor_subtype=habitat_sim.SensorSubType.PINHOLE,
)
sensor_specs.append(semantic_sensor_spec)
# if settings["ortho_rgba_sensor"]:
# ortho_rgba_sensor_spec = create_camera_spec(
# uuid="ortho_rgba_sensor",
# sensor_type=habitat_sim.SensorType.COLOR,
# sensor_subtype=habitat_sim.SensorSubType.ORTHOGRAPHIC,
# )
# sensor_specs.append(ortho_rgba_sensor_spec)
#
# if settings["ortho_depth_sensor"]:
# ortho_depth_sensor_spec = create_camera_spec(
# uuid="ortho_depth_sensor",
# sensor_type=habitat_sim.SensorType.DEPTH,
# channels=1,
# sensor_subtype=habitat_sim.SensorSubType.ORTHOGRAPHIC,
# )
# sensor_specs.append(ortho_depth_sensor_spec)
#
# if settings["ortho_semantic_sensor"]:
# ortho_semantic_sensor_spec = create_camera_spec(
# uuid="ortho_semantic_sensor",
# sensor_type=habitat_sim.SensorType.SEMANTIC,
# channels=1,
# sensor_subtype=habitat_sim.SensorSubType.ORTHOGRAPHIC,
# )
# sensor_specs.append(ortho_semantic_sensor_spec)
# TODO Figure out how to implement copying of specs
def create_fisheye_spec(**kw_args):
fisheye_sensor_spec = habitat_sim.FisheyeSensorDoubleSphereSpec()
fisheye_sensor_spec.uuid = "fisheye_sensor"
fisheye_sensor_spec.sensor_type = habitat_sim.SensorType.COLOR
fisheye_sensor_spec.sensor_model_type = (
habitat_sim.FisheyeSensorModelType.DOUBLE_SPHERE
)
# The default value (alpha, xi) is set to match the lens "GoPro" found in Table 3 of this paper:
# Vladyslav Usenko, Nikolaus Demmel and Daniel Cremers: The Double Sphere
# Camera Model, The International Conference on 3D Vision (3DV), 2018
# You can find the intrinsic parameters for the other lenses in the same table as well.
fisheye_sensor_spec.xi = -0.27
fisheye_sensor_spec.alpha = 0.57
fisheye_sensor_spec.focal_length = [364.84, 364.86]
fisheye_sensor_spec.resolution = [settings["height"], settings["width"]]
# The default principal_point_offset is the middle of the image
fisheye_sensor_spec.principal_point_offset = None
# default: fisheye_sensor_spec.principal_point_offset = [i/2 for i in fisheye_sensor_spec.resolution]
fisheye_sensor_spec.position = [0, settings["sensor_height"], 0]
for k in kw_args:
setattr(fisheye_sensor_spec, k, kw_args[k])
return fisheye_sensor_spec
# if settings["fisheye_rgba_sensor"]:
# fisheye_rgba_sensor_spec = create_fisheye_spec(uuid="fisheye_rgba_sensor")
# sensor_specs.append(fisheye_rgba_sensor_spec)
# if settings["fisheye_depth_sensor"]:
# fisheye_depth_sensor_spec = create_fisheye_spec(
# uuid="fisheye_depth_sensor",
# sensor_type=habitat_sim.SensorType.DEPTH,
# channels=1,
# )
# sensor_specs.append(fisheye_depth_sensor_spec)
# if settings["fisheye_semantic_sensor"]:
# fisheye_semantic_sensor_spec = create_fisheye_spec(
# uuid="fisheye_semantic_sensor",
# sensor_type=habitat_sim.SensorType.SEMANTIC,
# channels=1,
# )
# sensor_specs.append(fisheye_semantic_sensor_spec)
def create_equirect_spec(**kw_args):
equirect_sensor_spec = habitat_sim.EquirectangularSensorSpec()
equirect_sensor_spec.uuid = "equirect_rgba_sensor"
equirect_sensor_spec.sensor_type = habitat_sim.SensorType.COLOR
equirect_sensor_spec.resolution = [settings["height"], settings["width"]]
equirect_sensor_spec.position = [0, settings["sensor_height"], 0]
for k in kw_args:
setattr(equirect_sensor_spec, k, kw_args[k])
return equirect_sensor_spec
# if settings["equirect_rgba_sensor"]:
# equirect_rgba_sensor_spec = create_equirect_spec(uuid="equirect_rgba_sensor")
# sensor_specs.append(equirect_rgba_sensor_spec)
#
# if settings["equirect_depth_sensor"]:
# equirect_depth_sensor_spec = create_equirect_spec(
# uuid="equirect_depth_sensor",
# sensor_type=habitat_sim.SensorType.DEPTH,
# channels=1,
# )
# sensor_specs.append(equirect_depth_sensor_spec)
#
# if settings["equirect_semantic_sensor"]:
# equirect_semantic_sensor_spec = create_equirect_spec(
# uuid="equirect_semantic_sensor",
# sensor_type=habitat_sim.SensorType.SEMANTIC,
# channels=1,
# )
# sensor_specs.append(equirect_semantic_sensor_spec)
# create agent specifications
agent_cfg = habitat_sim.agent.AgentConfiguration()
agent_cfg.sensor_specifications = sensor_specs
agent_cfg.action_space = {
"move_forward": habitat_sim.agent.ActionSpec(
"move_forward", habitat_sim.agent.ActuationSpec(amount=0.25)
),
"turn_left": habitat_sim.agent.ActionSpec(
"turn_left", habitat_sim.agent.ActuationSpec(amount=10.0)
),
"turn_right": habitat_sim.agent.ActionSpec(
"turn_right", habitat_sim.agent.ActuationSpec(amount=10.0)
),
}
# override action space to no-op to test physics
if sim_cfg.enable_physics:
agent_cfg.action_space = {
"move_forward": habitat_sim.agent.ActionSpec(
"move_forward", habitat_sim.agent.ActuationSpec(amount=0.0)
)
}
return habitat_sim.Configuration(sim_cfg, [agent_cfg])
================================================
FILE: data_generation/transformation.py
================================================
import numpy as np
import quaternion
import trimesh
def habitat_world_transformations():
import habitat_sim
# Transforms between the habitat frame H (y-up) and the world frame W (z-up).
T_wh = np.identity(4)
# https://stackoverflow.com/questions/1171849/finding-quaternion-representing-the-rotation-from-one-vector-to-another
T_wh[0:3, 0:3] = quaternion.as_rotation_matrix(habitat_sim.utils.common.quat_from_two_vectors(
habitat_sim.geo.GRAVITY, np.array([0.0, 0.0, -1.0])))
T_hw = np.linalg.inv(T_wh)
return T_wh, T_hw
def opencv_to_opengl_camera(transform=None):
if transform is None:
transform = np.eye(4)
return transform @ trimesh.transformations.rotation_matrix(
np.deg2rad(180), [1, 0, 0]
)
def opengl_to_opencv_camera(transform=None):
if transform is None:
transform = np.eye(4)
return transform @ trimesh.transformations.rotation_matrix(
np.deg2rad(-180), [1, 0, 0]
)
def Twc_to_Thc(T_wc): # opencv-camera to world transformation ---> habitat-caemra to habitat world transformation
T_wh, T_hw = habitat_world_transformations()
T_hc = T_hw @ T_wc @ opengl_to_opencv_camera()
return T_hc
def Thc_to_Twc(T_hc): # habitat-caemra to habitat world transformation ---> opencv-camera to world transformation
T_wh, T_hw = habitat_world_transformations()
T_wc = T_wh @ T_hc @ opencv_to_opengl_camera()
return T_wc
def combine_pose(t: np.array, q: quaternion.quaternion) -> np.array:
T = np.identity(4)
T[0:3, 3] = t
T[0:3, 0:3] = quaternion.as_rotation_matrix(q)
return T
================================================
FILE: dataset.py
================================================
import imgviz
from torch.utils.data import Dataset, DataLoader
import torch
import numpy as np
import cv2
import os
from utils import enlarge_bbox, get_bbox2d, get_bbox2d_batch, box_filter
import glob
from torchvision import transforms
import image_transforms
import open3d
import time
def next_live_data(track_to_map_IDT, inited):
while True:
if track_to_map_IDT.empty():
if inited:
return None # no new frame, use kf buffer
else: # blocking until get the first frame
continue
else:
Buffer_data = track_to_map_IDT.get(block=False)
break
if Buffer_data is not None:
image, depth, T, obj, bbox_dict, kf_id = Buffer_data
del Buffer_data
T_obj = torch.eye(4)
sample = {"image": image, "depth": depth, "T": T, "T_obj": T_obj,
"obj": obj, "bbox_dict": bbox_dict, "frame_id": kf_id}
return sample
else:
print("getting nothing?")
exit(-1)
# return None
def init_loader(cfg, multi_worker=True):
if cfg.dataset_format == "Replica":
dataset = Replica(cfg)
elif cfg.dataset_format == "ScanNet":
dataset = ScanNet(cfg)
else:
print("Dataset format {} not found".format(cfg.dataset_format))
exit(-1)
# init dataloader
if multi_worker:
# multi worker loader
dataloader = DataLoader(dataset, batch_size=None, shuffle=False, sampler=None,
batch_sampler=None, num_workers=4, collate_fn=None,
pin_memory=True, drop_last=False, timeout=0,
worker_init_fn=None, generator=None, prefetch_factor=2,
persistent_workers=True)
else:
# single worker loader
dataloader = DataLoader(dataset, batch_size=None, shuffle=False, sampler=None,
batch_sampler=None, num_workers=0)
return dataloader
class Replica(Dataset):
def __init__(self, cfg):
self.imap_mode = cfg.imap_mode
self.root_dir = cfg.dataset_dir
traj_file = os.path.join(self.root_dir, "traj_w_c.txt")
self.Twc = np.loadtxt(traj_file, delimiter=" ").reshape([-1, 4, 4])
self.depth_transform = transforms.Compose(
[image_transforms.DepthScale(cfg.depth_scale),
image_transforms.DepthFilter(cfg.max_depth)])
# background semantic classes: undefined--1, undefined-0 beam-5 blinds-12 curtain-30 ceiling-31 floor-40 pillar-60 vent-92 wall-93 wall-plug-95 window-97 rug-98
self.background_cls_list = [5,12,30,31,40,60,92,93,95,97,98,79]
# Not sure: door-37 handrail-43 lamp-47 pipe-62 rack-66 shower-stall-73 stair-77 switch-79 wall-cabinet-94 picture-59
self.bbox_scale = 0.2 # 1 #1.5 0.9== s=1/9, s=0.2
def __len__(self):
return len(os.listdir(os.path.join(self.root_dir, "depth")))
def __getitem__(self, idx):
bbox_dict = {}
rgb_file = os.path.join(self.root_dir, "rgb", "rgb_" + str(idx) + ".png")
depth_file = os.path.join(self.root_dir, "depth", "depth_" + str(idx) + ".png")
inst_file = os.path.join(self.root_dir, "semantic_instance", "semantic_instance_" + str(idx) + ".png")
obj_file = os.path.join(self.root_dir, "semantic_class", "semantic_class_" + str(idx) + ".png")
depth = cv2.imread(depth_file, -1).astype(np.float32).transpose(1,0)
image = cv2.imread(rgb_file).astype(np.uint8)
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB).transpose(1,0,2)
obj = cv2.imread(obj_file, cv2.IMREAD_UNCHANGED).astype(np.int32).transpose(1,0) # uint16 -> int32
inst = cv2.imread(inst_file, cv2.IMREAD_UNCHANGED).astype(np.int32).transpose(1,0) # uint16 -> int32
bbox_scale = self.bbox_scale
if self.imap_mode:
obj = np.zeros_like(obj)
else:
obj_ = np.zeros_like(obj)
inst_list = []
batch_masks = []
for inst_id in np.unique(inst):
inst_mask = inst == inst_id
# if np.sum(inst_mask) <= 2000: # too small 20 400
# continue
sem_cls = np.unique(obj[inst_mask]) # sem label, only interested obj
assert sem_cls.shape[0] != 0
if sem_cls in self.background_cls_list:
continue
obj_mask = inst == inst_id
batch_masks.append(obj_mask)
inst_list.append(inst_id)
if len(batch_masks) > 0:
batch_masks = torch.from_numpy(np.stack(batch_masks))
cmins, cmaxs, rmins, rmaxs = get_bbox2d_batch(batch_masks)
for i in range(batch_masks.shape[0]):
w = rmaxs[i] - rmins[i]
h = cmaxs[i] - cmins[i]
if w <= 10 or h <= 10: # too small todo
continue
bbox_enlarged = enlarge_bbox([rmins[i], cmins[i], rmaxs[i], cmaxs[i]], scale=bbox_scale,
w=obj.shape[1], h=obj.shape[0])
# inst_list.append(inst_id)
inst_id = inst_list[i]
obj_[batch_masks[i]] = 1
# bbox_dict.update({int(inst_id): torch.from_numpy(np.array(bbox_enlarged).reshape(-1))}) # batch format
bbox_dict.update({inst_id: torch.from_numpy(np.array(
[bbox_enlarged[1], bbox_enlarged[3], bbox_enlarged[0], bbox_enlarged[2]]))}) # bbox order
inst[obj_ == 0] = 0 # for background
obj = inst
bbox_dict.update({0: torch.from_numpy(np.array([int(0), int(obj.shape[0]), 0, int(obj.shape[1])]))}) # bbox order
T = self.Twc[idx] # could change to ORB-SLAM pose or else
T_obj = np.eye(4) # obj pose, if dynamic
sample = {"image": image, "depth": depth, "T": T, "T_obj": T_obj,
"obj": obj, "bbox_dict": bbox_dict, "frame_id": idx}
if image is None or depth is None:
print(rgb_file)
print(depth_file)
raise ValueError
if self.depth_transform:
sample["depth"] = self.depth_transform(sample["depth"])
return sample
class ScanNet(Dataset):
def __init__(self, cfg):
self.imap_mode = cfg.imap_mode
self.root_dir = cfg.dataset_dir
self.color_paths = sorted(glob.glob(os.path.join(
self.root_dir, 'color', '*.jpg')), key=lambda x: int(os.path.basename(x)[:-4]))
self.depth_paths = sorted(glob.glob(os.path.join(
self.root_dir, 'depth', '*.png')), key=lambda x: int(os.path.basename(x)[:-4]))
self.inst_paths = sorted(glob.glob(os.path.join(
self.root_dir, 'instance-filt', '*.png')), key=lambda x: int(os.path.basename(x)[:-4])) # instance-filt
self.sem_paths = sorted(glob.glob(os.path.join(
self.root_dir, 'label-filt', '*.png')), key=lambda x: int(os.path.basename(x)[:-4])) # label-filt
self.load_poses(os.path.join(self.root_dir, 'pose'))
self.n_img = len(self.color_paths)
self.depth_transform = transforms.Compose(
[image_transforms.DepthScale(cfg.depth_scale),
image_transforms.DepthFilter(cfg.max_depth)])
# self.rgb_transform = rgb_transform
self.W = cfg.W
self.H = cfg.H
self.fx = cfg.fx
self.fy = cfg.fy
self.cx = cfg.cx
self.cy = cfg.cy
self.edge = cfg.mw
self.intrinsic_open3d = open3d.camera.PinholeCameraIntrinsic(
width=self.W,
height=self.H,
fx=self.fx,
fy=self.fy,
cx=self.cx,
cy=self.cy,
)
self.min_pixels = 1500
# from scannetv2-labels.combined.tsv
#1-wall, 3-floor, 16-window, 41-ceiling, 232-light switch 0-unknown? 21-pillar 161-doorframe, shower walls-128, curtain-21, windowsill-141
self.background_cls_list = [-1, 0, 1, 3, 16, 41, 232, 21, 161, 128, 21]
self.bbox_scale = 0.2
self.inst_dict = {}
def load_poses(self, path):
self.poses = []
pose_paths = sorted(glob.glob(os.path.join(path, '*.txt')),
key=lambda x: int(os.path.basename(x)[:-4]))
for pose_path in pose_paths:
with open(pose_path, "r") as f:
lines = f.readlines()
ls = []
for line in lines:
l = list(map(float, line.split(' ')))
ls.append(l)
c2w = np.array(ls).reshape(4, 4)
self.poses.append(c2w)
def __len__(self):
return self.n_img
def __getitem__(self, index):
bbox_scale = self.bbox_scale
color_path = self.color_paths[index]
depth_path = self.depth_paths[index]
inst_path = self.inst_paths[index]
sem_path = self.sem_paths[index]
color_data = cv2.imread(color_path).astype(np.uint8)
color_data = cv2.cvtColor(color_data, cv2.COLOR_BGR2RGB)
depth_data = cv2.imread(depth_path, cv2.IMREAD_UNCHANGED).astype(np.float32)
depth_data = np.nan_to_num(depth_data, nan=0.)
T = None
if self.poses is not None:
T = self.poses[index]
if np.any(np.isinf(T)):
if index + 1 == self.__len__():
print("pose inf!")
return None
return self.__getitem__(index + 1)
H, W = depth_data.shape
color_data = cv2.resize(color_data, (W, H), interpolation=cv2.INTER_LINEAR)
if self.edge:
edge = self.edge # crop image edge, there are invalid value on the edge of the color image
color_data = color_data[edge:-edge, edge:-edge]
depth_data = depth_data[edge:-edge, edge:-edge]
if self.depth_transform:
depth_data = self.depth_transform(depth_data)
bbox_dict = {}
if self.imap_mode:
inst_data = np.zeros_like(depth_data).astype(np.int32)
else:
inst_data = cv2.imread(inst_path, cv2.IMREAD_UNCHANGED)
inst_data = cv2.resize(inst_data, (W, H), interpolation=cv2.INTER_NEAREST).astype(np.int32)
sem_data = cv2.imread(sem_path, cv2.IMREAD_UNCHANGED)#.astype(np.int32)
sem_data = cv2.resize(sem_data, (W, H), interpolation=cv2.INTER_NEAREST)
if self.edge:
edge = self.edge
inst_data = inst_data[edge:-edge, edge:-edge]
sem_data = sem_data[edge:-edge, edge:-edge]
inst_data += 1 # shift from 0->1 , 0 is for background
# box filter
track_start = time.time()
masks = []
classes = []
# convert to list of arrays
obj_ids = np.unique(inst_data)
for obj_id in obj_ids:
mask = inst_data == obj_id
sem_cls = np.unique(sem_data[mask])
if sem_cls in self.background_cls_list:
inst_data[mask] = 0 # set to background
continue
masks.append(mask)
classes.append(obj_id)
T_CW = np.linalg.inv(T)
inst_data = box_filter(masks, classes, depth_data, self.inst_dict, self.intrinsic_open3d, T_CW, min_pixels=self.min_pixels)
merged_obj_ids = np.unique(inst_data)
for obj_id in merged_obj_ids:
mask = inst_data == obj_id
bbox2d = get_bbox2d(mask, bbox_scale=bbox_scale)
if bbox2d is None:
inst_data[mask] = 0 # set to bg
else:
min_x, min_y, max_x, max_y = bbox2d
bbox_dict.update({int(obj_id): torch.from_numpy(np.array([min_x, max_x, min_y, max_y]).reshape(-1))}) # batch format
bbox_time = time.time()
# print("bbox time ", bbox_time - filter_time)
cv2.imshow("inst", imgviz.label2rgb(inst_data))
cv2.waitKey(1)
print("frame {} track time {}".format(index, bbox_time-track_start))
bbox_dict.update({0: torch.from_numpy(np.array([int(0), int(inst_data.shape[1]), 0, int(inst_data.shape[0])]))}) # bbox order
# wrap data to frame dict
T_obj = np.identity(4)
sample = {"image": color_data.transpose(1,0,2), "depth": depth_data.transpose(1,0), "T": T, "T_obj": T_obj}
if color_data is None or depth_data is None:
print(color_path)
print(depth_path)
raise ValueError
sample.update({"obj": inst_data.transpose(1,0)})
sample.update({"bbox_dict": bbox_dict})
return sample
================================================
FILE: embedding.py
================================================
import torch
import numpy as np
def positional_encoding(
tensor,
B_layer=None,
num_encoding_functions=6,
scale=10.
):
if B_layer is not None:
embedding_gauss = B_layer(tensor / scale)
embedding_gauss = torch.sin(embedding_gauss)
embedding = embedding_gauss
else:
frequency_bands = 2.0 ** torch.linspace(
0.0,
num_encoding_functions - 1,
num_encoding_functions,
dtype=tensor.dtype,
device=tensor.device,
)
n_repeat = num_encoding_functions * 2 + 1
embedding = tensor[..., None, :].repeat(1, 1, n_repeat, 1) / scale
even_idx = np.arange(1, num_encoding_functions + 1) * 2
odd_idx = even_idx - 1
frequency_bands = frequency_bands[None, None, :, None]
embedding[:, :, even_idx, :] = torch.cos(
frequency_bands * embedding[:, :, even_idx, :])
embedding[:, :, odd_idx, :] = torch.sin(
frequency_bands * embedding[:, :, odd_idx, :])
n_dim = tensor.shape[-1]
embedding = embedding.view(
embedding.shape[0], embedding.shape[1], n_repeat * n_dim)
# print("embedding ", embedding.shape)
embedding = embedding.squeeze(0)
return embedding
class UniDirsEmbed(torch.nn.Module):
def __init__(self, min_deg=0, max_deg=2, scale=2.):
super(UniDirsEmbed, self).__init__()
self.min_deg = min_deg
self.max_deg = max_deg
self.n_freqs = max_deg - min_deg + 1
self.tensor_scale = torch.tensor(scale, requires_grad=False)
dirs = torch.tensor([
0.8506508, 0, 0.5257311,
0.809017, 0.5, 0.309017,
0.5257311, 0.8506508, 0,
1, 0, 0,
0.809017, 0.5, -0.309017,
0.8506508, 0, -0.5257311,
0.309017, 0.809017, -0.5,
0, 0.5257311, -0.8506508,
0.5, 0.309017, -0.809017,
0, 1, 0,
-0.5257311, 0.8506508, 0,
-0.309017, 0.809017, -0.5,
0, 0.5257311, 0.8506508,
-0.309017, 0.809017, 0.5,
0.309017, 0.809017, 0.5,
0.5, 0.309017, 0.809017,
0.5, -0.309017, 0.809017,
0, 0, 1,
-0.5, 0.309017, 0.809017,
-0.809017, 0.5, 0.309017,
-0.809017, 0.5, -0.309017
]).reshape(-1, 3)
self.B_layer = torch.nn.Linear(3, 21, bias=False)
self.B_layer.weight.data = dirs
frequency_bands = 2.0 ** torch.linspace(self.min_deg, self.max_deg, self.n_freqs)
self.register_buffer("frequency_bands", frequency_bands, persistent=False)
self.register_buffer("scale", self.tensor_scale, persistent=True)
def forward(self, x):
tensor = x / self.scale # functorch needs buffer, otherwise changed
proj = self.B_layer(tensor)
proj_bands = proj[..., None, :] * self.frequency_bands[None, None, :, None]
xb = proj_bands.view(list(proj.shape[:-1]) + [-1])
# embedding = torch.sin(torch.cat([xb, xb + 0.5 * np.pi], dim=-1))
embedding = torch.sin(xb * np.pi)
embedding = torch.cat([tensor] + [embedding], dim=-1)
# print("emb size ", embedding.shape)
return embedding
================================================
FILE: environment.yml
================================================
name: vmap
channels:
- pytorch
- defaults
dependencies:
- _libgcc_mutex=0.1=main
- _openmp_mutex=5.1=1_gnu
- blas=1.0=mkl
- brotlipy=0.7.0=py38h27cfd23_1003
- bzip2=1.0.8=h7b6447c_0
- ca-certificates=2022.10.11=h06a4308_0
- certifi=2022.9.24=py38h06a4308_0
- cffi=1.15.1=py38h5eee18b_2
- charset-normalizer=2.0.4=pyhd3eb1b0_0
- cryptography=38.0.1=py38h9ce1e76_0
- cudatoolkit=11.3.1=h2bc3f7f_2
- ffmpeg=4.3=hf484d3e_0
- freetype=2.12.1=h4a9f257_0
- giflib=5.2.1=h7b6447c_0
- gmp=6.2.1=h295c915_3
- gnutls=3.6.15=he1e5248_0
- idna=3.4=py38h06a4308_0
- intel-openmp=2021.4.0=h06a4308_3561
- jpeg=9e=h7f8727e_0
- lame=3.100=h7b6447c_0
- lcms2=2.12=h3be6417_0
- ld_impl_linux-64=2.38=h1181459_1
- lerc=3.0=h295c915_0
- libdeflate=1.8=h7f8727e_5
- libffi=3.4.2=h6a678d5_6
- libgcc-ng=11.2.0=h1234567_1
- libgomp=11.2.0=h1234567_1
- libiconv=1.16=h7f8727e_2
- libidn2=2.3.2=h7f8727e_0
- libpng=1.6.37=hbc83047_0
- libstdcxx-ng=11.2.0=h1234567_1
- libtasn1=4.16.0=h27cfd23_0
- libtiff=4.4.0=hecacb30_2
- libunistring=0.9.10=h27cfd23_0
- libwebp=1.2.4=h11a3e52_0
- libwebp-base=1.2.4=h5eee18b_0
- lz4-c=1.9.3=h295c915_1
- mkl=2021.4.0=h06a4308_640
- mkl-service=2.4.0=py38h7f8727e_0
- mkl_fft=1.3.1=py38hd3c417c_0
- mkl_random=1.2.2=py38h51133e4_0
- ncurses=6.3=h5eee18b_3
- nettle=3.7.3=hbbd107a_1
- numpy=1.23.4=py38h14f4228_0
- numpy-base=1.23.4=py38h31eccc5_0
- openh264=2.1.1=h4ff587b_0
- openssl=1.1.1s=h7f8727e_0
- pillow=9.2.0=py38hace64e9_1
- pip=22.2.2=py38h06a4308_0
- pycparser=2.21=pyhd3eb1b0_0
- pyopenssl=22.0.0=pyhd3eb1b0_0
- pysocks=1.7.1=py38h06a4308_0
- python=3.8.15=h7a1cb2a_2
- pytorch=1.12.1=py3.8_cuda11.3_cudnn8.3.2_0
- pytorch-mutex=1.0=cuda
- readline=8.2=h5eee18b_0
- requests=2.28.1=py38h06a4308_0
- setuptools=65.5.0=py38h06a4308_0
- six=1.16.0=pyhd3eb1b0_1
- sqlite=3.40.0=h5082296_0
- tk=8.6.12=h1ccaba5_0
- torchaudio=0.12.1=py38_cu113
- torchvision=0.13.1=py38_cu113
- typing_extensions=4.3.0=py38h06a4308_0
- urllib3=1.26.12=py38h06a4308_0
- wheel=0.37.1=pyhd3eb1b0_0
- xz=5.2.6=h5eee18b_0
- zlib=1.2.13=h5eee18b_0
- zstd=1.5.2=ha4553b6_0
- pip:
- antlr4-python3-runtime==4.9.3
- bidict==0.22.0
- click==8.1.3
- dash==2.7.0
- dash-core-components==2.0.0
- dash-html-components==2.0.0
- dash-table==5.0.0
- entrypoints==0.4
- flask==2.2.2
- functorch==0.2.0
- fvcore==0.1.5.post20221122
- h5py==3.7.0
- imageio==2.22.4
- importlib-metadata==5.1.0
- iopath==0.1.10
- itsdangerous==2.1.2
- lpips==0.1.4
- nbformat==5.5.0
- omegaconf==2.2.3
- open3d==0.16.0
- pexpect==4.8.0
- plotly==5.11.0
- pycocotools==2.0.6
- pyquaternion==0.9.9
- scikit-image==0.19.3
- tenacity==8.1.0
- termcolor==2.1.1
- timm==0.6.12
- werkzeug==2.2.2
- yacs==0.1.8
- zipp==3.11.0
================================================
FILE: image_transforms.py
================================================
import cv2
import numpy as np
class BGRtoRGB(object):
"""bgr format to rgb"""
def __call__(self, image):
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
return image
class DepthScale(object):
"""scale depth to meters"""
def __init__(self, scale):
self.scale = scale
def __call__(self, depth):
depth = depth.astype(np.float32)
return depth * self.scale
class DepthFilter(object):
"""scale depth to meters"""
def __init__(self, max_depth):
self.max_depth = max_depth
def __call__(self, depth):
far_mask = depth > self.max_depth
depth[far_mask] = 0.
return depth
class Undistort(object):
"""scale depth to meters"""
def __init__(self,
w, h,
fx, fy, cx, cy,
k1, k2, k3, k4, k5, k6,
p1, p2,
interpolation):
self.interpolation = interpolation
K = np.array([[fx, 0., cx],
[0., fy, cy],
[0., 0., 1.]])
self.map1x, self.map1y = cv2.initUndistortRectifyMap(
K,
np.array([k1, k2, p1, p2, k3, k4, k5, k6]),
np.eye(3),
K,
(w, h),
cv2.CV_32FC1)
def __call__(self, im):
im = cv2.remap(im, self.map1x, self.map1y, self.interpolation)
return im
================================================
FILE: loss.py
================================================
import torch
import render_rays
import torch.nn.functional as F
def step_batch_loss(alpha, color, gt_depth, gt_color, sem_labels, mask_depth, z_vals,
color_scaling=5.0, opacity_scaling=10.0):
"""
apply depth where depth are valid -> mask_depth
apply depth, color loss on this_obj & unkown_obj == (~other_obj) -> mask_obj
apply occupancy/opacity loss on this_obj & other_obj == (~unknown_obj) -> mask_sem
output:
loss for training
loss_all for per sample, could be used for active sampling, replay buffer
"""
mask_obj = sem_labels != 0
mask_obj = mask_obj.detach()
mask_sem = sem_labels != 2
mask_sem = mask_sem.detach()
alpha = alpha.squeeze(dim=-1)
color = color.squeeze(dim=-1)
occupancy = render_rays.occupancy_activation(alpha)
termination = render_rays.occupancy_to_termination(occupancy, is_batch=True) # shape [num_batch, num_ray, points_per_ray]
render_depth = render_rays.render(termination, z_vals)
diff_sq = (z_vals - render_depth[..., None]) ** 2
var = render_rays.render(termination, diff_sq).detach() # must detach here!
render_color = render_rays.render(termination[..., None], color, dim=-2)
render_opacity = torch.sum(termination, dim=-1) # similar to obj-nerf opacity loss
# 2D depth loss: only on valid depth & mask
# [mask_depth & mask_obj]
# loss_all = torch.zeros_like(render_depth)
loss_depth_raw = render_rays.render_loss(render_depth, gt_depth, loss="L1", normalise=False)
loss_depth = torch.mul(loss_depth_raw, mask_depth & mask_obj) # keep dim but set invalid element be zero
# loss_all += loss_depth
loss_depth = render_rays.reduce_batch_loss(loss_depth, var=var, avg=True, mask=mask_depth & mask_obj) # apply var as imap
# 2D color loss: only on obj mask
# [mask_obj]
loss_col_raw = render_rays.render_loss(render_color, gt_color, loss="L1", normalise=False)
loss_col = torch.mul(loss_col_raw.sum(-1), mask_obj)
# loss_all += loss_col / 3. * color_scaling
loss_col = render_rays.reduce_batch_loss(loss_col, var=None, avg=True, mask=mask_obj)
# 2D occupancy/opacity loss: apply except unknown area
# [mask_sem]
# loss_opacity_raw = F.mse_loss(torch.clamp(render_opacity, 0, 1), mask_obj.float().detach()) # encourage other_obj to be empty, while this_obj to be solid
# print("opacity max ", torch.max(render_opacity.max()))
# print("opacity min ", torch.max(render_opacity.min()))
loss_opacity_raw = render_rays.render_loss(render_opacity, mask_obj.float(), loss="L1", normalise=False)
loss_opacity = torch.mul(loss_opacity_raw, mask_sem) # but ignore -1 unkown area e.g., mask edges
# loss_all += loss_opacity * opacity_scaling
loss_opacity = render_rays.reduce_batch_loss(loss_opacity, var=None, avg=True, mask=mask_sem) # todo var
# loss for bp
l_batch = loss_depth + loss_col * color_scaling + loss_opacity * opacity_scaling
loss = l_batch.sum()
return loss, None # return loss, loss_all.detach()
================================================
FILE: metric/eval_3D_obj.py
================================================
import numpy as np
from tqdm import tqdm
import trimesh
from metrics import accuracy, completion, completion_ratio
import os
import json
def calc_3d_metric(mesh_rec, mesh_gt, N=200000):
"""
3D reconstruction metric.
"""
metrics = [[] for _ in range(4)]
transform, extents = trimesh.bounds.oriented_bounds(mesh_gt)
extents = extents / 0.9 # enlarge 0.9
box = trimesh.creation.box(extents=extents, transform=np.linalg.inv(transform))
mesh_rec = mesh_rec.slice_plane(box.facets_origin, -box.facets_normal)
if mesh_rec.vertices.shape[0] == 0:
print("no mesh found")
return
rec_pc = trimesh.sample.sample_surface(mesh_rec, N)
rec_pc_tri = trimesh.PointCloud(vertices=rec_pc[0])
gt_pc = trimesh.sample.sample_surface(mesh_gt, N)
gt_pc_tri = trimesh.PointCloud(vertices=gt_pc[0])
accuracy_rec = accuracy(gt_pc_tri.vertices, rec_pc_tri.vertices)
completion_rec = completion(gt_pc_tri.vertices, rec_pc_tri.vertices)
completion_ratio_rec = completion_ratio(gt_pc_tri.vertices, rec_pc_tri.vertices, 0.05)
completion_ratio_rec_1 = completion_ratio(gt_pc_tri.vertices, rec_pc_tri.vertices, 0.01)
# accuracy_rec *= 100 # convert to cm
# completion_rec *= 100 # convert to cm
# completion_ratio_rec *= 100 # convert to %
# print('accuracy: ', accuracy_rec)
# print('completion: ', completion_rec)
# print('completion ratio: ', completion_ratio_rec)
# print("completion_ratio_rec_1cm ", completion_ratio_rec_1)
metrics[0].append(accuracy_rec)
metrics[1].append(completion_rec)
metrics[2].append(completion_ratio_rec_1)
metrics[3].append(completion_ratio_rec)
return metrics
def get_gt_bg_mesh(gt_dir, background_cls_list):
with open(os.path.join(gt_dir, "info_semantic.json")) as f:
label_obj_list = json.load(f)["objects"]
bg_meshes = []
for obj in label_obj_list:
if int(obj["class_id"]) in background_cls_list:
obj_file = os.path.join(gt_dir, "mesh_semantic.ply_" + str(int(obj["id"])) + ".ply")
obj_mesh = trimesh.load(obj_file)
bg_meshes.append(obj_mesh)
bg_mesh = trimesh.util.concatenate(bg_meshes)
return bg_mesh
def get_obj_ids(obj_dir):
files = os.listdir(obj_dir)
obj_ids = []
for f in files:
obj_id = f.split("obj")[1][:-1]
if obj_id == '':
continue
obj_ids.append(int(obj_id))
return obj_ids
if __name__ == "__main__":
background_cls_list = [5, 12, 30, 31, 40, 60, 92, 93, 95, 97, 98, 79]
exp_name = ["room0", "room1", "room2", "office0", "office1", "office2", "office3", "office4"]
data_dir = "/home/xin/data/vmap/"
log_dir = "../logs/iMAP/"
# log_dir = "../logs/vMAP/"
for exp in tqdm(exp_name):
gt_dir = os.path.join(data_dir, exp[:-1]+"_"+exp[-1]+"/habitat")
exp_dir = os.path.join(log_dir, exp)
mesh_dir = os.path.join(exp_dir, "scene_mesh")
output_path = os.path.join(exp_dir, "eval_mesh")
os.makedirs(output_path, exist_ok=True)
metrics_3D = [[] for _ in range(4)]
# get obj ids
# obj_ids = np.loadtxt() # todo use a pre-defined obj list or use vMAP results
obj_ids = get_obj_ids(mesh_dir.replace("iMAP", "vMAP"))
for obj_id in tqdm(obj_ids):
if obj_id == 0: # for bg
N = 200000
mesh_gt = get_gt_bg_mesh(gt_dir, background_cls_list)
else: # for obj
N = 10000
obj_file = os.path.join(gt_dir, "mesh_semantic.ply_" + str(obj_id) + ".ply")
mesh_gt = trimesh.load(obj_file)
if "vMAP" in exp_dir:
rec_meshfile = os.path.join(mesh_dir, "frame_1999_obj"+str(obj_id)+".obj")
elif "iMAP" in exp_dir:
rec_meshfile = os.path.join(mesh_dir, "frame_1999_obj0.obj")
else:
print("Not Implement")
exit(-1)
mesh_rec = trimesh.load(rec_meshfile)
# mesh_rec.invert() # niceslam mesh face needs invert
metrics = calc_3d_metric(mesh_rec, mesh_gt, N=N) # for objs use 10k, for scene use 200k points
if metrics is None:
continue
np.save(output_path + '/metric_obj{}.npy'.format(obj_id), np.array(metrics))
metrics_3D[0].append(metrics[0]) # acc
metrics_3D[1].append(metrics[1]) # comp
metrics_3D[2].append(metrics[2]) # comp ratio 1cm
metrics_3D[3].append(metrics[3]) # comp ratio 5cm
metrics_3D = np.array(metrics_3D)
np.save(output_path + '/metrics_3D_obj.npy', metrics_3D)
print("metrics 3D obj \n Acc | Comp | Comp Ratio 1cm | Comp Ratio 5cm \n", metrics_3D.mean(axis=1))
print("-----------------------------------------")
print("finish exp ", exp)
================================================
FILE: metric/eval_3D_scene.py
================================================
import numpy as np
from tqdm import tqdm
import trimesh
from metrics import accuracy, completion, completion_ratio
import os
def calc_3d_metric(mesh_rec, mesh_gt, N=200000):
"""
3D reconstruction metric.
"""
metrics = [[] for _ in range(4)]
rec_pc = trimesh.sample.sample_surface(mesh_rec, N)
rec_pc_tri = trimesh.PointCloud(vertices=rec_pc[0])
gt_pc = trimesh.sample.sample_surface(mesh_gt, N)
gt_pc_tri = trimesh.PointCloud(vertices=gt_pc[0])
accuracy_rec = accuracy(gt_pc_tri.vertices, rec_pc_tri.vertices)
completion_rec = completion(gt_pc_tri.vertices, rec_pc_tri.vertices)
completion_ratio_rec = completion_ratio(gt_pc_tri.vertices, rec_pc_tri.vertices, 0.05)
completion_ratio_rec_1 = completion_ratio(gt_pc_tri.vertices, rec_pc_tri.vertices, 0.01)
# accuracy_rec *= 100 # convert to cm
# completion_rec *= 100 # convert to cm
# completion_ratio_rec *= 100 # convert to %
# print('accuracy: ', accuracy_rec)
# print('completion: ', completion_rec)
# print('completion ratio: ', completion_ratio_rec)
# print("completion_ratio_rec_1cm ", completion_ratio_rec_1)
metrics[0].append(accuracy_rec)
metrics[1].append(completion_rec)
metrics[2].append(completion_ratio_rec_1)
metrics[3].append(completion_ratio_rec)
return metrics
if __name__ == "__main__":
exp_name = ["room0", "room1", "room2", "office0", "office1", "office2", "office3", "office4"]
data_dir = "/home/xin/data/vmap/"
# log_dir = "../logs/iMAP/"
log_dir = "../logs/vMAP/"
for exp in tqdm(exp_name):
gt_dir = os.path.join(data_dir, exp[:-1]+"_"+exp[-1]+"/habitat")
exp_dir = os.path.join(log_dir, exp)
mesh_dir = os.path.join(exp_dir, "scene_mesh")
output_path = os.path.join(exp_dir, "eval_mesh")
os.makedirs(output_path, exist_ok=True)
if "vMAP" in exp_dir:
mesh_list = os.listdir(mesh_dir)
if "frame_1999_scene.obj" in mesh_list:
rec_meshfile = os.path.join(mesh_dir, "frame_1999_scene.obj")
else: # compose obj into scene mesh
scene_meshes = []
for f in mesh_list:
_, f_type = os.path.splitext(f)
if f_type == ".obj" or f_type == ".ply":
obj_mesh = trimesh.load(os.path.join(mesh_dir, f))
scene_meshes.append(obj_mesh)
scene_mesh = trimesh.util.concatenate(scene_meshes)
scene_mesh.export(os.path.join(mesh_dir, "frame_1999_scene.obj"))
rec_meshfile = os.path.join(mesh_dir, "frame_1999_scene.obj")
elif "iMAP" in exp_dir: # obj0 is the scene mesh
rec_meshfile = os.path.join(mesh_dir, "frame_1999_obj0.obj")
else:
print("Not Implement")
exit(-1)
gt_mesh_files = os.listdir(gt_dir)
gt_mesh_file = os.path.join(gt_dir, "../mesh.ply")
mesh_rec = trimesh.load(rec_meshfile)
# mesh_rec.invert() # niceslam mesh face needs invert
metrics_3D = [[] for _ in range(4)]
mesh_gt = trimesh.load(gt_mesh_file)
metrics = calc_3d_metric(mesh_rec, mesh_gt, N=200000) # for objs use 10k, for scene use 200k points
metrics_3D[0].append(metrics[0]) # acc
metrics_3D[1].append(metrics[1]) # comp
metrics_3D[2].append(metrics[2]) # comp ratio 1cm
metrics_3D[3].append(metrics[3]) # comp ratio 5cm
metrics_3D = np.array(metrics_3D)
np.save(output_path + '/metrics_3D_scene.npy', metrics_3D)
print("metrics 3D scene \n Acc | Comp | Comp Ratio 1cm | Comp Ratio 5cm \n ", metrics_3D.mean(axis=1))
print("-----------------------------------------")
print("finish exp ", exp)
================================================
FILE: metric/metrics.py
================================================
import numpy as np
from scipy.spatial import cKDTree as KDTree
def completion_ratio(gt_points, rec_points, dist_th=0.01):
gen_points_kd_tree = KDTree(rec_points)
one_distances, one_vertex_ids = gen_points_kd_tree.query(gt_points)
completion = np.mean((one_distances < dist_th).astype(np.float))
return completion
def accuracy(gt_points, rec_points):
gt_points_kd_tree = KDTree(gt_points)
two_distances, two_vertex_ids = gt_points_kd_tree.query(rec_points)
gen_to_gt_chamfer = np.mean(two_distances)
return gen_to_gt_chamfer
def completion(gt_points, rec_points):
gt_points_kd_tree = KDTree(rec_points)
one_distances, two_vertex_ids = gt_points_kd_tree.query(gt_points)
gt_to_gen_chamfer = np.mean(one_distances)
return gt_to_gen_chamfer
def chamfer(gt_points, rec_points):
# one direction
gen_points_kd_tree = KDTree(rec_points)
one_distances, one_vertex_ids = gen_points_kd_tree.query(gt_points)
gt_to_gen_chamfer = np.mean(one_distances)
# other direction
gt_points_kd_tree = KDTree(gt_points)
two_distances, two_vertex_ids = gt_points_kd_tree.query(rec_points)
gen_to_gt_chamfer = np.mean(two_distances)
return (gt_to_gen_chamfer + gen_to_gt_chamfer) / 2.
================================================
FILE: model.py
================================================
import torch
import torch.nn as nn
def init_weights(m, init_fn=torch.nn.init.xavier_normal_):
if type(m) == torch.nn.Linear:
init_fn(m.weight)
def fc_block(in_f, out_f):
return torch.nn.Sequential(
torch.nn.Linear(in_f, out_f),
torch.nn.ReLU(out_f)
)
class OccupancyMap(torch.nn.Module):
def __init__(
self,
emb_size1,
emb_size2,
hidden_size=256,
do_color=True,
hidden_layers_block=1
):
super(OccupancyMap, self).__init__()
self.do_color = do_color
self.embedding_size1 = emb_size1
self.in_layer = fc_block(self.embedding_size1, hidden_size)
hidden1 = [fc_block(hidden_size, hidden_size)
for _ in range(hidden_layers_block)]
self.mid1 = torch.nn.Sequential(*hidden1)
# self.embedding_size2 = 21*(5+1)+3 - self.embedding_size # 129-66=63 32
self.embedding_size2 = emb_size2
self.cat_layer = fc_block(
hidden_size + self.embedding_size1, hidden_size)
# self.cat_layer = fc_block(
# hidden_size , hidden_size)
hidden2 = [fc_block(hidden_size, hidden_size)
for _ in range(hidden_layers_block)]
self.mid2 = torch.nn.Sequential(*hidden2)
self.out_alpha = torch.nn.Linear(hidden_size, 1)
if self.do_color:
self.color_linear = fc_block(self.embedding_size2 + hidden_size, hidden_size)
self.out_color = torch.nn.Linear(hidden_size, 3)
# self.relu = torch.nn.functional.relu
self.sigmoid = torch.sigmoid
def forward(self, x,
noise_std=None,
do_alpha=True,
do_color=True,
do_cat=True):
fc1 = self.in_layer(x[...,:self.embedding_size1])
fc2 = self.mid1(fc1)
# fc3 = self.cat_layer(fc2)
if do_cat:
fc2_x = torch.cat((fc2, x[...,:self.embedding_size1]), dim=-1)
fc3 = self.cat_layer(fc2_x)
else:
fc3 = fc2
fc4 = self.mid2(fc3)
alpha = None
if do_alpha:
raw = self.out_alpha(fc4) # todo ignore noise
if noise_std is not None:
noise = torch.randn(raw.shape, device=x.device) * noise_std
raw = raw + noise
# alpha = self.relu(raw) * scale # nerf
alpha = raw * 10. #self.scale # unisurf
color = None
if self.do_color and do_color:
fc4_cat = self.color_linear(torch.cat((fc4, x[..., self.embedding_size1:]), dim=-1))
raw_color = self.out_color(fc4_cat)
color = self.sigmoid(raw_color)
return alpha, color
================================================
FILE: render_rays.py
================================================
import torch
import numpy as np
def occupancy_activation(alpha, distances=None):
# occ = 1.0 - torch.exp(-alpha * distances)
occ = torch.sigmoid(alpha) # unisurf
return occ
def alpha_to_occupancy(depths, dirs, alpha, add_last=False):
interval_distances = depths[..., 1:] - depths[..., :-1]
if add_last:
last_distance = torch.empty(
(depths.shape[0], 1),
device=depths.device,
dtype=depths.dtype).fill_(0.1)
interval_distances = torch.cat(
[interval_distances, last_distance], dim=-1)
dirs_norm = torch.norm(dirs, dim=-1)
interval_distances = interval_distances * dirs_norm[:, None]
occ = occupancy_activation(alpha, interval_distances)
return occ
def occupancy_to_termination(occupancy, is_batch=False):
if is_batch:
first = torch.ones(list(occupancy.shape[:2]) + [1], device=occupancy.device)
free_probs = (1. - occupancy + 1e-10)[:, :, :-1]
else:
first = torch.ones([occupancy.shape[0], 1], device=occupancy.device)
free_probs = (1. - occupancy + 1e-10)[:, :-1]
free_probs = torch.cat([first, free_probs], dim=-1)
term_probs = occupancy * torch.cumprod(free_probs, dim=-1)
# using escape probability
# occupancy = occupancy[:, :-1]
# first = torch.ones([occupancy.shape[0], 1], device=occupancy.device)
# free_probs = (1. - occupancy + 1e-10)
# free_probs = torch.cat([first, free_probs], dim=-1)
# last = torch.ones([occupancy.shape[0], 1], device=occupancy.device)
# occupancy = torch.cat([occupancy, last], dim=-1)
# term_probs = occupancy * torch.cumprod(free_probs, dim=-1)
return term_probs
def render(termination, vals, dim=-1):
weighted_vals = termination * vals
render = weighted_vals.sum(dim=dim)
return render
def render_loss(render, gt, loss="L1", normalise=False):
residual = render - gt
if loss == "L2":
loss_mat = residual ** 2
elif loss == "L1":
loss_mat = torch.abs(residual)
else:
print("loss type {} not implemented!".format(loss))
if normalise:
loss_mat = loss_mat / gt
return loss_mat
def reduce_batch_loss(loss_mat, var=None, avg=True, mask=None, loss_type="L1"):
mask_num = torch.sum(mask, dim=-1)
if (mask_num == 0).any(): # no valid sample, return 0 loss
loss = torch.zeros_like(loss_mat)
if avg:
loss = torch.mean(loss, dim=-1)
return loss
if var is not None:
eps = 1e-4
if loss_type == "L2":
information = 1.0 / (var + eps)
elif loss_type == "L1":
information = 1.0 / (torch.sqrt(var) + eps)
loss_weighted = loss_mat * information
else:
loss_weighted = loss_mat
if avg:
if mask is not None:
loss = (torch.sum(loss_weighted, dim=-1)/(torch.sum(mask, dim=-1)+1e-10))
if (loss > 100000).any():
print("loss explode")
exit(-1)
else:
loss = torch.mean(loss_weighted, dim=-1).sum()
else:
loss = loss_weighted
return loss
def make_3D_grid(occ_range=[-1., 1.], dim=256, device="cuda:0", transform=None, scale=None):
t = torch.linspace(occ_range[0], occ_range[1], steps=dim, device=device)
grid = torch.meshgrid(t, t, t)
grid_3d = torch.cat(
(grid[0][..., None],
grid[1][..., None],
grid[2][..., None]), dim=3
)
if scale is not None:
grid_3d = grid_3d * scale
if transform is not None:
R1 = transform[None, None, None, 0, :3]
R2 = transform[None, None, None, 1, :3]
R3 = transform[None, None, None, 2, :3]
grid1 = (R1 * grid_3d).sum(-1, keepdim=True)
grid2 = (R2 * grid_3d).sum(-1, keepdim=True)
grid3 = (R3 * grid_3d).sum(-1, keepdim=True)
grid_3d = torch.cat([grid1, grid2, grid3], dim=-1)
trans = transform[None, None, None, :3, 3]
grid_3d = grid_3d + trans
return grid_3d
================================================
FILE: train.py
================================================
import time
import loss
from vmap import *
import utils
import open3d
import dataset
import vis
from functorch import vmap
import argparse
from cfg import Config
import shutil
if __name__ == "__main__":
#############################################
# init config
torch.backends.cudnn.enabled = True
torch.backends.cudnn.benchmark = True
# setting params
parser = argparse.ArgumentParser(description='Model training for single GPU')
parser.add_argument('--logdir', default='./logs/debug',
type=str)
parser.add_argument('--config',
default='./configs/Replica/config_replica_room0_vMAP.json',
type=str)
parser.add_argument('--save_ckpt',
default=False,
type=bool)
args = parser.parse_args()
log_dir = args.logdir
config_file = args.config
save_ckpt = args.save_ckpt
os.makedirs(log_dir, exist_ok=True) # saving logs
shutil.copy(config_file, log_dir)
cfg = Config(config_file) # config params
n_sample_per_step = cfg.n_per_optim
n_sample_per_step_bg = cfg.n_per_optim_bg
# param for vis
vis3d = open3d.visualization.Visualizer()
vis3d.create_window(window_name="3D mesh vis",
width=cfg.W,
height=cfg.H,
left=600, top=50)
view_ctl = vis3d.get_view_control()
view_ctl.set_constant_z_far(10.)
# set camera
cam_info = cameraInfo(cfg)
intrinsic_open3d = open3d.camera.PinholeCameraIntrinsic(
width=cfg.W,
height=cfg.H,
fx=cfg.fx,
fy=cfg.fy,
cx=cfg.cx,
cy=cfg.cy)
# init obj_dict
obj_dict = {} # only objs
vis_dict = {} # including bg
# init for training
AMP = False
if AMP:
scaler = torch.cuda.amp.GradScaler() # amp https://pytorch.org/blog/accelerating-training-on-nvidia-gpus-with-pytorch-automatic-mixed-precision/
optimiser = torch.optim.AdamW([torch.autograd.Variable(torch.tensor(0))], lr=cfg.learning_rate, weight_decay=cfg.weight_decay)
#############################################
# init data stream
if not cfg.live_mode:
# load dataset
dataloader = dataset.init_loader(cfg)
dataloader_iterator = iter(dataloader)
dataset_len = len(dataloader)
else:
dataset_len = 1000000
# # init ros node
# torch.multiprocessing.set_start_method('spawn') # spawn
# import ros_nodes
# track_to_map_Buffer = torch.multiprocessing.Queue(maxsize=5)
# # track_to_vis_T_WC = torch.multiprocessing.Queue(maxsize=1)
# kfs_que = torch.multiprocessing.Queue(maxsize=5) # to store one more buffer
# track_p = torch.multiprocessing.Process(target=ros_nodes.Tracking,
# args=(
# (cfg), (track_to_map_Buffer), (None),
# (kfs_que), (True),))
# track_p.start()
# init vmap
fc_models, pe_models = [], []
scene_bg = None
for frame_id in tqdm(range(dataset_len)):
print("*********************************************")
# get new frame data
with performance_measure(f"getting next data"):
if not cfg.live_mode:
# get data from dataloader
sample = next(dataloader_iterator)
else:
pass
if sample is not None: # new frame
last_frame_time = time.time()
with performance_measure(f"Appending data"):
rgb = sample["image"].to(cfg.data_device)
depth = sample["depth"].to(cfg.data_device)
twc = sample["T"].to(cfg.data_device)
bbox_dict = sample["bbox_dict"]
if "frame_id" in sample.keys():
live_frame_id = sample["frame_id"]
else:
live_frame_id = frame_id
if not cfg.live_mode:
inst = sample["obj"].to(cfg.data_device)
obj_ids = torch.unique(inst)
else:
inst_data_dict = sample["obj"]
obj_ids = inst_data_dict.keys()
# append new frame info to objs in current view
for obj_id in obj_ids:
if obj_id == -1: # unsured area
continue
obj_id = int(obj_id)
# convert inst mask to state
if not cfg.live_mode:
state = torch.zeros_like(inst, dtype=torch.uint8, device=cfg.data_device)
state[inst == obj_id] = 1
state[inst == -1] = 2
else:
inst_mask = inst_data_dict[obj_id].permute(1,0)
label_list = torch.unique(inst_mask).tolist()
state = torch.zeros_like(inst_mask, dtype=torch.uint8, device=cfg.data_device)
state[inst_mask == obj_id] = 1
state[inst_mask == -1] = 2
bbox = bbox_dict[obj_id]
if obj_id in vis_dict.keys():
scene_obj = vis_dict[obj_id]
scene_obj.append_keyframe(rgb, depth, state, bbox, twc, live_frame_id)
else: # init scene_obj
if len(obj_dict.keys()) >= cfg.max_n_models:
print("models full!!!! current num ", len(obj_dict.keys()))
continue
print("init new obj ", obj_id)
if cfg.do_bg and obj_id == 0: # todo param
scene_bg = sceneObject(cfg, obj_id, rgb, depth, state, bbox, twc, live_frame_id)
# scene_bg.init_obj_center(intrinsic_open3d, depth, state, twc)
optimiser.add_param_group({"params": scene_bg.trainer.fc_occ_map.parameters(), "lr": cfg.learning_rate, "weight_decay": cfg.weight_decay})
optimiser.add_param_group({"params": scene_bg.trainer.pe.parameters(), "lr": cfg.learning_rate, "weight_decay": cfg.weight_decay})
vis_dict.update({obj_id: scene_bg})
else:
scene_obj = sceneObject(cfg, obj_id, rgb, depth, state, bbox, twc, live_frame_id)
# scene_obj.init_obj_center(intrinsic_open3d, depth, state, twc)
obj_dict.update({obj_id: scene_obj})
vis_dict.update({obj_id: scene_obj})
# params = [scene_obj.trainer.fc_occ_map.parameters(), scene_obj.trainer.pe.parameters()]
optimiser.add_param_group({"params": scene_obj.trainer.fc_occ_map.parameters(), "lr": cfg.learning_rate, "weight_decay": cfg.weight_decay})
optimiser.add_param_group({"params": scene_obj.trainer.pe.parameters(), "lr": cfg.learning_rate, "weight_decay": cfg.weight_decay})
if cfg.training_strategy == "vmap":
update_vmap_model = True
fc_models.append(obj_dict[obj_id].trainer.fc_occ_map)
pe_models.append(obj_dict[obj_id].trainer.pe)
# ###################################
# # measure trainable params in total
# total_params = 0
# obj_k = obj_dict[obj_id]
# for p in obj_k.trainer.fc_occ_map.parameters():
# if p.requires_grad:
# total_params += p.numel()
# for p in obj_k.trainer.pe.parameters():
# if p.requires_grad:
# total_params += p.numel()
# print("total param ", total_params)
# dynamically add vmap
with performance_measure(f"add vmap"):
if cfg.training_strategy == "vmap" and update_vmap_model == True:
fc_model, fc_param, fc_buffer = utils.update_vmap(fc_models, optimiser)
pe_model, pe_param, pe_buffer = utils.update_vmap(pe_models, optimiser)
update_vmap_model = False
##################################################################
# training data preperation, get training data for all objs
Batch_N_gt_depth = []
Batch_N_gt_rgb = []
Batch_N_depth_mask = []
Batch_N_obj_mask = []
Batch_N_input_pcs = []
Batch_N_sampled_z = []
with performance_measure(f"Sampling over {len(obj_dict.keys())} objects,"):
if cfg.do_bg and scene_bg is not None:
gt_rgb, gt_depth, valid_depth_mask, obj_mask, input_pcs, sampled_z \
= scene_bg.get_training_samples(cfg.n_iter_per_frame * cfg.win_size_bg, cfg.n_samples_per_frame_bg,
cam_info.rays_dir_cache)
bg_gt_depth = gt_depth.reshape([gt_depth.shape[0] * gt_depth.shape[1]])
bg_gt_rgb = gt_rgb.reshape([gt_rgb.shape[0] * gt_rgb.shape[1], gt_rgb.shape[2]])
bg_valid_depth_mask = valid_depth_mask
bg_obj_mask = obj_mask
bg_input_pcs = input_pcs.reshape(
[input_pcs.shape[0] * input_pcs.shape[1], input_pcs.shape[2], input_pcs.shape[3]])
bg_sampled_z = sampled_z.reshape([sampled_z.shape[0] * sampled_z.shape[1], sampled_z.shape[2]])
for obj_id, obj_k in obj_dict.items():
gt_rgb, gt_depth, valid_depth_mask, obj_mask, input_pcs, sampled_z \
= obj_k.get_training_samples(cfg.n_iter_per_frame * cfg.win_size, cfg.n_samples_per_frame,
cam_info.rays_dir_cache)
# merge first two dims, sample_per_frame*num_per_frame
Batch_N_gt_depth.append(gt_depth.reshape([gt_depth.shape[0] * gt_depth.shape[1]]))
Batch_N_gt_rgb.append(gt_rgb.reshape([gt_rgb.shape[0] * gt_rgb.shape[1], gt_rgb.shape[2]]))
Batch_N_depth_mask.append(valid_depth_mask)
Batch_N_obj_mask.append(obj_mask)
Batch_N_input_pcs.append(input_pcs.reshape([input_pcs.shape[0] * input_pcs.shape[1], input_pcs.shape[2], input_pcs.shape[3]]))
Batch_N_sampled_z.append(sampled_z.reshape([sampled_z.shape[0] * sampled_z.shape[1], sampled_z.shape[2]]))
# # vis sampled points in open3D
# # sampled pcs
# pc = open3d.geometry.PointCloud()
# pc.points = open3d.utility.Vector3dVector(input_pcs.cpu().numpy().reshape(-1,3))
# open3d.visualization.draw_geometries([pc])
# rgb_np = rgb.cpu().numpy().astype(np.uint8).transpose(1,0,2)
# # print("rgb ", rgb_np.shape)
# # print(rgb_np)
# # cv2.imshow("rgb", rgb_np)
# # cv2.waitKey(1)
# depth_np = depth.cpu().numpy().astype(np.float32).transpose(1,0)
# twc_np = twc.cpu().numpy()
# rgbd = open3d.geometry.RGBDImage.create_from_color_and_depth(
# open3d.geometry.Image(rgb_np),
# open3d.geometry.Image(depth_np),
# depth_trunc=max_depth,
# depth_scale=1,
# convert_rgb_to_intensity=False,
# )
# T_CW = np.linalg.inv(twc_np)
# # input image pc
# input_pc = open3d.geometry.PointCloud.create_from_rgbd_image(
# image=rgbd,
# intrinsic=intrinsic_open3d,
# extrinsic=T_CW)
# input_pc.points = open3d.utility.Vector3dVector(np.array(input_pc.points) - obj_k.obj_center.cpu().numpy())
# open3d.visualization.draw_geometries([pc, input_pc])
####################################################
# training
assert len(Batch_N_input_pcs) > 0
# move data to GPU (n_obj, n_iter_per_frame, win_size*num_per_frame, 3)
with performance_measure(f"stacking and moving to gpu: "):
Batch_N_input_pcs = torch.stack(Batch_N_input_pcs).to(cfg.training_device)
Batch_N_gt_depth = torch.stack(Batch_N_gt_depth).to(cfg.training_device)
Batch_N_gt_rgb = torch.stack(Batch_N_gt_rgb).to(cfg.training_device) / 255. # todo
Batch_N_depth_mask = torch.stack(Batch_N_depth_mask).to(cfg.training_device)
Batch_N_obj_mask = torch.stack(Batch_N_obj_mask).to(cfg.training_device)
Batch_N_sampled_z = torch.stack(Batch_N_sampled_z).to(cfg.training_device)
if cfg.do_bg:
bg_input_pcs = bg_input_pcs.to(cfg.training_device)
bg_gt_depth = bg_gt_depth.to(cfg.training_device)
bg_gt_rgb = bg_gt_rgb.to(cfg.training_device) / 255.
bg_valid_depth_mask = bg_valid_depth_mask.to(cfg.training_device)
bg_obj_mask = bg_obj_mask.to(cfg.training_device)
bg_sampled_z = bg_sampled_z.to(cfg.training_device)
with performance_measure(f"Training over {len(obj_dict.keys())} objects,"):
for iter_step in range(cfg.n_iter_per_frame):
data_idx = slice(iter_step*n_sample_per_step, (iter_step+1)*n_sample_per_step)
batch_input_pcs = Batch_N_input_pcs[:, data_idx, ...]
batch_gt_depth = Batch_N_gt_depth[:, data_idx, ...]
batch_gt_rgb = Batch_N_gt_rgb[:, data_idx, ...]
batch_depth_mask = Batch_N_depth_mask[:, data_idx, ...]
batch_obj_mask = Batch_N_obj_mask[:, data_idx, ...]
batch_sampled_z = Batch_N_sampled_z[:, data_idx, ...]
if cfg.training_strategy == "forloop":
# for loop training
batch_alpha = []
batch_color = []
for k, obj_id in enumerate(obj_dict.keys()):
obj_k = obj_dict[obj_id]
embedding_k = obj_k.trainer.pe(batch_input_pcs[k])
alpha_k, color_k = obj_k.trainer.fc_occ_map(embedding_k)
batch_alpha.append(alpha_k)
batch_color.append(color_k)
batch_alpha = torch.stack(batch_alpha)
batch_color = torch.stack(batch_color)
elif cfg.training_strategy == "vmap":
# batched training
batch_embedding = vmap(pe_model)(pe_param, pe_buffer, batch_input_pcs)
batch_alpha, batch_color = vmap(fc_model)(fc_param, fc_buffer, batch_embedding)
# print("batch alpha ", batch_alpha.shape)
else:
print("training strategy {} is not implemented ".format(cfg.training_strategy))
exit(-1)
# step loss
# with performance_measure(f"Batch LOSS"):
batch_loss, _ = loss.step_batch_loss(batch_alpha, batch_color,
batch_gt_depth.detach(), batch_gt_rgb.detach(),
batch_obj_mask.detach(), batch_depth_mask.detach(),
batch_sampled_z.detach())
if cfg.do_bg:
bg_data_idx = slice(iter_step * n_sample_per_step_bg, (iter_step + 1) * n_sample_per_step_bg)
bg_embedding = scene_bg.trainer.pe(bg_input_pcs[bg_data_idx, ...])
bg_alpha, bg_color = scene_bg.trainer.fc_occ_map(bg_embedding)
bg_loss, _ = loss.step_batch_loss(bg_alpha[None, ...], bg_color[None, ...],
bg_gt_depth[None, bg_data_idx, ...].detach(), bg_gt_rgb[None, bg_data_idx].detach(),
bg_obj_mask[None, bg_data_idx, ...].detach(), bg_valid_depth_mask[None, bg_data_idx, ...].detach(),
bg_sampled_z[None, bg_data_idx, ...].detach())
batch_loss += bg_loss
# with performance_measure(f"Backward"):
if AMP:
scaler.scale(batch_loss).backward()
scaler.step(optimiser)
scaler.update()
else:
batch_loss.backward()
optimiser.step()
optimiser.zero_grad(set_to_none=True)
# print("loss ", batch_loss.item())
# update each origin model params
# todo find a better way # https://github.com/pytorch/functorch/issues/280
with performance_measure(f"updating vmap param"):
if cfg.training_strategy == "vmap":
with torch.no_grad():
for model_id, (obj_id, obj_k) in enumerate(obj_dict.items()):
for i, param in enumerate(obj_k.trainer.fc_occ_map.parameters()):
param.copy_(fc_param[i][model_id])
for i, param in enumerate(obj_k.trainer.pe.parameters()):
param.copy_(pe_param[i][model_id])
####################################################################
# live vis mesh
if (((frame_id % cfg.n_vis_iter) == 0 or frame_id == dataset_len-1) or
(cfg.live_mode and time.time()-last_frame_time>cfg.keep_live_time)) and frame_id >= 10:
vis3d.clear_geometries()
for obj_id, obj_k in vis_dict.items():
bound = obj_k.get_bound(intrinsic_open3d)
if bound is None:
print("get bound failed obj ", obj_id)
continue
adaptive_grid_dim = int(np.minimum(np.max(bound.extent)//cfg.live_voxel_size+1, cfg.grid_dim))
mesh = obj_k.trainer.meshing(bound, obj_k.obj_center, grid_dim=adaptive_grid_dim)
if mesh is None:
print("meshing failed obj ", obj_id)
continue
# save to dir
obj_mesh_output = os.path.join(log_dir, "scene_mesh")
os.makedirs(obj_mesh_output, exist_ok=True)
mesh.export(os.path.join(obj_mesh_output, "frame_{}_obj{}.obj".format(frame_id, str(obj_id))))
# live vis
open3d_mesh = vis.trimesh_to_open3d(mesh)
vis3d.add_geometry(open3d_mesh)
vis3d.add_geometry(bound)
# update vis3d
vis3d.poll_events()
vis3d.update_renderer()
if False: # follow cam
cam = view_ctl.convert_to_pinhole_camera_parameters()
T_CW_np = np.linalg.inv(twc.cpu().numpy())
cam.extrinsic = T_CW_np
view_ctl.convert_from_pinhole_camera_parameters(cam)
vis3d.poll_events()
vis3d.update_renderer()
with performance_measure("saving ckpt"):
if save_ckpt and ((((frame_id % cfg.n_vis_iter) == 0 or frame_id == dataset_len - 1) or
(cfg.live_mode and time.time() - last_frame_time > cfg.keep_live_time)) and frame_id >= 10):
for obj_id, obj_k in vis_dict.items():
ckpt_dir = os.path.join(log_dir, "ckpt", str(obj_id))
os.makedirs(ckpt_dir, exist_ok=True)
bound = obj_k.get_bound(intrinsic_open3d) # update bound
obj_k.save_checkpoints(ckpt_dir, frame_id)
# save current cam pose
cam_dir = os.path.join(log_dir, "cam_pose")
os.makedirs(cam_dir, exist_ok=True)
torch.save({"twc": twc,}, os.path.join(cam_dir, "twc_frame_{}".format(frame_id) + ".pth"))
================================================
FILE: trainer.py
================================================
import torch
import model
import embedding
import render_rays
import numpy as np
import vis
from tqdm import tqdm
class Trainer:
def __init__(self, cfg):
self.obj_id = cfg.obj_id
self.device = cfg.training_device
self.hidden_feature_size = cfg.hidden_feature_size #32 for obj # 256 for iMAP, 128 for seperate bg
self.obj_scale = cfg.obj_scale # 10 for bg and iMAP
self.n_unidir_funcs = cfg.n_unidir_funcs
self.emb_size1 = 21*(3+1)+3
self.emb_size2 = 21*(5+1)+3 - self.emb_size1
self.load_network()
if self.obj_id == 0:
self.bound_extent = 0.995
else:
self.bound_extent = 0.9
def load_network(self):
self.fc_occ_map = model.OccupancyMap(
self.emb_size1,
self.emb_size2,
hidden_size=self.hidden_feature_size
)
self.fc_occ_map.apply(model.init_weights).to(self.device)
self.pe = embedding.UniDirsEmbed(max_deg=self.n_unidir_funcs, scale=self.obj_scale).to(self.device)
def meshing(self, bound, obj_center, grid_dim=256):
occ_range = [-1., 1.]
range_dist = occ_range[1] - occ_range[0]
scene_scale_np = bound.extent / (range_dist * self.bound_extent)
scene_scale = torch.from_numpy(scene_scale_np).float().to(self.device)
transform_np = np.eye(4, dtype=np.float32)
transform_np[:3, 3] = bound.center
transform_np[:3, :3] = bound.R
# transform_np = np.linalg.inv(transform_np) #
transform = torch.from_numpy(transform_np).to(self.device)
grid_pc = render_rays.make_3D_grid(occ_range=occ_range, dim=grid_dim, device=self.device,
scale=scene_scale, transform=transform).view(-1, 3)
grid_pc -= obj_center.to(grid_pc.device)
ret = self.eval_points(grid_pc)
if ret is None:
return None
occ, _ = ret
mesh = vis.marching_cubes(occ.view(grid_dim, grid_dim, grid_dim).cpu().numpy())
if mesh is None:
print("marching cube failed")
return None
# Transform to [-1, 1] range
mesh.apply_translation([-0.5, -0.5, -0.5])
mesh.apply_scale(2)
# Transform to scene coordinates
mesh.apply_scale(scene_scale_np)
mesh.apply_transform(transform_np)
vertices_pts = torch.from_numpy(np.array(mesh.vertices)).float().to(self.device)
ret = self.eval_points(vertices_pts)
if ret is None:
return None
_, color = ret
mesh_color = color * 255
vertex_colors = mesh_color.detach().squeeze(0).cpu().numpy().astype(np.uint8)
mesh.visual.vertex_colors = vertex_colors
return mesh
def eval_points(self, points, chunk_size=100000):
# 256^3 = 16777216
alpha, color = [], []
n_chunks = int(np.ceil(points.shape[0] / chunk_size))
with torch.no_grad():
for k in tqdm(range(n_chunks)): # 2s/it 1000000 pts
chunk_idx = slice(k * chunk_size, (k + 1) * chunk_size)
embedding_k = self.pe(points[chunk_idx, ...])
alpha_k, color_k = self.fc_occ_map(embedding_k)
alpha.extend(alpha_k.detach().squeeze())
color.extend(color_k.detach().squeeze())
alpha = torch.stack(alpha)
color = torch.stack(color)
occ = render_rays.occupancy_activation(alpha).detach()
if occ.max() == 0:
print("no occ")
return None
return (occ, color)
================================================
FILE: utils.py
================================================
import cv2
import imgviz
import numpy as np
import torch
from functorch import combine_state_for_ensemble
import open3d
import queue
import copy
import torch.utils.dlpack
class BoundingBox():
def __init__(self):
super(BoundingBox, self).__init__()
self.extent = None
self.R = None
self.center = None
self.points3d = None # (8,3)
def bbox_open3d2bbox(bbox_o3d):
bbox = BoundingBox()
bbox.extent = bbox_o3d.extent
bbox.R = bbox_o3d.R
bbox.center = bbox_o3d.center
return bbox
def bbox_bbox2open3d(bbox):
bbox_o3d = open3d.geometry.OrientedBoundingBox(bbox.center, bbox.R, bbox.extent)
return bbox_o3d
def update_vmap(models, optimiser):
fmodel, params, buffers = combine_state_for_ensemble(models)
[p.requires_grad_() for p in params]
optimiser.add_param_group({"params": params}) # imap b l
return (fmodel, params, buffers)
def enlarge_bbox(bbox, scale, w, h):
assert scale >= 0
# print(bbox)
min_x, min_y, max_x, max_y = bbox
margin_x = int(0.5 * scale * (max_x - min_x))
margin_y = int(0.5 * scale * (max_y - min_y))
if margin_y == 0 or margin_x == 0:
return None
# assert margin_x != 0
# assert margin_y != 0
min_x -= margin_x
max_x += margin_x
min_y -= margin_y
max_y += margin_y
min_x = np.clip(min_x, 0, w-1)
min_y = np.clip(min_y, 0, h-1)
max_x = np.clip(max_x, 0, w-1)
max_y = np.clip(max_y, 0, h-1)
bbox_enlarged = [int(min_x), int(min_y), int(max_x), int(max_y)]
return bbox_enlarged
def get_bbox2d(obj_mask, bbox_scale=1.0):
contours, hierarchy = cv2.findContours(obj_mask.astype(np.uint8), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[
-2:]
# # Find the index of the largest contour
# areas = [cv2.contourArea(c) for c in contours]
# max_index = np.argmax(areas)
# cnt = contours[max_index]
# Concatenate all contours
if len(contours) == 0:
return None
cnt = np.concatenate(contours)
x, y, w, h = cv2.boundingRect(cnt) # todo if multiple contours, choose the outmost one?
# x, y, w, h = cv2.boundingRect(contours)
bbox_enlarged = enlarge_bbox([x, y, x + w, y + h], scale=bbox_scale, w=obj_mask.shape[1], h=obj_mask.shape[0])
return bbox_enlarged
def get_bbox2d_batch(img):
b,h,w = img.shape[:3]
rows = torch.any(img, axis=2)
cols = torch.any(img, axis=1)
rmins = torch.argmax(rows.float(), dim=1)
rmaxs = h - torch.argmax(rows.float().flip(dims=[1]), dim=1)
cmins = torch.argmax(cols.float(), dim=1)
cmaxs = w - torch.argmax(cols.float().flip(dims=[1]), dim=1)
return rmins, rmaxs, cmins, cmaxs
def get_latest_queue(q):
message = None
while(True):
try:
message_latest = q.get(block=False)
if message is not None:
del message
message = message_latest
except queue.Empty:
break
return message
# for association/tracking
class InstData:
def __init__(self):
super(InstData, self).__init__()
self.bbox3D = None
self.inst_id = None # instance
self.class_id = None # semantic
self.pc_sample = None
self.merge_cnt = 0 # merge times counting
self.cmp_cnt = 0
def box_filter(masks, classes, depth, inst_dict, intrinsic_open3d, T_CW, min_pixels=500, voxel_size=0.01):
bbox3d_scale = 1.0 # 1.05
inst_data = np.zeros_like(depth, dtype=np.int)
for i in range(len(masks)):
diff_mask = None
inst_mask = masks[i]
inst_id = classes[i]
if inst_id == 0:
continue
inst_depth = np.copy(depth)
inst_depth[~inst_mask] = 0. # inst_mask
# proj_time = time.time()
inst_pc = unproject_pointcloud(inst_depth, intrinsic_open3d, T_CW)
# print("proj time ", time.time()-proj_time)
if len(inst_pc.points) <= 10: # too small
inst_data[inst_mask] = 0 # set to background
continue
if inst_id in inst_dict.keys():
candidate_inst = inst_dict[inst_id]
# iou_time = time.time()
IoU, indices = check_inside_ratio(inst_pc, candidate_inst.bbox3D)
# print("iou time ", time.time()-iou_time)
# if indices empty
candidate_inst.cmp_cnt += 1
if len(indices) >= 1:
candidate_inst.pc += inst_pc.select_by_index(indices) # only merge pcs inside scale*bbox
# todo check indices follow valid depth
valid_depth_mask = np.zeros_like(inst_depth, dtype=np.bool)
valid_pc_mask = valid_depth_mask[inst_depth!=0]
valid_pc_mask[indices] = True
valid_depth_mask[inst_depth != 0] = valid_pc_mask
valid_mask = valid_depth_mask
diff_mask = np.zeros_like(inst_mask)
# uv_opencv, _ = cv2.projectPoints(np.array(inst_pc.select_by_index(indices).points), T_CW[:3, :3],
# T_CW[:3, 3], intrinsic_open3d.intrinsic_matrix[:3, :3], None)
# uv = np.round(uv_opencv).squeeze().astype(int)
# u = uv[:, 0].reshape(-1, 1)
# v = uv[:, 1].reshape(-1, 1)
# vu = np.concatenate([v, u], axis=-1)
# valid_mask = np.zeros_like(inst_mask)
# valid_mask[tuple(vu.T)] = True
# # cv2.imshow("valid", (inst_depth!=0).astype(np.uint8)*255)
# # cv2.waitKey(1)
diff_mask[(inst_depth != 0) & (~valid_mask)] = True
# cv2.imshow("diff_mask", diff_mask.astype(np.uint8) * 255)
# cv2.waitKey(1)
else: # merge all for scannet
# print("too few pcs obj ", inst_id)
inst_data[inst_mask] = -1
continue
# downsample_time = time.time()
# adapt_voxel_size = np.maximum(np.max(candidate_inst.bbox3D.extent)/100, 0.1)
candidate_inst.pc = candidate_inst.pc.voxel_down_sample(voxel_size) # adapt_voxel_size
# candidate_inst.pc = candidate_inst.pc.farthest_point_down_sample(500)
# candidate_inst.pc = candidate_inst.pc.random_down_sample(np.minimum(len(candidate_inst.pc.points)/500.,1))
# print("downsample time ", time.time() - downsample_time) # 0.03s even
# bbox_time = time.time()
try:
candidate_inst.bbox3D = open3d.geometry.OrientedBoundingBox.create_from_points(candidate_inst.pc.points)
except RuntimeError:
# print("too few pcs obj ", inst_id)
inst_data[inst_mask] = -1
continue
# enlarge
candidate_inst.bbox3D.scale(bbox3d_scale, candidate_inst.bbox3D.get_center())
else: # new inst
# init new inst and new sem
new_inst = InstData()
new_inst.inst_id = inst_id
smaller_mask = cv2.erode(inst_mask.astype(np.uint8), np.ones((5, 5)), iterations=3).astype(bool)
if np.sum(smaller_mask) < min_pixels:
# print("too few pcs obj ", inst_id)
inst_data[inst_mask] = 0
continue
inst_depth_small = depth.copy()
inst_depth_small[~smaller_mask] = 0
inst_pc_small = unproject_pointcloud(inst_depth_small, intrinsic_open3d, T_CW)
new_inst.pc = inst_pc_small
new_inst.pc = new_inst.pc.voxel_down_sample(voxel_size)
try:
inst_bbox3D = open3d.geometry.OrientedBoundingBox.create_from_points(new_inst.pc.points)
except RuntimeError:
# print("too few pcs obj ", inst_id)
inst_data[inst_mask] = 0
continue
# scale up
inst_bbox3D.scale(bbox3d_scale, inst_bbox3D.get_center())
new_inst.bbox3D = inst_bbox3D
# update inst_dict
inst_dict.update({inst_id: new_inst}) # init new sem
# update inst_data
inst_data[inst_mask] = inst_id
if diff_mask is not None:
inst_data[diff_mask] = -1 # unsure area
return inst_data
def load_matrix_from_txt(path, shape=(4, 4)):
with open(path) as f:
txt = f.readlines()
txt = ''.join(txt).replace('\n', ' ')
matrix = [float(v) for v in txt.split()]
return np.array(matrix).reshape(shape)
def check_mask_order(obj_masks, depth_np, obj_ids):
print(len(obj_masks))
print(len(obj_ids))
assert len(obj_masks) == len(obj_ids)
depth = torch.from_numpy(depth_np)
obj_masked_modified = copy.deepcopy(obj_masks[:])
for i in range(len(obj_masks) - 1):
mask1 = obj_masks[i].float()
mask1_ = obj_masked_modified[i].float()
for j in range(i + 1, len(obj_masks)):
mask2 = obj_masks[j].float()
mask2_ = obj_masked_modified[j].float()
# case 1: if they don't intersect we don't touch them
if ((mask1 + mask2) == 2).sum() == 0:
continue
# case 2: the entire object 1 is inside of object 2, we say object 1 is in front of object 2:
elif (((mask1 + mask2) == 2).float() - mask1).sum() == 0:
mask2_ -= mask1_
# case 3: the entire object 2 is inside of object 1, we say object 2 is in front of object 1:
elif (((mask1 + mask2) == 2).float() - mask2).sum() == 0:
mask1_ -= mask2_
# case 4: use depth to check object order:
else:
# object 1 is closer
if (depth * mask1).sum() / mask1.sum() > (depth * mask2).sum() / mask2.sum():
mask2_ -= ((mask1 + mask2) == 2).float()
# object 2 is closer
if (depth * mask1).sum() / mask1.sum() < (depth * mask2).sum() / mask2.sum():
mask1_ -= ((mask1 + mask2) == 2).float()
final_mask = torch.zeros_like(depth, dtype=torch.int)
# instance_labels = {}
for i in range(len(obj_masked_modified)):
final_mask = final_mask.masked_fill(obj_masked_modified[i] > 0, obj_ids[i])
# instance_labels[i] = obj_ids[i].item()
return final_mask.cpu().numpy()
def unproject_pointcloud(depth, intrinsic_open3d, T_CW):
# depth, mask, intrinsic, extrinsic -> point clouds
pc_sample = open3d.geometry.PointCloud.create_from_depth_image(depth=open3d.geometry.Image(depth),
intrinsic=intrinsic_open3d,
extrinsic=T_CW,
depth_scale=1.0,
project_valid_depth_only=True)
return pc_sample
def check_inside_ratio(pc, bbox3D):
# pc, bbox3d -> inside ratio
indices = bbox3D.get_point_indices_within_bounding_box(pc.points)
assert len(pc.points) > 0
ratio = len(indices) / len(pc.points)
# print("ratio ", ratio)
return ratio, indices
def track_instance(masks, classes, depth, inst_list, sem_dict, intrinsic_open3d, T_CW, IoU_thresh=0.5, voxel_size=0.1,
min_pixels=2000, erode=True, clip_features=None, class_names=None):
device = masks.device
inst_data_dict = {}
inst_data_dict.update({0: torch.zeros(depth.shape, dtype=torch.int, device=device)})
inst_ids = []
bbox3d_scale = 1.0 # todo 1.0
min_extent = 0.05
depth = torch.from_numpy(depth).to(device)
for i in range(len(masks)):
inst_data = torch.zeros(depth.shape, dtype=torch.int, device=device)
smaller_mask = cv2.erode(masks[i].detach().cpu().numpy().astype(np.uint8), np.ones((5, 5)), iterations=3).astype(bool)
inst_depth_small = depth.detach().cpu().numpy()
inst_depth_small[~smaller_mask] = 0
inst_pc_small = unproject_pointcloud(inst_depth_small, intrinsic_open3d, T_CW)
diff_mask = None
if np.sum(smaller_mask) <= min_pixels: # too small 20 400 # todo use sem to set background
inst_data[masks[i]] = 0 # set to background
continue
inst_pc_voxel = inst_pc_small.voxel_down_sample(voxel_size)
if len(inst_pc_voxel.points) <= 10: # too small 20 400 # todo use sem to set background
inst_data[masks[i]] = 0 # set to background
continue
is_merged = False
inst_id = None
inst_mask = masks[i] #smaller_mask #masks[i] # todo only
inst_class = classes[i]
inst_depth = depth.detach().cpu().numpy()
inst_depth[~masks[i].detach().cpu().numpy()] = 0. # inst_mask
inst_pc = unproject_pointcloud(inst_depth, intrinsic_open3d, T_CW)
sem_inst_list = []
if clip_features is not None: # check similar sems based on clip feature distance
sem_thr = 200 #300. for table #320. # 260.
for sem_exist in sem_dict.keys():
if torch.abs(clip_features[class_names[inst_class]] - clip_features[class_names[sem_exist]]).sum() < sem_thr:
sem_inst_list.extend(sem_dict[sem_exist])
else: # no clip features, only do strictly sem check
if inst_class in sem_dict.keys():
sem_inst_list.extend(sem_dict[inst_class])
for candidate_inst in sem_inst_list:
# if True: # only consider 3D bbox, merge them if they are spatial together
IoU, indices = check_inside_ratio(inst_pc, candidate_inst.bbox3D)
candidate_inst.cmp_cnt += 1
if IoU > IoU_thresh:
# merge inst to candidate
is_merged = True
candidate_inst.merge_cnt += 1
candidate_inst.pc += inst_pc.select_by_index(indices)
# inst_uv = inst_pc.select_by_index(indices).project_to_depth_image(masks[i].shape[1], masks[i].shape[0], intrinsic_open3d, T_CW, depth_scale=1.0, depth_max=10.0)
# # inst_uv = torch.utils.dlpack.from_dlpack(uv_opencv.as_tensor().to_dlpack())
# valid_mask = inst_uv.squeeze() > 0. # shape --> H, W
# diff_mask = (inst_depth > 0.) & (~valid_mask)
diff_mask = torch.zeros_like(inst_mask)
uv_opencv, _ = cv2.projectPoints(np.array(inst_pc.select_by_index(indices).points), T_CW[:3, :3],
T_CW[:3, 3], intrinsic_open3d.intrinsic_matrix[:3,:3], None)
uv = np.round(uv_opencv).squeeze().astype(int)
u = uv[:, 0].reshape(-1, 1)
v = uv[:, 1].reshape(-1, 1)
vu = np.concatenate([v, u], axis=-1)
valid_mask = np.zeros(inst_mask.shape, dtype=np.bool)
valid_mask[tuple(vu.T)] = True
diff_mask[(inst_depth!=0) & (~valid_mask)] = True
# downsample pcs
candidate_inst.pc = candidate_inst.pc.voxel_down_sample(voxel_size)
# candidate_inst.pc.random_down_sample(np.minimum(500//len(candidate_inst.pc.points),1))
candidate_inst.bbox3D = open3d.geometry.OrientedBoundingBox.create_from_points(candidate_inst.pc.points)
# enlarge
candidate_inst.bbox3D.scale(bbox3d_scale, candidate_inst.bbox3D.get_center())
candidate_inst.bbox3D.extent = np.maximum(candidate_inst.bbox3D.extent, min_extent) # at least bigger than min_extent
inst_id = candidate_inst.inst_id
break
# if candidate_inst.cmp_cnt >= 20 and candidate_inst.merge_cnt <= 5:
# sem_inst_list.remove(candidate_inst)
if not is_merged:
# init new inst and new sem
new_inst = InstData()
new_inst.inst_id = len(inst_list) + 1
new_inst.class_id = inst_class
new_inst.pc = inst_pc_small
new_inst.pc = new_inst.pc.voxel_down_sample(voxel_size)
inst_bbox3D = open3d.geometry.OrientedBoundingBox.create_from_points(new_inst.pc.points)
# scale up
inst_bbox3D.scale(bbox3d_scale, inst_bbox3D.get_center())
inst_bbox3D.extent = np.maximum(inst_bbox3D.extent, min_extent)
new_inst.bbox3D = inst_bbox3D
inst_list.append(new_inst)
inst_id = new_inst.inst_id
# update sem_dict
if inst_class in sem_dict.keys():
sem_dict[inst_class].append(new_inst) # append new inst to exist sem
else:
sem_dict.update({inst_class: [new_inst]}) # init new sem
# update inst_data
inst_data[inst_mask] = inst_id
if diff_mask is not None:
inst_data[diff_mask] = -1 # unsure area
if inst_id not in inst_ids:
inst_data_dict.update({inst_id: inst_data})
else:
continue
# idx = inst_ids.index(inst_id)
# inst_data_list[idx] = inst_data_list[idx] & torch.from_numpy(inst_data) # merge them? todo
# return inst_data
mask_bg = torch.stack(list(inst_data_dict.values())).sum(0) != 0
inst_data_dict.update({0: mask_bg.int()})
return inst_data_dict
================================================
FILE: vis.py
================================================
import skimage.measure
import trimesh
import open3d as o3d
import numpy as np
def marching_cubes(occupancy, level=0.5):
try:
vertices, faces, vertex_normals, _ = skimage.measure.marching_cubes( #marching_cubes_lewiner( #marching_cubes(
occupancy, level=level, gradient_direction='ascent')
except (RuntimeError, ValueError):
return None
dim = occupancy.shape[0]
vertices = vertices / (dim - 1)
mesh = trimesh.Trimesh(vertices=vertices,
vertex_normals=vertex_normals,
faces=faces)
return mesh
def trimesh_to_open3d(src):
dst = o3d.geometry.TriangleMesh()
dst.vertices = o3d.utility.Vector3dVector(src.vertices)
dst.triangles = o3d.utility.Vector3iVector(src.faces)
vertex_colors = src.visual.vertex_colors[:, :3].astype(np.float) / 255.0
dst.vertex_colors = o3d.utility.Vector3dVector(vertex_colors)
dst.compute_vertex_normals()
return dst
================================================
FILE: vmap.py
================================================
import random
import numpy as np
import torch
from time import perf_counter_ns
from tqdm import tqdm
import trainer
import open3d
import trimesh
import scipy
from bidict import bidict
import copy
import os
import utils
class performance_measure:
def __init__(self, name) -> None:
self.name = name
def __enter__(self):
self.start_time = perf_counter_ns()
def __exit__(self, type, value, tb):
self.end_time = perf_counter_ns()
self.exec_time = self.end_time - self.start_time
print(f"{self.name} excution time: {(self.exec_time)/1000000:.2f} ms")
def origin_dirs_W(T_WC, dirs_C):
assert T_WC.shape[0] == dirs_C.shape[0]
assert T_WC.shape[1:] == (4, 4)
assert dirs_C.shape[2] == 3
dirs_W = (T_WC[:, None, :3, :3] @ dirs_C[..., None]).squeeze()
origins = T_WC[:, :3, -1]
return origins, dirs_W
# @torch.jit.script
def stratified_bins(min_depth, max_depth, n_bins, n_rays, type=torch.float32, device = "cuda:0"):
# type: (Tensor, Tensor, int, int) -> Tensor
bin_limits_scale = torch.linspace(0, 1, n_bins+1, dtype=type, device=device)
if not torch.is_tensor(min_depth):
min_depth = torch.ones(n_rays, dtype=type, device=device) * min_depth
if not torch.is_tensor(max_depth):
max_depth = torch.ones(n_rays, dtype=type, device=device) * max_depth
depth_range = max_depth - min_depth
lower_limits_scale = depth_range[..., None] * bin_limits_scale + min_depth[..., None]
lower_limits_scale = lower_limits_scale[:, :-1]
assert lower_limits_scale.shape == (n_rays, n_bins)
bin_length_scale = depth_range / n_bins
increments_scale = torch.rand(
n_rays, n_bins, device=device,
dtype=torch.float32) * bin_length_scale[..., None]
z_vals_scale = lower_limits_scale + increments_scale
assert z_vals_scale.shape == (n_rays, n_bins)
return z_vals_scale
# @torch.jit.script
def normal_bins_sampling(depth, n_bins, n_rays, delta, device = "cuda:0"):
# type: (Tensor, int, int, float) -> Tensor
# device = "cpu"
# bins = torch.normal(0.0, delta / 3., size=[n_rays, n_bins], devi
# self.keyframes_batch = torch.empty(self.n_keyframes,ce=device).sort().values
bins = torch.empty(n_rays, n_bins, dtype=torch.float32, device=device).normal_(mean=0.,std=delta / 3.).sort().values
bins = torch.clip(bins, -delta, delta)
z_vals = depth[:, None] + bins
assert z_vals.shape == (n_rays, n_bins)
return z_vals
class sceneObject:
"""
object instance mapping,
updating keyframes, get training samples, optimizing MLP map
"""
def __init__(self, cfg, obj_id, rgb:torch.tensor, depth:torch.tensor, mask:torch.tensor, bbox_2d:torch.tensor, t_wc:torch.tensor, live_frame_id) -> None:
self.do_bg = cfg.do_bg
self.obj_id = obj_id
self.data_device = cfg.data_device
self.training_device = cfg.training_device
assert rgb.shape[:2] == depth.shape
assert rgb.shape[:2] == mask.shape
assert bbox_2d.shape == (4,)
assert t_wc.shape == (4, 4,)
if self.do_bg and self.obj_id == 0: # do seperate bg
self.obj_scale = cfg.bg_scale
self.hidden_feature_size = cfg.hidden_feature_size_bg
self.n_bins_cam2surface = cfg.n_bins_cam2surface_bg
self.keyframe_step = cfg.keyframe_step_bg
else:
self.obj_scale = cfg.obj_scale
self.hidden_feature_size = cfg.hidden_feature_size
self.n_bins_cam2surface = cfg.n_bins_cam2surface
self.keyframe_step = cfg.keyframe_step
self.frames_width = rgb.shape[0]
self.frames_height = rgb.shape[1]
self.min_bound = cfg.min_depth
self.max_bound = cfg.max_depth
self.n_bins = cfg.n_bins
self.n_unidir_funcs = cfg.n_unidir_funcs
self.surface_eps = cfg.surface_eps
self.stop_eps = cfg.stop_eps
self.n_keyframes = 1 # Number of keyframes
self.kf_pointer = None
self.keyframe_buffer_size = cfg.keyframe_buffer_size
self.kf_id_dict = bidict({live_frame_id:0})
self.kf_buffer_full = False
self.frame_cnt = 0 # number of frames taken in
self.lastest_kf_queue = []
self.bbox = torch.empty( # obj bounding bounding box in the frame
self.keyframe_buffer_size,
4,
device=self.data_device) # [u low, u high, v low, v high]
self.bbox[0] = bbox_2d
# RGB + pixel state batch
self.rgb_idx = slice(0, 3)
self.state_idx = slice(3, 4)
self.rgbs_batch = torch.empty(self.keyframe_buffer_size,
self.frames_width,
self.frames_height,
4,
dtype=torch.uint8,
device=self.data_device)
# Pixel states:
self.other_obj = 0 # pixel doesn't belong to obj
self.this_obj = 1 # pixel belong to obj
self.unknown_obj = 2 # pixel state is unknown
# Initialize first frame rgb and pixel state
self.rgbs_batch[0, :, :, self.rgb_idx] = rgb
self.rgbs_batch[0, :, :, self.state_idx] = mask[..., None]
self.depth_batch = torch.empty(self.keyframe_buffer_size,
self.frames_width,
self.frames_height,
dtype=torch.float32,
device=self.data_device)
# Initialize first frame's depth
self.depth_batch[0] = depth
self.t_wc_batch = torch.empty(
self.keyframe_buffer_size, 4, 4,
dtype=torch.float32,
device=self.data_device) # world to camera transform
# Initialize first frame's world2cam transform
self.t_wc_batch[0] = t_wc
# neural field map
trainer_cfg = copy.deepcopy(cfg)
trainer_cfg.obj_id = self.obj_id
trainer_cfg.hidden_feature_size = self.hidden_feature_size
trainer_cfg.obj_scale = self.obj_scale
self.trainer = trainer.Trainer(trainer_cfg)
# 3D boundary
self.bbox3d = None
self.pc = []
# init obj local frame
# self.obj_center = self.init_obj_center(intrinsic, depth, mask, t_wc)
self.obj_center = torch.tensor(0.0) # shouldn't make any difference because of frequency embedding
def init_obj_center(self, intrinsic_open3d, depth, mask, t_wc):
obj_depth = depth.cpu().clone()
obj_depth[mask!=self.this_obj] = 0
T_CW = np.linalg.inv(t_wc.cpu().numpy())
pc_obj_init = open3d.geometry.PointCloud.create_from_depth_image(
depth=open3d.geometry.Image(np.asarray(obj_depth.permute(1,0).numpy(), order="C")),
intrinsic=intrinsic_open3d,
extrinsic=T_CW,
depth_trunc=self.max_bound,
depth_scale=1.0)
obj_center = torch.from_numpy(np.mean(pc_obj_init.points, axis=0)).float()
return obj_center
# @profile
def append_keyframe(self, rgb:torch.tensor, depth:torch.tensor, mask:torch.tensor, bbox_2d:torch.tensor, t_wc:torch.tensor, frame_id:np.uint8=1):
assert rgb.shape[:2] == depth.shape
assert rgb.shape[:2] == mask.shape
assert bbox_2d.shape == (4,)
assert t_wc.shape == (4, 4,)
assert self.n_keyframes <= self.keyframe_buffer_size - 1
assert rgb.dtype == torch.uint8
assert mask.dtype == torch.uint8
assert depth.dtype == torch.float32
# every kf_step choose one kf
is_kf = (self.frame_cnt % self.keyframe_step == 0) or self.n_keyframes == 1
# print("---------------------")
# print("self.kf_id_dict ", self.kf_id_dict)
# print("live frame id ", frame_id)
# print("n_frames ", self.n_keyframes)
if self.n_keyframes == self.keyframe_buffer_size - 1: # kf buffer full, need to prune
self.kf_buffer_full = True
if self.kf_pointer is None:
self.kf_pointer = self.n_keyframes
self.rgbs_batch[self.kf_pointer, :, :, self.rgb_idx] = rgb
self.rgbs_batch[self.kf_pointer, :, :, self.state_idx] = mask[..., None]
self.depth_batch[self.kf_pointer, ...] = depth
self.t_wc_batch[self.kf_pointer, ...] = t_wc
self.bbox[self.kf_pointer, ...] = bbox_2d
self.kf_id_dict.inv[self.kf_pointer] = frame_id
if is_kf:
self.lastest_kf_queue.append(self.kf_pointer)
pruned_frame_id, pruned_kf_id = self.prune_keyframe()
self.kf_pointer = pruned_kf_id
print("pruned kf id ", self.kf_pointer)
else:
if not is_kf: # not kf, replace
self.rgbs_batch[self.n_keyframes-1, :, :, self.rgb_idx] = rgb
self.rgbs_batch[self.n_keyframes-1, :, :, self.state_idx] = mask[..., None]
self.depth_batch[self.n_keyframes-1, ...] = depth
self.t_wc_batch[self.n_keyframes-1, ...] = t_wc
self.bbox[self.n_keyframes-1, ...] = bbox_2d
self.kf_id_dict.inv[self.n_keyframes-1] = frame_id
else: # is kf, add new kf
self.kf_id_dict[frame_id] = self.n_keyframes
self.rgbs_batch[self.n_keyframes, :, :, self.rgb_idx] = rgb
self.rgbs_batch[self.n_keyframes, :, :, self.state_idx] = mask[..., None]
self.depth_batch[self.n_keyframes, ...] = depth
self.t_wc_batch[self.n_keyframes, ...] = t_wc
self.bbox[self.n_keyframes, ...] = bbox_2d
self.lastest_kf_queue.append(self.n_keyframes)
self.n_keyframes += 1
# print("self.kf_id_dic ", self.kf_id_dict)
self.frame_cnt += 1
if len(self.lastest_kf_queue) > 2: # keep latest two frames
self.lastest_kf_queue = self.lastest_kf_queue[-2:]
def prune_keyframe(self):
# simple strategy to prune, randomly choose
key, value = random.choice(list(self.kf_id_dict.items())[:-2]) # do not prune latest two frames
return key, value
def get_bound(self, intrinsic_open3d):
# get 3D boundary from posed depth img todo update sparse pc when append frame
pcs = open3d.geometry.PointCloud()
for kf_id in range(self.n_keyframes):
mask = self.rgbs_batch[kf_id, :, :, self.state_idx].squeeze() == self.this_obj
depth = self.depth_batch[kf_id].cpu().clone()
twc = self.t_wc_batch[kf_id].cpu().numpy()
depth[~mask] = 0
depth = depth.permute(1,0).numpy().astype(np.float32)
T_CW = np.linalg.inv(twc)
pc = open3d.geometry.PointCloud.create_from_depth_image(depth=open3d.geometry.Image(np.asarray(depth, order="C")), intrinsic=intrinsic_open3d, extrinsic=T_CW)
# self.pc += pc
pcs += pc
# # get minimal oriented 3d bbox
# try:
# bbox3d = open3d.geometry.OrientedBoundingBox.create_from_points(pcs.points)
# except RuntimeError:
# print("too few pcs obj ")
# return None
# trimesh has a better minimal bbox implementation than open3d
try:
transform, extents = trimesh.bounds.oriented_bounds(np.array(pcs.points)) # pc
transform = np.linalg.inv(transform)
except scipy.spatial._qhull.QhullError:
print("too few pcs obj ")
return None
for i in range(extents.shape[0]):
extents[i] = np.maximum(extents[i], 0.10) # at least rendering 10cm
bbox = utils.BoundingBox()
bbox.center = transform[:3, 3]
bbox.R = transform[:3, :3]
bbox.extent = extents
bbox3d = open3d.geometry.OrientedBoundingBox(bbox.center, bbox.R, bbox.extent)
min_extent = 0.05
bbox3d.extent = np.maximum(min_extent, bbox3d.extent)
bbox3d.color = (255,0,0)
self.bbox3d = utils.bbox_open3d2bbox(bbox_o3d=bbox3d)
# self.pc = []
print("obj ", self.obj_id)
print("bound ", bbox3d)
print("kf id dict ", self.kf_id_dict)
# open3d.visualization.draw_geometries([bbox3d, pcs])
return bbox3d
def get_training_samples(self, n_frames, n_samples, cached_rays_dir):
# Sample pixels
if self.n_keyframes > 2: # make sure latest 2 frames are sampled todo if kf pruned, this is not the latest frame
keyframe_ids = torch.randint(low=0,
high=self.n_keyframes,
size=(n_frames - 2,),
dtype=torch.long,
device=self.data_device)
# if self.kf_buffer_full:
# latest_frame_ids = list(self.kf_id_dict.values())[-2:]
latest_frame_ids = self.lastest_kf_queue[-2:]
keyframe_ids = torch.cat([keyframe_ids,
torch.tensor(latest_frame_ids, device=keyframe_ids.device)])
# print("latest_frame_ids", latest_frame_ids)
# else: # sample last 2 frames
# keyframe_ids = torch.cat([keyframe_ids,
# torch.tensor([self.n_keyframes-2, self.n_keyframes-1], device=keyframe_ids.device)])
else:
keyframe_ids = torch.randint(low=0,
high=self.n_keyframes,
size=(n_frames,),
dtype=torch.long,
device=self.data_device)
keyframe_ids = torch.unsqueeze(keyframe_ids, dim=-1)
idx_w = torch.rand(n_frames, n_samples, device=self.data_device)
idx_h = torch.rand(n_frames, n_samples, device=self.data_device)
# resizing idx_w and idx_h to be in the bbox range
idx_w = idx_w * (self.bbox[keyframe_ids, 1] - self.bbox[keyframe_ids, 0]) + self.bbox[keyframe_ids, 0]
idx_h = idx_h * (self.bbox[keyframe_ids, 3] - self.bbox[keyframe_ids, 2]) + self.bbox[keyframe_ids, 2]
idx_w = idx_w.long()
idx_h = idx_h.long()
sampled_rgbs = self.rgbs_batch[keyframe_ids, idx_w, idx_h]
sampled_depth = self.depth_batch[keyframe_ids, idx_w, idx_h]
# Get ray directions for sampled pixels
sampled_ray_dirs = cached_rays_dir[idx_w, idx_h]
# Get sampled keyframe poses
sampled_twc = self.t_wc_batch[keyframe_ids[:, 0], :, :]
origins, dirs_w = origin_dirs_W(sampled_twc, sampled_ray_dirs)
return self.sample_3d_points(sampled_rgbs, sampled_depth, origins, dirs_w)
def sample_3d_points(self, sampled_rgbs, sampled_depth, origins, dirs_w):
"""
3D sampling strategy
* For pixels with invalid depth:
- N+M from minimum bound to max (stratified)
* For pixels with valid depth:
# Pixel belongs to this object
- N from cam to surface (stratified)
- M around surface (stratified/normal)
# Pixel belongs that don't belong to this object
- N from cam to surface (stratified)
- M around surface (stratified)
# Pixel with unknown state
- Do nothing!
"""
n_bins_cam2surface = self.n_bins_cam2surface
n_bins = self.n_bins
eps = self.surface_eps
other_objs_max_eps = self.stop_eps #0.05 # todo 0.02
# print("max depth ", torch.max(sampled_depth))
sampled_z = torch.zeros(
sampled_rgbs.shape[0] * sampled_rgbs.shape[1],
n_bins_cam2surface + n_bins,
dtype=self.depth_batch.dtype,
device=self.data_device) # shape (N*n_rays, n_bins_cam2surface + n_bins)
invalid_depth_mask = (sampled_depth <= self.min_bound).view(-1)
# max_bound = self.max_bound
max_bound = torch.max(sampled_depth)
# sampling for points with invalid depth
invalid_depth_count = invalid_depth_mask.count_nonzero()
if invalid_depth_count:
sampled_z[invalid_depth_mask, :] = stratified_bins(
self.min_bound, max_bound,
n_bins_cam2surface + n_bins, invalid_depth_count,
device=self.data_device)
# sampling for valid depth rays
valid_depth_mask = ~invalid_depth_mask
valid_depth_count = valid_depth_mask.count_nonzero()
if valid_depth_count:
# Sample between min bound and depth for all pixels with valid depth
sampled_z[valid_depth_mask, :n_bins_cam2surface] = stratified_bins(
self.min_bound, sampled_depth.view(-1)[valid_depth_mask]-eps,
n_bins_cam2surface, valid_depth_count, device=self.data_device)
# sampling around depth for this object
obj_mask = (sampled_rgbs[..., -1] == self.this_obj).view(-1) & valid_depth_mask # todo obj_mask
assert sampled_z.shape[0] == obj_mask.shape[0]
obj_count = obj_mask.count_nonzero()
if obj_count:
sampling_method = "normal" # stratified or normal
if sampling_method == "stratified":
sampled_z[obj_mask, n_bins_cam2surface:] = stratified_bins(
sampled_depth.view(-1)[obj_mask] - eps, sampled_depth.view(-1)[obj_mask] + eps,
n_bins, obj_count, device=self.data_device)
elif sampling_method == "normal":
sampled_z[obj_mask, n_bins_cam2surface:] = normal_bins_sampling(
sampled_depth.view(-1)[obj_mask],
n_bins,
obj_count,
delta=eps,
device=self.data_device)
else:
raise (
f"sampling method not implemented {sampling_method}, \
stratified and normal sampling only currenty implemented."
)
# sampling around depth of other objects
other_obj_mask = (sampled_rgbs[..., -1] != self.this_obj).view(-1) & valid_depth_mask
other_objs_count = other_obj_mask.count_nonzero()
if other_objs_count:
sampled_z[other_obj_mask, n_bins_cam2surface:] = stratified_bins(
sampled_depth.view(-1)[other_obj_mask] - eps,
sampled_depth.view(-1)[other_obj_mask] + other_objs_max_eps,
n_bins, other_objs_count, device=self.data_device)
sampled_z = sampled_z.view(sampled_rgbs.shape[0],
sampled_rgbs.shape[1],
-1) # view as (n_rays, n_samples, 10)
input_pcs = origins[..., None, None, :] + (dirs_w[:, :, None, :] *
sampled_z[..., None])
input_pcs -= self.obj_center
obj_labels = sampled_rgbs[..., -1].view(-1)
return sampled_rgbs[..., :3], sampled_depth, valid_depth_mask, obj_labels, input_pcs, sampled_z
def save_checkpoints(self, path, epoch):
obj_id = self.obj_id
chechpoint_load_file = (path + "/obj_" + str(obj_id) + "_frame_" + str(epoch) + ".pth")
torch.save(
{
"epoch": epoch,
"FC_state_dict": self.trainer.fc_occ_map.state_dict(),
"PE_state_dict": self.trainer.pe.state_dict(),
"obj_id": self.obj_id,
"bbox": self.bbox3d,
"obj_scale": self.trainer.obj_scale
},
chechpoint_load_file,
)
# optimiser?
def load_checkpoints(self, ckpt_file):
checkpoint_load_file = (ckpt_file)
if not os.path.exists(checkpoint_load_file):
print("ckpt not exist ", checkpoint_load_file)
return
checkpoint = torch.load(checkpoint_load_file)
self.trainer.fc_occ_map.load_state_dict(checkpoint["FC_state_dict"])
self.trainer.pe.load_state_dict(checkpoint["PE_state_dict"])
self.obj_id = checkpoint["obj_id"]
self.bbox3d = checkpoint["bbox"]
self.trainer.obj_scale = checkpoint["obj_scale"]
self.trainer.fc_occ_map.to(self.training_device)
self.trainer.pe.to(self.training_device)
class cameraInfo:
def __init__(self, cfg) -> None:
self.device = cfg.data_device
self.width = cfg.W # Frame width
self.height = cfg.H # Frame height
self.fx = cfg.fx
self.fy = cfg.fy
self.cx = cfg.cx
self.cy = cfg.cy
self.rays_dir_cache = self.get_rays_dirs()
def get_rays_dirs(self, depth_type="z"):
idx_w = torch.arange(end=self.width, device=self.device)
idx_h = torch.arange(end=self.height, device=self.device)
dirs = torch.ones((self.width, self.height, 3), device=self.device)
dirs[:, :, 0] = ((idx_w - self.cx) / self.fx)[:, None]
dirs[:, :, 1] = ((idx_h - self.cy) / self.fy)
if depth_type == "euclidean":
raise Exception(
"Get camera rays directions with euclidean depth not yet implemented"
)
norm = torch.norm(dirs, dim=-1)
dirs = dirs * (1. / norm)[:, :, :, None]
return dirs
gitextract_se2unowv/ ├── .gitignore ├── LICENSE ├── README.md ├── cfg.py ├── configs/ │ ├── Replica/ │ │ ├── config_replica_room0_iMAP.json │ │ └── config_replica_room0_vMAP.json │ └── ScanNet/ │ ├── config_scannet0000_iMAP.json │ ├── config_scannet0000_vMAP.json │ ├── config_scannet0024_iMAP.json │ └── config_scannet0024_vMAP.json ├── data_generation/ │ ├── README.md │ ├── extract_inst_obj.py │ ├── habitat_renderer.py │ ├── replica_render_config_vMAP.yaml │ ├── settings.py │ └── transformation.py ├── dataset.py ├── embedding.py ├── environment.yml ├── image_transforms.py ├── loss.py ├── metric/ │ ├── eval_3D_obj.py │ ├── eval_3D_scene.py │ └── metrics.py ├── model.py ├── render_rays.py ├── train.py ├── trainer.py ├── utils.py ├── vis.py └── vmap.py
SYMBOL INDEX (105 symbols across 17 files)
FILE: cfg.py
class Config (line 6) | class Config:
method __init__ (line 7) | def __init__(self, config_file):
FILE: data_generation/habitat_renderer.py
function init_habitat (line 27) | def init_habitat(config) :
function save_renders (line 117) | def save_renders(save_path, observation, enable_semantic, suffix=""):
function render (line 143) | def render(sim, config):
function set_agent_position (line 189) | def set_agent_position(sim, pose):
function main (line 202) | def main():
FILE: data_generation/settings.py
function make_cfg (line 48) | def make_cfg(settings):
FILE: data_generation/transformation.py
function habitat_world_transformations (line 5) | def habitat_world_transformations():
function opencv_to_opengl_camera (line 18) | def opencv_to_opengl_camera(transform=None):
function opengl_to_opencv_camera (line 25) | def opengl_to_opencv_camera(transform=None):
function Twc_to_Thc (line 32) | def Twc_to_Thc(T_wc): # opencv-camera to world transformation ---> habi...
function Thc_to_Twc (line 38) | def Thc_to_Twc(T_hc): # habitat-caemra to habitat world transformation ...
function combine_pose (line 44) | def combine_pose(t: np.array, q: quaternion.quaternion) -> np.array:
FILE: dataset.py
function next_live_data (line 14) | def next_live_data(track_to_map_IDT, inited):
function init_loader (line 39) | def init_loader(cfg, multi_worker=True):
class Replica (line 63) | class Replica(Dataset):
method __init__ (line 64) | def __init__(self, cfg):
method __len__ (line 78) | def __len__(self):
method __getitem__ (line 81) | def __getitem__(self, idx):
class ScanNet (line 150) | class ScanNet(Dataset):
method __init__ (line 151) | def __init__(self, cfg):
method load_poses (line 191) | def load_poses(self, path):
method __len__ (line 205) | def __len__(self):
method __getitem__ (line 208) | def __getitem__(self, index):
FILE: embedding.py
function positional_encoding (line 4) | def positional_encoding(
class UniDirsEmbed (line 43) | class UniDirsEmbed(torch.nn.Module):
method __init__ (line 44) | def __init__(self, min_deg=0, max_deg=2, scale=2.):
method forward (line 82) | def forward(self, x):
FILE: image_transforms.py
class BGRtoRGB (line 5) | class BGRtoRGB(object):
method __call__ (line 8) | def __call__(self, image):
class DepthScale (line 13) | class DepthScale(object):
method __init__ (line 16) | def __init__(self, scale):
method __call__ (line 19) | def __call__(self, depth):
class DepthFilter (line 24) | class DepthFilter(object):
method __init__ (line 27) | def __init__(self, max_depth):
method __call__ (line 30) | def __call__(self, depth):
class Undistort (line 36) | class Undistort(object):
method __init__ (line 39) | def __init__(self,
method __call__ (line 58) | def __call__(self, im):
FILE: loss.py
function step_batch_loss (line 5) | def step_batch_loss(alpha, color, gt_depth, gt_color, sem_labels, mask_d...
FILE: metric/eval_3D_obj.py
function calc_3d_metric (line 8) | def calc_3d_metric(mesh_rec, mesh_gt, N=200000):
function get_gt_bg_mesh (line 43) | def get_gt_bg_mesh(gt_dir, background_cls_list):
function get_obj_ids (line 57) | def get_obj_ids(obj_dir):
FILE: metric/eval_3D_scene.py
function calc_3d_metric (line 7) | def calc_3d_metric(mesh_rec, mesh_gt, N=200000):
FILE: metric/metrics.py
function completion_ratio (line 4) | def completion_ratio(gt_points, rec_points, dist_th=0.01):
function accuracy (line 11) | def accuracy(gt_points, rec_points):
function completion (line 18) | def completion(gt_points, rec_points):
function chamfer (line 25) | def chamfer(gt_points, rec_points):
FILE: model.py
function init_weights (line 4) | def init_weights(m, init_fn=torch.nn.init.xavier_normal_):
function fc_block (line 9) | def fc_block(in_f, out_f):
class OccupancyMap (line 16) | class OccupancyMap(torch.nn.Module):
method __init__ (line 17) | def __init__(
method forward (line 54) | def forward(self, x,
FILE: render_rays.py
function occupancy_activation (line 4) | def occupancy_activation(alpha, distances=None):
function alpha_to_occupancy (line 10) | def alpha_to_occupancy(depths, dirs, alpha, add_last=False):
function occupancy_to_termination (line 26) | def occupancy_to_termination(occupancy, is_batch=False):
function render (line 47) | def render(termination, vals, dim=-1):
function render_loss (line 53) | def render_loss(render, gt, loss="L1", normalise=False):
function reduce_batch_loss (line 67) | def reduce_batch_loss(loss_mat, var=None, avg=True, mask=None, loss_type...
function make_3D_grid (line 98) | def make_3D_grid(occ_range=[-1., 1.], dim=256, device="cuda:0", transfor...
FILE: trainer.py
class Trainer (line 9) | class Trainer:
method __init__ (line 10) | def __init__(self, cfg):
method load_network (line 26) | def load_network(self):
method meshing (line 35) | def meshing(self, bound, obj_center, grid_dim=256):
method eval_points (line 77) | def eval_points(self, points, chunk_size=100000):
FILE: utils.py
class BoundingBox (line 11) | class BoundingBox():
method __init__ (line 12) | def __init__(self):
function bbox_open3d2bbox (line 19) | def bbox_open3d2bbox(bbox_o3d):
function bbox_bbox2open3d (line 26) | def bbox_bbox2open3d(bbox):
function update_vmap (line 30) | def update_vmap(models, optimiser):
function enlarge_bbox (line 36) | def enlarge_bbox(bbox, scale, w, h):
function get_bbox2d (line 59) | def get_bbox2d(obj_mask, bbox_scale=1.0):
function get_bbox2d_batch (line 75) | def get_bbox2d_batch(img):
function get_latest_queue (line 86) | def get_latest_queue(q):
class InstData (line 101) | class InstData:
method __init__ (line 102) | def __init__(self):
function box_filter (line 112) | def box_filter(masks, classes, depth, inst_dict, intrinsic_open3d, T_CW,...
function load_matrix_from_txt (line 210) | def load_matrix_from_txt(path, shape=(4, 4)):
function check_mask_order (line 217) | def check_mask_order(obj_masks, depth_np, obj_ids):
function unproject_pointcloud (line 257) | def unproject_pointcloud(depth, intrinsic_open3d, T_CW):
function check_inside_ratio (line 266) | def check_inside_ratio(pc, bbox3D):
function track_instance (line 274) | def track_instance(masks, classes, depth, inst_list, sem_dict, intrinsic...
FILE: vis.py
function marching_cubes (line 6) | def marching_cubes(occupancy, level=0.5):
function trimesh_to_open3d (line 21) | def trimesh_to_open3d(src):
FILE: vmap.py
class performance_measure (line 17) | class performance_measure:
method __init__ (line 19) | def __init__(self, name) -> None:
method __enter__ (line 22) | def __enter__(self):
method __exit__ (line 25) | def __exit__(self, type, value, tb):
function origin_dirs_W (line 31) | def origin_dirs_W(T_WC, dirs_C):
function stratified_bins (line 45) | def stratified_bins(min_depth, max_depth, n_bins, n_rays, type=torch.flo...
function normal_bins_sampling (line 75) | def normal_bins_sampling(depth, n_bins, n_rays, delta, device = "cuda:0"):
class sceneObject (line 90) | class sceneObject:
method __init__ (line 96) | def __init__(self, cfg, obj_id, rgb:torch.tensor, depth:torch.tensor, ...
method init_obj_center (line 194) | def init_obj_center(self, intrinsic_open3d, depth, mask, t_wc):
method append_keyframe (line 208) | def append_keyframe(self, rgb:torch.tensor, depth:torch.tensor, mask:t...
method prune_keyframe (line 265) | def prune_keyframe(self):
method get_bound (line 270) | def get_bound(self, intrinsic_open3d):
method get_training_samples (line 319) | def get_training_samples(self, n_frames, n_samples, cached_rays_dir):
method sample_3d_points (line 366) | def sample_3d_points(self, sampled_rgbs, sampled_depth, origins, dirs_w):
method save_checkpoints (line 461) | def save_checkpoints(self, path, epoch):
method load_checkpoints (line 478) | def load_checkpoints(self, ckpt_file):
class cameraInfo (line 494) | class cameraInfo:
method __init__ (line 496) | def __init__(self, cfg) -> None:
method get_rays_dirs (line 508) | def get_rays_dirs(self, depth_type="z"):
Condensed preview — 31 files, each showing path, character count, and a content snippet. Download the .json file or copy for the full structured content (170K chars).
[
{
"path": ".gitignore",
"chars": 1915,
"preview": "/.anaconda3/\n\n# Byte-compiled / optimized / DLL files\n__pycache__/\n*.py[cod]\n*$py.class\n\n# C extensions\n*.so\n\n# Distribu"
},
{
"path": "LICENSE",
"chars": 11802,
"preview": "vMAP SOFTWARE\n\nLICENCE AGREEMENT\n\nWE (Imperial College of Science, Technology and Medicine, (“Imperial College\nLondon”))"
},
{
"path": "README.md",
"chars": 5085,
"preview": "[comment]: <> (# vMAP: Vectorised Object Mapping for Neural Field SLAM)\n\n<!-- PROJECT LOGO -->\n\n<p align=\"center\">\n\n <h"
},
{
"path": "cfg.py",
"chars": 4274,
"preview": "import json\nimport numpy as np\nimport os\nimport utils\n\nclass Config:\n def __init__(self, config_file):\n # sett"
},
{
"path": "configs/Replica/config_replica_room0_iMAP.json",
"chars": 1690,
"preview": "{\n \"dataset\": {\n \"live\": 0,\n \"path\": \"/home/xin/data/Replica/vmap/room_0/imap/00\",\n \"format\": \"R"
},
{
"path": "configs/Replica/config_replica_room0_vMAP.json",
"chars": 1690,
"preview": "{\n \"dataset\": {\n \"live\": 0,\n \"path\": \"/home/xin/data/Replica/vmap/room_0/imap/00\",\n \"format\": \"R"
},
{
"path": "configs/ScanNet/config_scannet0000_iMAP.json",
"chars": 1610,
"preview": "{\n \"dataset\": {\n \"live\": 0,\n \"path\": \"/home/xin/data/ScanNet/NICESLAM/scene0000_00\",\n \"format\": "
},
{
"path": "configs/ScanNet/config_scannet0000_vMAP.json",
"chars": 1615,
"preview": "{\n \"dataset\": {\n \"live\": 0,\n \"path\": \"/home/xin/data/ScanNet/NICESLAM/scene0000_00\",\n \"format\": "
},
{
"path": "configs/ScanNet/config_scannet0024_iMAP.json",
"chars": 1609,
"preview": "{\n \"dataset\": {\n \"live\": 0,\n \"path\": \"/home/xin/data/ScanNet/obj-imap/scene0024_00\",\n \"format\": "
},
{
"path": "configs/ScanNet/config_scannet0024_vMAP.json",
"chars": 1615,
"preview": "{\n \"dataset\": {\n \"live\": 0,\n \"path\": \"/home/xin/data/ScanNet/obj-imap/scene0024_00\",\n \"format\": "
},
{
"path": "data_generation/README.md",
"chars": 1551,
"preview": "## Replica Data Generation\n\n### Download Replica Dataset\nDownload 3D models and info files from [Replica](https://github"
},
{
"path": "data_generation/extract_inst_obj.py",
"chars": 1576,
"preview": "# reference https://github.com/facebookresearch/Replica-Dataset/issues/17#issuecomment-538757418\n\nfrom plyfile import *\n"
},
{
"path": "data_generation/habitat_renderer.py",
"chars": 9909,
"preview": "#!/usr/bin/env python3\nimport os, sys, argparse\nos.environ[\"CUDA_DEVICE_ORDER\"] = \"PCI_BUS_ID\"\nos.environ[\"CUDA_VISIBLE_"
},
{
"path": "data_generation/replica_render_config_vMAP.yaml",
"chars": 847,
"preview": "# Agent settings\ndefault_agent: 0\ngpu_id: 0\nwidth: 1200 #1280\nheight: 680 #960\nsensor_height: 0\n\ncolor_sensor: true # R"
},
{
"path": "data_generation/settings.py",
"chars": 9849,
"preview": "# Copyright (c) Facebook, Inc. and its affiliates.\n# This source code is licensed under the MIT license found in the\n# L"
},
{
"path": "data_generation/transformation.py",
"chars": 1620,
"preview": "import numpy as np\nimport quaternion\nimport trimesh\n\ndef habitat_world_transformations():\n import habitat_sim\n # T"
},
{
"path": "dataset.py",
"chars": 12860,
"preview": "import imgviz\nfrom torch.utils.data import Dataset, DataLoader\nimport torch\nimport numpy as np\nimport cv2\nimport os\nfrom"
},
{
"path": "embedding.py",
"chars": 3194,
"preview": "import torch\nimport numpy as np\n\ndef positional_encoding(\n tensor,\n B_layer=None,\n num_encoding_functions=6,\n "
},
{
"path": "environment.yml",
"chars": 2950,
"preview": "name: vmap\nchannels:\n - pytorch\n - defaults\ndependencies:\n - _libgcc_mutex=0.1=main\n - _openmp_mutex=5.1=1_gnu\n - b"
},
{
"path": "image_transforms.py",
"chars": 1400,
"preview": "import cv2\nimport numpy as np\n\n\nclass BGRtoRGB(object):\n \"\"\"bgr format to rgb\"\"\"\n\n def __call__(self, image):\n "
},
{
"path": "loss.py",
"chars": 3112,
"preview": "import torch\nimport render_rays\nimport torch.nn.functional as F\n\ndef step_batch_loss(alpha, color, gt_depth, gt_color, s"
},
{
"path": "metric/eval_3D_obj.py",
"chars": 4897,
"preview": "import numpy as np\nfrom tqdm import tqdm\nimport trimesh\nfrom metrics import accuracy, completion, completion_ratio\nimpor"
},
{
"path": "metric/eval_3D_scene.py",
"chars": 3807,
"preview": "import numpy as np\nfrom tqdm import tqdm\nimport trimesh\nfrom metrics import accuracy, completion, completion_ratio\nimpor"
},
{
"path": "metric/metrics.py",
"chars": 1253,
"preview": "import numpy as np\nfrom scipy.spatial import cKDTree as KDTree\n\ndef completion_ratio(gt_points, rec_points, dist_th=0.01"
},
{
"path": "model.py",
"chars": 2728,
"preview": "import torch\nimport torch.nn as nn\n\ndef init_weights(m, init_fn=torch.nn.init.xavier_normal_):\n if type(m) == torch.n"
},
{
"path": "render_rays.py",
"chars": 4028,
"preview": "import torch\nimport numpy as np\n\ndef occupancy_activation(alpha, distances=None):\n # occ = 1.0 - torch.exp(-alpha * d"
},
{
"path": "train.py",
"chars": 20298,
"preview": "import time\nimport loss\nfrom vmap import *\nimport utils\nimport open3d\nimport dataset\nimport vis\nfrom functorch import vm"
},
{
"path": "trainer.py",
"chars": 3597,
"preview": "import torch\nimport model\nimport embedding\nimport render_rays\nimport numpy as np\nimport vis\nfrom tqdm import tqdm\n\nclass"
},
{
"path": "utils.py",
"chars": 17361,
"preview": "import cv2\nimport imgviz\nimport numpy as np\nimport torch\nfrom functorch import combine_state_for_ensemble\nimport open3d\n"
},
{
"path": "vis.py",
"chars": 981,
"preview": "import skimage.measure\nimport trimesh\nimport open3d as o3d\nimport numpy as np\n\ndef marching_cubes(occupancy, level=0.5):"
},
{
"path": "vmap.py",
"chars": 21684,
"preview": "import random\nimport numpy as np\nimport torch\nfrom time import perf_counter_ns\nfrom tqdm import tqdm\nimport trainer\nimpo"
}
]
About this extraction
This page contains the full source code of the kxhit/vMAP GitHub repository, extracted and formatted as plain text for AI agents and large language models (LLMs). The extraction includes 31 files (158.6 KB), approximately 42.2k tokens, and a symbol index with 105 extracted functions, classes, methods, constants, and types. Use this with OpenClaw, Claude, ChatGPT, Cursor, Windsurf, or any other AI tool that accepts text input. You can copy the full output to your clipboard or download it as a .txt file.
Extracted by GitExtract — free GitHub repo to text converter for AI. Built by Nikandr Surkov.