Repository: MrNeRF/awesome-3D-gaussian-splatting
Branch: main
Commit: c692febe4ec8
Files: 40
Total size: 2.4 MB
Directory structure:
gitextract__e_uw1ef/
├── .gitattributes
├── .github/
│   ├── CODEOWNERS
│   ├── FUNDING.yml
│   └── workflows/
│       ├── generate-html.yml
│       └── validate-pr.yml
├── .gitignore
├── CONTRIBUTING.md
├── LICENSE
├── README.md
├── awesome_3dgs_papers.yaml
├── editor.py
├── index.html
├── requirements.txt
└── src/
    ├── __init__.py
    ├── arxiv_integration.py
    ├── components/
    │   ├── __init__.py
    │   ├── dialogs.py
    │   ├── thumbnail.py
    │   └── widgets.py
    ├── fix_date.py
    ├── generate.py
    ├── helper.py
    ├── paper_generator.py
    ├── paper_schema.py
    ├── static/
    │   ├── css/
    │   │   ├── base.css
    │   │   ├── components.css
    │   │   └── responsive.css
    │   └── js/
    │       ├── filters.js
    │       ├── main.js
    │       ├── navigation.js
    │       ├── selection.js
    │       ├── sharing.js
    │       ├── state.js
    │       └── utils.js
    ├── template_engine.py
    ├── templates/
    │   ├── index.html
    │   └── paper_card.html
    ├── utils.py
    ├── validate_yaml.py
    └── yaml_editor.py
================================================
FILE CONTENTS
================================================
================================================
FILE: .gitattributes
================================================
assets filter=lfs diff=lfs merge=lfs -text
================================================
FILE: .github/CODEOWNERS
================================================
# Require review for all changes
* @MrNeRF
================================================
FILE: .github/FUNDING.yml
================================================
================================================
FILE: .github/workflows/generate-html.yml
================================================
name: Generate HTML
on:
  pull_request:
    types: [closed]
  push:
    branches: [main]
    paths:
      - 'awesome_3dgs_papers.yaml'
jobs:
  build:
    if: github.event.pull_request.merged == true || github.event_name == 'push'
    runs-on: ubuntu-latest
    permissions:
      contents: write
      pull-requests: write
    steps:
      - uses: actions/checkout@v3
        with:
          fetch-depth: 0
          ref: main
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.10'
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r requirements.txt
      - name: Setup Git
        run: |
          git config --global user.name 'github-actions[bot]'
          git config --global user.email '41898282+github-actions[bot]@users.noreply.github.com'
      - name: Update HTML
        run: |
          git fetch origin update-html || true
          git checkout -B update-html
          python src/generate.py awesome_3dgs_papers.yaml index.html
          git add index.html
          if ! git diff --staged --quiet; then
            git commit -m "Update generated HTML"
            git push --force origin update-html
          fi
      - name: Create Pull Request
        uses: actions/github-script@v6
        with:
          script: |
            const { data: pulls } = await github.rest.pulls.list({
              owner: context.repo.owner,
              repo: context.repo.repo,
              head: `${context.repo.owner}:update-html`,
              base: 'main',
              state: 'open'
            });
            if (pulls.length === 0) {
              await github.rest.pulls.create({
                owner: context.repo.owner,
                repo: context.repo.repo,
                head: 'update-html',
                base: 'main',
                title: 'Update generated HTML',
                body: 'Auto-generated PR to update the HTML based on recent YAML changes.'
              });
            }
================================================
FILE: .github/workflows/validate-pr.yml
================================================
name: Validate PR Changes
on:
  pull_request:
    branches: [ main ]
    paths:
      - 'awesome_3dgs_papers.yaml'
jobs:
  validate:
    runs-on: ubuntu-latest
    permissions:
      pull-requests: read
      contents: read
    steps:
      - uses: actions/checkout@v3
        with:
          fetch-depth: 0
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.10'
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r requirements.txt
      - name: Validate Changed YAML entries
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          PR_NUMBER: ${{ github.event.pull_request.number }}
          REPO: ${{ github.repository }}
        run: |
          python src/validate_yaml.py
================================================
FILE: .gitignore
================================================
.venv
__pycache__/
*.pyc
================================================
FILE: CONTRIBUTING.md
================================================
# Contributing Guide
Thank you for your interest in contributing to the Awesome 3D Gaussian Splatting repository! This document will guide you through the contribution process.
## Adding Papers
We use a custom YAML editor to maintain the paper database. To add or edit papers:
1. Clone the repository:
```bash
git clone https://github.com/MrNeRF/awesome-3D-gaussian-splatting.git
cd awesome-3D-gaussian-splatting
```
2. Install dependencies:
```bash
pip install -r requirements.txt
```
3. Install Poppler (required for PDF processing):
- **Ubuntu/Debian:**
```bash
sudo apt-get install poppler-utils
```
- **macOS:**
```bash
brew install poppler
```
- **Windows:**
- Download and install from: https://github.com/oschwartz10612/poppler-windows/releases/
- Add the `bin` directory to your system PATH
4. Run the YAML editor:
```bash
python src/yaml_editor.py
```
5. Use the editor to:
- Add new papers using the "Add from arXiv" button
- Edit existing entries
- Add tags, links, and other metadata
- Preview thumbnails
6. The editor will automatically save changes to `awesome_3dgs_papers.yaml`
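
If you edit `awesome_3dgs_papers.yaml` by hand instead of through the editor, a quick sanity check on each entry can catch obvious mistakes before you open a pull request. The sketch below is not the repository's validator (`src/validate_yaml.py` is not reproduced here); it is an independent illustration. The field names are taken from existing entries in the YAML file, but which fields are mandatory is an assumption, and `check_entry` is a hypothetical helper name.

```python
from datetime import datetime

# Field names copied from entries in awesome_3dgs_papers.yaml. Treating these
# particular fields as mandatory is an assumption made for this sketch.
REQUIRED_FIELDS = ("id", "title", "authors", "year", "paper", "tags")
URL_FIELDS = ("paper", "project_page", "code", "video")


def check_entry(entry: dict) -> list[str]:
    """Return a list of human-readable problems found in one paper entry."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not entry.get(field):
            problems.append(f"missing or empty field: {field}")
    # Existing entries store the year as a quoted string such as '2025'.
    year = entry.get("year")
    if year and not (isinstance(year, str) and year.isdigit() and len(year) == 4):
        problems.append(f"year should be a four-digit string, got {year!r}")
    # Link fields are either null (None in Python) or an http(s) URL.
    for field in URL_FIELDS:
        value = entry.get(field)
        if value is not None and not str(value).startswith(("http://", "https://")):
            problems.append(f"{field} is not an http(s) URL: {value!r}")
    # Existing entries use ISO 8601 dates, e.g. '2025-11-06T11:21:16+00:00'.
    date = entry.get("publication_date")
    if date is not None:
        try:
            datetime.fromisoformat(date)
        except (TypeError, ValueError):
            problems.append(f"publication_date is not ISO 8601: {date!r}")
    return problems
```

An empty list means the entry passed these basic checks; it does not guarantee that the pull request validation will accept it.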
## Adding Other Resources
For adding other resources (implementations, tools, tutorials, etc.):
1. Fork the repository
2. Create a new branch (`git checkout -b feature/new-resource`)
3. Edit the README.md file
4. Commit your changes (`git commit -m 'Add new resource'`)
5. Push to your fork (`git push origin feature/new-resource`)
6. Open a Pull Request
Please ensure your additions:
- Are related to 3D Gaussian Splatting
- Have working links
- Are placed in the appropriate section
- Follow the existing formatting
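
Formatting slips in link lists (a trailing space inside the brackets, an extra closing parenthesis, the same URL listed twice) are easy to miss by eye. The following is a rough sketch of an offline check, assuming entries use plain Markdown inline links as in the README; `check_links` is a hypothetical helper, not part of this repository, and it does not test whether a link is reachable.

```python
import re

# Matches a Markdown inline link: [text](url). The URL part stops at the
# first closing parenthesis or whitespace character.
LINK_RE = re.compile(r"\[([^\]]*)\]\(([^)\s]*)\)")


def check_links(markdown: str) -> list[str]:
    """Return formatting problems for the inline links in a Markdown string."""
    problems = []
    seen = set()
    for match in LINK_RE.finditer(markdown):
        text, url = match.group(1), match.group(2)
        if text != text.strip():
            problems.append(f"link text has stray whitespace: {text!r}")
        if not url.startswith(("http://", "https://", "#")) and not url.endswith(".md"):
            problems.append(f"unexpected link target: {url!r}")
        # Heuristic: a ')' directly after the link usually means an
        # unbalanced parenthesis (links wrapped in parentheses also trigger it).
        if markdown[match.end():match.end() + 1] == ")":
            problems.append(f"extra ')' after link: {url}")
        if url in seen:
            problems.append(f"duplicate link: {url}")
        seen.add(url)
    return problems
```

Running it over the text of `README.md` before committing gives a short list of lines worth a second look.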
---
By contributing to this repository, you agree to abide by its terms and conditions.
================================================
FILE: LICENSE
================================================
MIT License
Copyright (c) 2023 janusch
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
================================================
FILE: README.md
================================================
# Awesome 3D Gaussian Splatting
<div align="center">
A curated collection of resources focused on 3D Gaussian Splatting (3DGS) and related technologies.
[**Browse the Paper List**](https://mrnerf.github.io/awesome-3D-gaussian-splatting/) | [**Contribute**](CONTRIBUTING.md) | [**MrNeRF**](https://www.mrnerf.com)
</div>
## Contents
- [Papers & Documentation](#papers--documentation)
- [Implementations](#implementations)
- [Viewers & Game Engine Support](#viewers--game-engine-support)
- [Tools & Utilities](#tools--utilities)
- [Learning Resources](#learning-resources)
- [Sponsors](#sponsors)
## Papers & Documentation
### Papers Database
Visit our comprehensive, searchable database of 3D Gaussian Splatting papers:
[Papers Database](https://mrnerf.github.io/awesome-3D-gaussian-splatting/)
### Courses
- [MIT Inverse Rendering Lectures (Module 2)](https://www.scenerepresentations.org/courses/inverse-graphics-23/) - Academic deep dive into inverse rendering
### Datasets
- [NERDS 360 Multi-View dataset](https://zubair-irshad.github.io/projects/neo360.html) - High-quality outdoor scene dataset
## Implementations
### Official Reference
- [Original Gaussian Splatting](https://github.com/graphdeco-inria/gaussian-splatting) - The reference implementation by the original authors
### Community Implementations
| Implementation | Language | License | Description |
| ------------------------------------------------------------------------ | ----------- | ---------- | ------------------------------- |
| [LichtFeld-Studio](https://github.com/MrNeRF/LichtFeld-Studio) | C++/CUDA | GPL-3.0 | High-performance implementation |
| [Taichi 3D GS](https://github.com/wanmeihuali/taichi_3d_gaussian_splatting) | Taichi | Apache-2.0 | Taichi-based implementation |
| [Nerfstudio gsplat](https://github.com/nerfstudio-project/gsplat) | Python/CUDA | Apache-2.0 | Integration with Nerfstudio |
| [OpenSplat](https://github.com/pierotofy/OpenSplat) | C++/CPU/GPU | AGPL-3.0 | Cross-platform solution |
| [Grendel](https://github.com/nyu-systems/Grendel-GS) | Python/CUDA | Apache-2.0 | Distributed computing focus |
| [Warp 3DGS](https://github.com/guoriyue/3dgs-warp-scratch) | Warp/Python | AGPL-3.0 | Warp-based implementation |
### Frameworks
- [Pointrix](https://github.com/pointrix-project/pointrix) - Differentiable point-based rendering
- [GauStudio](https://github.com/GAP-LAB-CUHK-SZ/gaustudio) - Unified framework with multiple implementations
- [DriveStudio](https://github.com/ziyc/drivestudio) - Urban scene reconstruction framework
- [GSCodecStudio](https://github.com/JasonLSC/GSCodec_Studio) - Compression and dynamic splatting
## Viewers & Game Engine Support
### Game Engines
- [Unity Plugin](https://github.com/aras-p/UnityGaussianSplatting)
- [Unity Plugin (gsplat-unity)](https://github.com/wuyize25/gsplat-unity)
- [Unity Plugin (DynGsplat-unity)](https://github.com/HiFi-Human/DynGsplat-unity) - For dynamic splats
- [Unreal Plugin (MLSLabsGaussianSplattingRenderer-UE)](https://github.com/mlslabs/MLSLabsGaussianSplattingRenderer-UE)
- [Unreal Plugin (XV3DGS-UEPlugin)](https://github.com/xverse-engine/XV3DGS-UEPlugin)
- [PlayCanvas Engine](https://github.com/playcanvas/engine)
### Web Viewers
**WebGL**
- [Splat Viewer](https://github.com/antimatter15/splat)
- [Gauzilla](https://github.com/BladeTransformerLLC/gauzilla)
- [Interactive Viewer](https://github.com/kishimisu/Gaussian-Splatting-WebGL)
- [GaussianSplats3D](https://github.com/mkkellogg/GaussianSplats3D)
- [PlayCanvas Model Viewer](https://github.com/playcanvas/model-viewer)
- [SuperSplat Viewer](https://github.com/playcanvas/supersplat-viewer)
**WebGPU**
- [EPFL Viewer](https://github.com/cvlab-epfl/gaussian-splatting-web)
- [WebGPU Splat](https://github.com/KeKsBoTer/web-splat)
### Desktop Viewers
**Linux**
- [DearGaussianGUI](https://github.com/leviome/DearGaussianGUI)
- [LiteViz-GS](https://github.com/panxkun/liteviz-gs)
### Native Applications
- [Blender Add-on](https://github.com/ReshotAI/gaussian-splatting-blender-addon)
- [Blender Add-on (KIRI)](https://github.com/Kiri-Innovation/3dgs-render-blender-addon)
- [Blender Add-on (404—GEN)](https://github.com/404-Repo/three-gen-blender-plugin)
- [iOS Metal Viewer](https://github.com/laanlabs/metal-splats)
- [OpenGL Viewer](https://github.com/limacv/GaussianSplattingViewer)
- [VR Support (OpenXR)](https://github.com/hyperlogic/splatapult)
- [ROS2 Support](https://github.com/shadygm/ROSplat)
## Tools & Utilities
### Data Processing
- [Kapture](https://github.com/naver/kapture) - Unified data format for visual localization
- [3DGS Converter](https://github.com/francescofugazzi/3dgsconverter) - Format conversion tool
- [Point Cloud Editor](https://github.com/JohannesKrueger/pointcloudeditor) - Web-based point cloud editing
- [SPZ Converter](https://github.com/stytim/spz) - SPZ conversion tool
- [gsbox Converter](https://github.com/gotoeasy/gsbox) - PLY SPLAT SPZ SPX conversion tool
- [SplatTransform](https://github.com/playcanvas/splat-transform) - CLI tool for converting and editing splats
- [GaussForge](https://github.com/3dgscloud/GaussForge) - C++/WASM-based conversion between PLY, SPZ, SPLAT, and KSPLAT.
### Development Tools
- [GSOPs for Houdini](https://github.com/david-rhodes/GSOPs) - Houdini integration tools
- [camorph](https://github.com/Fraunhofer-IIS/camorph) - Camera parameter conversion
- [SuperSplat](https://github.com/playcanvas/supersplat) - Browser-based 3DGS editor
## Learning Resources
### Blog Posts
- [3DGS Introduction](https://huggingface.co/blog/gaussian-splatting) - HuggingFace guide
- [Implementation Details](https://github.com/kwea123/gaussian_splatting_notes) - Technical deep dive
- [Mathematical Foundation](https://github.com/chiehwangs/3d-gaussian-theory) - Theory explanation
- [Capture Guide](https://medium.com/@heyulei/capture-images-for-gaussian-splatting-81d081bbc826) - Image capture tutorial
- [PyTorch Implementation](https://myasincifci.github.io/) - Curated implementation of Vanilla 3DGS in PyTorch
### Talks
- [Gaussian Splats: Ready for Standardization?](https://www.youtube.com/watch?v=0xdPpKSkO3I) - Metaverse Standards Forum 1/28/2025
- [Unity Integration Guide](https://www.youtube.com/watch?v=pM_HV2TU4rU&t=5298s) - Metaverse Standards Forum 5/6/2025
### Video Tutorials
- [Getting Started (Windows)](https://youtu.be/UXtuigy_wYc)
- [Gaussian Splats Town Hall - Part 2](https://youtu.be/5_GaPYBHqOo)
- [Two-Minute Explanation](https://youtu.be/HVv_IQKlafQ)
- [Jupyter Tutorial](https://www.youtube.com/watch?v=OcvA7fmiZYM)
<br>
## Data
- [NERDS 360 Multi-View dataset for Outdoor Scenes](https://zubair-irshad.github.io/projects/neo360.html)
<br>
## Courses
- [MIT Inverse Rendering Lectures (Module 2)](https://www.scenerepresentations.org/courses/inverse-graphics-23/)
<br>
## Open Source Implementations
### Reference
- [Gaussian Splatting](https://github.com/graphdeco-inria/gaussian-splatting)
### Unofficial Implementations
| | Language | License |
|---------------------------------------------------------------------------------------------|----------------|------------|
| [Taichi 3D Gaussian Splatting](https://github.com/wanmeihuali/taichi_3d_gaussian_splatting) | taichi | Apache-2.0 |
| [Gaussian Splatting 3D](https://github.com/heheyas/gaussian_splatting_3d) | Python/CUDA | |
| [3D Gaussian Splatting](https://github.com/WangFeng18/3d-gaussian-splatting) | Python/CUDA | MIT |
| [fast](https://github.com/MrNeRF/gaussian-splatting-cuda) | C++/CUDA | Inria/MPII |
| [nerfstudio](https://github.com/nerfstudio-project/gsplat) | Python/CUDA | Apache-2.0 |
| [taichi-splatting](https://github.com/uc-vision/taichi-splatting) | taichi/PyTorch | Apache-2.0 |
| [OpenSplat](https://github.com/pierotofy/OpenSplat) | C++/CPU or GPU | AGPL-3.0 |
| [3D Gaussian Splatting](https://github.com/joeyan/gaussian_splatting) | Python/CUDA | MIT |
| [Grendel Distributed 3DGS](https://github.com/nyu-systems/Grendel-GS) | Python/CUDA | Apache-2.0 |
### 2D Gaussian Splatting
- [jupyter notebook 2D GS splatting](https://github.com/OutofAi/2D-Gaussian-Splatting)
### Gaussian Style Transfer
- [Direct Gaussian Style Optimization (DGSO): Stylizing 3D Gaussian Splats](https://github.com/An-u-rag/stylized-gaussian-splatting) - Applying style transfer during gaussian optimization to produce stylized gaussian splats of a scene.
### Game Engines
- [Unity](https://github.com/aras-p/UnityGaussianSplatting)
- [PlayCanvas](https://github.com/playcanvas/engine/tree/main/src/scene/gsplat)
- [Unreal](https://github.com/xverse-engine/XV3DGS-UEPlugin)
### Viewers
- [WebGL Viewer 1](https://github.com/antimatter15/splat)
- [WebGL Viewer 2](https://github.com/kishimisu/Gaussian-Splatting-WebGL)
- [WebGL Viewer 3](https://github.com/BladeTransformerLLC/gauzilla)
- [WebGPU Viewer 1](https://github.com/cvlab-epfl/gaussian-splatting-web)
- [WebGPU Viewer 2](https://github.com/MarcusAndreasSvensson/gaussian-splatting-webgpu)
- [WebGPU Viewer 3](https://github.com/KeKsBoTer/web-splat)
- [Three.js](https://github.com/mkkellogg/GaussianSplats3D)
- [A-Frame](https://github.com/quadjr/aframe-gaussian-splatting)
- [Nerfstudio Unofficial](https://github.com/yzslab/nerfstudio/tree/gaussian_splatting)
- [Nerfstudio Viser](https://github.com/nerfstudio-project/viser)
- [Blender (Editor)](https://github.com/ReshotAI/gaussian-splatting-blender-addon/tree/master)
- [WebRTC viewer](https://github.com/dylanebert/gaussian-viewer)
- [iOS & Metal viewer](https://github.com/laanlabs/metal-splats)
- [jupyter notebook](https://github.com/shumash/gaussian-splatting/blob/mshugrina/interactive/interactive.ipynb)
- [PyOpenGL viewer (also with official CUDA backend)](https://github.com/limacv/GaussianSplattingViewer.git)
- [PlayCanvas Viewer](https://github.com/playcanvas/model-viewer)
- [gsplat.js](https://github.com/dylanebert/gsplat.js)
- [Splatapult](https://github.com/hyperlogic/splatapult) - 3D Gaussian Splatting renderer in C++ and OpenGL; works with OpenXR for tethered VR
- [3DGS.cpp](https://github.com/shg8/3DGS.cpp) - cross-platform, high performance 3DGS renderer in C++ and Vulkan Compute, supporting Windows, macOS, Linux, iOS, and visionOS
- [vkgs](https://github.com/jaesung-cs/vkgs) - cross-platform, high performance 3DGS renderer in C++ and Vulkan Compute/Graphics
- [splaTV](https://github.com/antimatter15/splaTV) - WebGL Viewer for 4D Gaussians (based on SpaceTime Gaussian) with demo [here](http://antimatter15.com/splaTV/)
- [Taichi Viewer](https://github.com/uc-vision/splat-viewer)
- [uc-vision-splat-viewer](https://github.com/uc-vision/splat-viewer) - 3D Gaussian Splatting renderer with benchmarking capability
- [splatviz](https://github.com/Florian-Barthel/splatviz) - Viewer that allows you to edit the rendering code during runtime or to display multiple scenes at once.
- [Houdini Gaussian Splatting Viewport Renderer](https://github.com/rubendhz/houdini-gsplat-renderer) - A HDK/GLSL implementation of Gaussian Splatting in Houdini
### Utilities
- [Kapture](https://github.com/naver/kapture) - A unified data format to facilitate visual localization and structure from motion e.g. for bundler to colmap model conversion
- [Kapture image cropper script](https://gist.github.com/jo-chemla/258e6e40d3d6c2220b29518ff3c17c40) - Undistorted image cropper script to remove black borders with included conversion instructions
- [camorph](https://github.com/Fraunhofer-IIS/camorph) - A toolbox for conversion between camera parameter conventions e.g. Reality Capture to colmap model
- [3DGS Converter](https://github.com/francescofugazzi/3dgsconverter) - A tool for converting 3D Gaussian Splatting .ply files into a format suitable for Cloud Compare and vice-versa
- [SuperSplat](https://github.com/playcanvas/super-splat) - Open source browser-based tool to clean/filter, reorient and compress .ply/.splat files
- [SpectacularAI](https://github.com/SpectacularAI/point-cloud-tools) - Conversion scripts for different 3DGS conventions
- [GSOPs](https://github.com/david-rhodes/GSOPs) - GSOPs (Gaussian Splat Operators) for SideFX Houdini. Import, edit, and export models, or generate synthetic training data
- [Point Cloud Editor](https://github.com/JohannesKrueger/pointcloudeditor) - Clean and edit point clouds in COLMAP sparse format in a browser to improve reconstruction results
### Tutorial
- [Tutorial from the authors of 3DGS](https://3dgstutorial.github.io/)
### Framework
- [Pointrix](https://github.com/pointrix-project/pointrix) - A differentiable point-based rendering framework.
- [msplat](https://github.com/pointrix-project/msplat) - A modular differential gaussian rasterization library.
- [GauStudio](https://github.com/GAP-LAB-CUHK-SZ/gaustudio) - Unified framework with different paper implementations
- [DriveStudio](https://github.com/ziyc/drivestudio) - A 3DGS framework for omni urban scene reconstruction and simulation.
- [gaussian-splatting-lightning](https://github.com/yzslab/gaussian-splatting-lightning) - A 3D Gaussian Splatting framework with various derived algorithms and an interactive web viewer
### Other
- [My-exp-Gaussians](https://github.com/ingra14m/My-exp-Gaussians) - Enhancing the ability of 3D Gaussians to model complex scenes
- [360-gaussian-splatting](https://github.com/inuex35/360-gaussian-splatting) - Generate gaussian splatting directly from 360 images
## Blog Posts
1. [Gaussian Splatting is pretty cool](https://aras-p.info/blog/2023/09/05/Gaussian-Splatting-is-pretty-cool/)
2. [Making Gaussian Splats smaller](https://aras-p.info/blog/2023/09/13/Making-Gaussian-Splats-smaller/)
3. [Making Gaussian Splats more smaller](https://aras-p.info/blog/2023/09/27/Making-Gaussian-Splats-more-smaller/)
4. [Introduction to 3D Gaussian Splatting](https://huggingface.co/blog/gaussian-splatting)
5. [Very good (technical) intro to 3D Gaussian Splatting](https://medium.com/@AriaLeeNotAriel/numbynum-3d-gaussian-splatting-for-real-time-radiance-field-rendering-kerbl-et-al-60c0b25e5544)
6. [Write up on some mathematical details of the 3DGS implementation](https://github.com/kwea123/gaussian_splatting_notes)
7. [Discussion about gs universal format](https://github.com/mkkellogg/GaussianSplats3D/issues/47#issuecomment-1801360116)
8. [Math explanation to understand 3DGS](https://github.com/chiehwangs/3d-gaussian-theory)
9. [Compressing Gaussian Splats](https://blog.playcanvas.com/compressing-gaussian-splats/)
10. [Comprehensive overview of Gaussian Splatting](https://towardsdatascience.com/a-comprehensive-overview-of-gaussian-splatting-e7d570081362)
11. [Gaussian Head Avatars: A Summary](https://towardsdatascience.com/gaussian-head-avatars-a-summary-2bd17bd48500)
12. [NeRFs vs. 3DGS](https://edwardahn.me/writing/NeRFvs3DGS/)
13. [How to capture images for 3DGS](https://medium.com/@heyulei/capture-images-for-gaussian-splatting-81d081bbc826)
14. [Mathematical details of forward and backward passes](https://github.com/joeyan/gaussian_splatting/blob/main/MATH.md)
15. [3D in Geospatial: NeRFs, Gaussian Splatting, and Spatial Computing](https://ckoziol.com/blog/2024/radiance_methods/)
## Tutorial Videos
1. [Getting Started with 3DGS for Windows](https://youtu.be/UXtuigy_wYc?si=j1vfORNspcocSH-b)
2. [How to view 3DGS Scenes in Unity](https://youtu.be/5_GaPYBHqOo?si=6u9j1HqXwF_5WSUL)
3. [Two-minute explanation of 3DGS](https://youtu.be/HVv_IQKlafQ?si=w5c9XKHfKIBuXDLW)
4. [Jupyter notebook tutorial](https://www.youtube.com/watch?v=OcvA7fmiZYM&t=2s)
5. [Intro to gaussian splatting (and Unity plugin)](https://www.xuanprada.com/blog/2023/10/22/intro-to-gaussian-splatting)
6. [Computerphile 3DGS explanation](https://youtu.be/VkIJbpdTujE?si=1GLjzBfF9LCuT22o)
## Credits
- Thanks to [Leonid Keselman](https://github.com/leonidk) for informing me about the release of the paper "Real-time Photorealistic Dynamic Scene Representation and Rendering with 4D Gaussian Splatting".
- Thanks to [Eric Haines](https://github.com/erich666) for suggesting the Jupyter notebook viewer and the Windows tutorial, and for fixing text hyphenation and other issues.
- Thanks to [Henry Pearce](https://github.com/henrypearce4D) for maintaining contributions.
- [Yehe Liu](https://x.com/YeheLiu)
================================================
FILE: awesome_3dgs_papers.yaml
================================================
- id: ren2025fastgs
title: 'FastGS: Training 3D Gaussian Splatting in 100 Seconds'
authors: Shiwei Ren, Tianci Wen, Yongchun Fang, Biao Lu
year: '2025'
abstract: 'The dominant 3D Gaussian splatting (3DGS) acceleration methods fail to
properly regulate the number of Gaussians during training, causing redundant computational
time overhead. In this paper, we propose FastGS, a novel, simple, and general
acceleration framework that fully considers the importance of each Gaussian based
on multi-view consistency, efficiently solving the trade-off between training
time and rendering quality. We innovatively design a densification and pruning
strategy based on multi-view consistency, dispensing with the budgeting mechanism.
Extensive experiments on Mip-NeRF 360, Tanks & Temples, and Deep Blending datasets
demonstrate that our method significantly outperforms the state-of-the-art methods
in training speed, achieving a 3.32$\times$ training acceleration and comparable
rendering quality compared with DashGaussian on the Mip-NeRF 360 dataset and a
15.45$\times$ acceleration compared with vanilla 3DGS on the Deep Blending dataset.
We demonstrate that FastGS exhibits strong generality, delivering 2-7$\times$
training acceleration across various tasks, including dynamic scene reconstruction,
surface reconstruction, sparse-view reconstruction, large-scale reconstruction,
and simultaneous localization and mapping. The project page is available at https://fastgs.github.io/
'
project_page: https://fastgs.github.io/
paper: https://arxiv.org/pdf/2511.04283.pdf
code: https://github.com/fastgs/FastGS
video: null
tags:
- Acceleration
- Code
- Densification
- Dynamic
- Project
- Sparse
thumbnail: assets/thumbnails/ren2025fastgs.jpg
publication_date: '2025-11-06T11:21:16+00:00'
date_source: arxiv
- id: pan2025diff4splat
title: 'Diff4Splat: Controllable 4D Scene Generation with Latent Dynamic Reconstruction
Models'
authors: Panwang Pan, Chenguo Lin, Jingjing Zhao, Chenxin Li, Yuchen Lin, Haopeng
Li, Honglei Yan, Kairun Wen, Yunlong Lin, Yixuan Yuan, Yadong Mu
year: '2025'
abstract: 'We introduce Diff4Splat, a feed-forward method that synthesizes controllable
and explicit 4D scenes from a single image. Our approach unifies the generative
priors of video diffusion models with geometry and motion constraints learned
from large-scale 4D datasets. Given a single input image, a camera trajectory,
and an optional text prompt, Diff4Splat directly predicts a deformable 3D Gaussian
field that encodes appearance, geometry, and motion, all in a single forward pass,
without test-time optimization or post-hoc refinement. At the core of our framework
lies a video latent transformer, which augments video diffusion models to jointly
capture spatio-temporal dependencies and predict time-varying 3D Gaussian primitives.
Training is guided by objectives on appearance fidelity, geometric accuracy, and
motion consistency, enabling Diff4Splat to synthesize high-quality 4D scenes in
30 seconds. We demonstrate the effectiveness of Diff4Splat across video generation,
novel view synthesis, and geometry extraction, where it matches or surpasses optimization-based
methods for dynamic scene synthesis while being significantly more efficient.
'
project_page: https://paulpanwang.github.io/Diff4Splat/
paper: https://arxiv.org/pdf/2511.00503.pdf
code: https://github.com/paulpanwang/Diff4Splat
video: https://www.youtube.com/watch?v=IZKt-pvCLd0
tags:
- Code
- Diffusion
- Dynamic
- Feed-Forward
- Gaussian Video
- Project
- Video
- Virtual Reality
- World Generation
thumbnail: assets/thumbnails/pan2025diff4splat.jpg
publication_date: '2025-11-01T11:16:25+00:00'
date_source: arxiv
- id: xin2025learning
title: Learning Unified Representation of 3D Gaussian Splatting
authors: Yuelin Xin, Yuheng Liu, Xiaohui Xie, Xinke Li
year: '2025'
abstract: 'A well-designed vectorized representation is crucial for the learning
systems natively based on 3D Gaussian Splatting. While 3DGS enables efficient
and explicit 3D reconstruction, its parameter-based representation remains hard
to learn as features, especially for neural-network-based models. Directly feeding
raw Gaussian parameters into learning frameworks fails to address the non-unique
and heterogeneous nature of the Gaussian parameterization, yielding highly data-dependent
models. This challenge motivates us to explore a more principled approach to represent
3D Gaussian Splatting in neural networks that preserves the underlying color and
geometric structure while enforcing unique mapping and channel homogeneity. In
this paper, we propose an embedding representation of 3DGS based on continuous
submanifold fields that encapsulate the intrinsic information of Gaussian primitives,
thereby benefiting the learning of 3DGS.
'
project_page: https://cilix-ai.github.io/gs-embedding-page/
paper: https://arxiv.org/pdf/2509.22917.pdf
code: https://github.com/cilix-ai/gs-embedding
video: null
tags:
- Code
- Compression
- Feed-Forward
- Point Cloud
- Project
- Segmentation
thumbnail: assets/thumbnails/xin2025learning.jpg
publication_date: '2025-09-26T20:44:59+00:00'
date_source: arxiv
- id: chang2025meshsplat
title: 'MeshSplat: Generalizable Sparse-View Surface Reconstruction via Gaussian
Splatting'
authors: Hanzhi Chang, Ruijie Zhu, Wenjie Chang, Mulin Yu, Yanzhe Liang, Jiahao
Lu, Zhuoyuan Li, Tianzhu Zhang
year: '2025'
abstract: 'Surface reconstruction has been widely studied in computer vision and
graphics. However, existing surface reconstruction works struggle to recover accurate
scene geometry when the input views are extremely sparse. To address this issue,
we propose MeshSplat, a generalizable sparse-view surface reconstruction framework
via Gaussian Splatting. Our key idea is to leverage 2DGS as a bridge, which connects
novel view synthesis to learned geometric priors and then transfers these priors
to achieve surface reconstruction. Specifically, we incorporate a feed-forward
network to predict per-view pixel-aligned 2DGS, which enables the network to synthesize
novel view images and thus eliminates the need for direct 3D ground-truth supervision.
To improve the accuracy of 2DGS position and orientation prediction, we propose
a Weighted Chamfer Distance Loss to regularize the depth maps, especially in overlapping
areas of input views, and also a normal prediction network to align the orientation
of 2DGS with normal vectors predicted by a monocular normal estimator. Extensive
experiments validate the effectiveness of our proposed improvement, demonstrating
that our method achieves state-of-the-art performance in generalizable sparse-view
mesh reconstruction tasks.
'
project_page: https://hanzhichang.github.io/meshsplat_web/
paper: https://arxiv.org/pdf/2508.17811.pdf
code: null
video: https://hanzhichang.github.io/meshsplat_web/static/images/meshsplat/demo.mp4
tags:
- 2DGS
- Feed-Forward
- Meshing
thumbnail: assets/thumbnails/chang2025meshsplat.jpg
publication_date: '2025-08-25T09:04:20+00:00'
date_source: arxiv
- id: zhu2025objectgs
title: 'ObjectGS: Object-aware Scene Reconstruction and Scene Understanding via
Gaussian Splatting'
authors: Ruijie Zhu, Mulin Yu, Linning Xu, Lihan Jiang, Yixuan Li, Tianzhu Zhang,
Jiangmiao Pang, Bo Dai
year: '2025'
abstract: '3D Gaussian Splatting is renowned for its high-fidelity reconstructions
and real-time novel view synthesis, yet its lack of semantic understanding limits
object-level perception. In this work, we propose ObjectGS, an object-aware framework
that unifies 3D scene reconstruction with semantic understanding. Instead of treating
the scene as a unified whole, ObjectGS models individual objects as local anchors
that generate neural Gaussians and share object IDs, enabling precise object-level
reconstruction. During training, we dynamically grow or prune these anchors and
optimize their features, while a one-hot ID encoding with a classification loss
enforces clear semantic constraints. We show through extensive experiments that
ObjectGS not only outperforms state-of-the-art methods on open-vocabulary and
panoptic segmentation tasks, but also integrates seamlessly with applications
like mesh extraction and scene editing.
'
project_page: https://ruijiezhu94.github.io/ObjectGS_page/
paper: https://arxiv.org/pdf/2507.15454.pdf
code: https://github.com/RuijieZhu94/ObjectGS
video: null
tags:
- Code
- Project
- Segmentation
- Language Embedding
thumbnail: assets/thumbnails/zhu2025objectgs.jpg
publication_date: '2025-07-21T10:06:23+00:00'
date_source: arxiv
- id: wang2025adaptive
title: Adaptive Voxelization for Transform coding of 3D Gaussian splatting data
authors: Chenjunjie Wang, Shashank N. Sridhara, Eduardo Pavez, Antonio Ortega, Cheng
Chang
year: '2025'
abstract: 'We present a novel compression framework for 3D Gaussian splatting (3DGS)
data that leverages transform coding tools originally developed for point clouds.
Contrary to existing 3DGS compression methods, our approach can produce compressed
3DGS models at multiple bitrates in a computationally efficient way. Point cloud
voxelization is a discretization technique that point cloud codecs use to improve
coding efficiency while enabling the use of fast transform coding algorithms.
We propose an adaptive voxelization algorithm tailored to 3DGS data, to avoid
the inefficiencies introduced by uniform voxelization used in point cloud codecs.
We ensure the positions of larger volume Gaussians are represented at high resolution,
as these significantly impact rendering quality. Meanwhile, a low-resolution representation
is used for dense regions with smaller Gaussians, which have a relatively lower
impact on rendering quality. This adaptive voxelization approach significantly
reduces the number of Gaussians and the bitrate required to encode the 3DGS data.
After voxelization, many Gaussians are moved or eliminated. Thus, we propose to
fine-tune/recolor the remaining 3DGS attributes with an initialization that can
reduce the amount of retraining required. Experimental results on pre-trained
datasets show that our proposed compression framework outperforms existing methods.
'
project_page: null
paper: https://arxiv.org/pdf/2506.00271.pdf
code: https://github.com/STAC-USC/3DGS_Compression_Adaptive_Voxelization
video: https://www.youtube.com/watch?v=o92Bj0k1izA
tags:
- Code
- Compression
- Video
thumbnail: assets/thumbnails/wang2025adaptive.jpg
publication_date: '2025-05-30T22:12:33+00:00'
date_source: arxiv
- id: howil2025clipgaussian
title: 'CLIPGaussian: Universal and Multimodal Style Transfer Based on Gaussian
Splatting'
authors: Kornel Howil, Joanna Waczyńska, Piotr Borycki, Tadeusz Dziarmaga, Marcin
Mazur, Przemysław Spurek
year: '2025'
abstract: 'Gaussian Splatting (GS) has recently emerged as an efficient representation
for rendering 3D scenes from 2D images and has been extended to images, videos,
and dynamic 4D content. However, applying style transfer to GS-based representations,
especially beyond simple color changes, remains challenging. In this work, we
introduce CLIPGaussians, the first unified style transfer framework that supports
text- and image-guided stylization across multiple modalities: 2D images, videos,
3D objects, and 4D scenes. Our method operates directly on Gaussian primitives
and integrates into existing GS pipelines as a plug-in module, without requiring
large generative models or retraining from scratch. The CLIPGaussians approach enables
joint optimization of color and geometry in 3D and 4D settings, and achieves temporal
coherence in videos, while preserving model size. We demonstrate superior style
fidelity and consistency across all tasks, validating CLIPGaussians as a universal
and efficient solution for multimodal style transfer.
'
project_page: https://kornelhowil.github.io/CLIPGaussian/
paper: https://arxiv.org/pdf/2505.22854.pdf
code: https://github.com/kornelhowil/CLIPGaussian
video: null
tags:
- Code
- Project
- Style Transfer
thumbnail: assets/thumbnails/howil2025clipgaussian.jpg
publication_date: '2025-05-28T20:41:24+00:00'
date_source: arxiv
- id: gomez2025foci
title: 'FOCI: Trajectory Optimization on Gaussian Splats'
authors: Mario Gomez Andreu, Maximum Wilder-Smith, Victor Klemm, Vaishakh Patil,
Jesus Tordesillas, Marco Hutter
year: '2025'
abstract: '3D Gaussian Splatting (3DGS) has recently gained popularity as a faster
alternative to Neural Radiance Fields (NeRFs) in 3D reconstruction and view synthesis
methods. Leveraging the spatial information encoded in 3DGS, this work proposes
FOCI (Field Overlap Collision Integral), an algorithm that is able to optimize
trajectories directly on the Gaussians themselves. FOCI leverages a novel and
interpretable collision formulation for 3DGS using the notion of the overlap integral
between Gaussians. Contrary to other approaches, which represent the robot with
conservative bounding boxes that underestimate the traversability of the environment,
we propose to represent the environment and the robot as Gaussian Splats. This
not only has desirable computational properties, but also allows for orientation-aware
planning, allowing the robot to pass through very tight and narrow spaces. We
extensively test our algorithm in both synthetic and real Gaussian Splats, showcasing
that collision-free trajectories for the ANYmal legged robot can be computed
in a few seconds, even with hundreds of thousands of Gaussians making up the environment.
The project page and code are available at https://rffr.leggedrobotics.com/works/foci/
'
project_page: null
paper: https://arxiv.org/pdf/2505.08510.pdf
code: null
video: null
tags:
- Optimization
- Robotics
thumbnail: assets/thumbnails/andreu2025foci.jpg
publication_date: '2025-05-13T12:40:19+00:00'
date_source: arxiv
- id: liu2025abcgs
title: 'ABC-GS: Alignment-Based Controllable Style Transfer for 3D Gaussian Splatting'
authors: Wenjie Liu, Zhongliang Liu, Xiaoyan Yang, Man Sha, Yang Li
year: '2025'
abstract: '3D scene stylization approaches based on Neural Radiance Fields (NeRF)
achieve promising results by optimizing with Nearest Neighbor Feature Matching
(NNFM) loss. However, NNFM loss does not consider global style information. In
addition, the implicit representation of NeRF limits their fine-grained control
over the resulting scenes. In this paper, we introduce ABC-GS, a novel framework
based on 3D Gaussian Splatting to achieve high-quality 3D style transfer. To this
end, a controllable matching stage is designed to achieve precise alignment between
scene content and style features through segmentation masks. Moreover, a style
transfer loss function based on feature alignment is proposed to ensure that the
outcomes of style transfer accurately reflect the global style of the reference
image. Furthermore, the original geometric information of the scene is preserved
with the depth loss and Gaussian regularization terms. Extensive experiments show
that our ABC-GS provides controllability of style transfer and achieves stylization
results that are more faithfully aligned with the global style of the chosen artistic
reference. Our homepage is available at https://vpx-ecnu.github.io/ABC-GS-website.
'
project_page: https://vpx-ecnu.github.io/ABC-GS-website/
paper: https://arxiv.org/pdf/2503.22218.pdf
code: https://github.com/vpx-ecnu/ABC-GS
video: null
tags:
- Code
- Project
- Style Transfer
thumbnail: assets/thumbnails/liu2025abcgs.jpg
publication_date: '2025-03-28T08:07:57+00:00'
date_source: arxiv
- id: huang2025stdloc
title: 'From Sparse to Dense: Camera Relocalization with Scene-Specific Detector
from Feature Gaussian Splatting'
authors: Zhiwei Huang, Hailin Yu, Yichun Shentu, Jin Yuan, Guofeng Zhang
year: '2025'
abstract: 'This paper presents a novel camera relocalization method, STDLoc, which
leverages Feature Gaussian as scene representation. STDLoc is a full relocalization
pipeline that can achieve accurate relocalization without relying on any pose
prior. Unlike previous coarse-to-fine localization methods that require image
retrieval first and then feature matching, we propose a novel sparse-to-dense
localization paradigm. Based on this scene representation, we introduce a novel
matching-oriented Gaussian sampling strategy and a scene-specific detector to
achieve efficient and robust initial pose estimation. Furthermore, based on the
initial localization results, we align the query feature map to the Gaussian feature
field by dense feature matching to enable accurate localization. The experiments
on indoor and outdoor datasets show that STDLoc outperforms current state-of-the-art
localization methods in terms of localization accuracy and recall.
'
project_page: https://zju3dv.github.io/STDLoc/
paper: https://arxiv.org/pdf/2503.19358.pdf
code: https://github.com/zju3dv/STDLoc
video: null
tags:
- Code
- Poses
- Project
thumbnail: assets/thumbnails/huang2025from.jpg
publication_date: '2025-03-25T05:18:19+00:00'
date_source: arxiv
- id: zhang2025motion
title: Motion Blender Gaussian Splatting for Dynamic Scene Reconstruction
authors: Xinyu Zhang, Haonan Chang, Yuhan Liu, Abdeslam Boularias
year: '2025'
abstract: 'Gaussian splatting has emerged as a powerful tool for high-fidelity reconstruction
of dynamic scenes. However, existing methods primarily rely on implicit motion
representations, such as encoding motions into neural networks or per-Gaussian
parameters, which makes it difficult to further manipulate the reconstructed motions.
This lack of explicit controllability limits existing methods to replaying recorded
motions only, which hinders a wider application in robotics. To address this,
we propose Motion Blender Gaussian Splatting (MBGS), a novel framework that uses
motion graphs as an explicit and sparse motion representation. The motion of a
graph''s links is propagated to individual Gaussians via dual quaternion skinning,
with learnable weight painting functions that determine the influence of each
link. The motion graphs and 3D Gaussians are jointly optimized from input videos
via differentiable rendering. Experiments show that MBGS achieves state-of-the-art
performance on the highly challenging iPhone dataset while being competitive on
HyperNeRF. We demonstrate the application potential of our method in animating
novel object poses, synthesizing real robot demonstrations, and predicting robot
actions through visual planning. The source code, models, video demonstrations
can be found at http://mlzxy.github.io/motion-blender-gs.
'
project_page: https://mlzxy.github.io/motion-blender-gs/
paper: https://arxiv.org/pdf/2503.09040.pdf
code: https://github.com/mlzxy/motion-blender-gs
video: null
tags:
- Code
- Dynamic
- Editing
- Project
- Robotics
- Segmentation
thumbnail: assets/thumbnails/zhang2025motion.jpg
publication_date: '2025-03-12T03:55:16+00:00'
date_source: arxiv
- id: yu2025persistent
title: Persistent Object Gaussian Splat (POGS) for Tracking Human and Robot Manipulation
of Irregularly Shaped Objects
authors: Justin Yu, Kush Hari, Karim El-Refai, Arnav Dalal, Justin Kerr, Chung Min
Kim, Richard Cheng, Muhammad Zubair Irshad, Ken Goldberg
year: '2025'
abstract: 'Tracking and manipulating irregularly-shaped, previously unseen objects
in dynamic environments is important for robotic applications in manufacturing,
assembly, and logistics. Recently introduced Gaussian Splats efficiently model
object geometry, but lack persistent state estimation for task-oriented manipulation.
We present Persistent Object Gaussian Splat (POGS), a system that embeds semantics,
self-supervised visual features, and object grouping features into a compact representation
that can be continuously updated to estimate the pose of scanned objects. POGS
updates object states without requiring expensive rescanning or prior CAD models
of objects. After an initial multi-view scene capture and training phase, POGS
uses a single stereo camera to integrate depth estimates along with self-supervised
vision encoder features for object pose estimation. POGS supports grasping, reorientation,
and natural language-driven manipulation by refining object pose estimates, facilitating
sequential object reset operations with human-induced object perturbations and
tool servoing, where robots recover tool pose despite tool perturbations of up
to 30°. POGS achieves up to 12 consecutive successful object resets and recovers
from 80% of in-grasp tool perturbations.
'
project_page: https://berkeleyautomation.github.io/POGS/
paper: https://arxiv.org/pdf/2503.05189.pdf
code: https://github.com/uynitsuj/pogs
video: null
tags:
- Code
- Language Embedding
- Project
- Robotics
- Segmentation
thumbnail: assets/thumbnails/yu2025persistent.jpg
publication_date: '2025-03-07T07:20:25+00:00'
date_source: arxiv
- id: chacko2025lifting
title: 'Lifting by Gaussians: A Simple, Fast and Flexible Method for 3D Instance
Segmentation'
authors: Rohan Chacko, Nicolai Haeni, Eldar Khaliullin, Lin Sun, Douglas Lee
year: '2025'
abstract: 'We introduce Lifting By Gaussians (LBG), a novel approach for open-world
instance segmentation of 3D Gaussian Splatted Radiance Fields (3DGS). Recently,
3DGS Fields have emerged as a highly efficient and explicit alternative to Neural
Field-based methods for high-quality Novel View Synthesis. Our 3D instance segmentation
method directly lifts 2D segmentation masks from SAM (alternately FastSAM, etc.),
together with features from CLIP and DINOv2, directly fusing them onto 3DGS (or
similar Gaussian radiance fields such as 2DGS). Unlike previous approaches, LBG
requires no per-scene training, allowing it to operate seamlessly on any existing
3DGS reconstruction. Our approach is not only an order of magnitude faster and
simpler than existing approaches; it is also highly modular, enabling 3D semantic
segmentation of existing 3DGS fields without requiring a specific parametrization
of the 3D Gaussians. Furthermore, our technique achieves superior semantic segmentation
for 2D semantic novel view synthesis and 3D asset extraction results while maintaining
flexibility and efficiency. We further introduce a novel approach to evaluate
individually segmented 3D assets from 3D radiance field segmentation methods.
'
project_page: null
paper: https://arxiv.org/pdf/2502.00173.pdf
code: null
video: null
tags:
- Language Embedding
- Segmentation
- Virtual Reality
thumbnail: assets/thumbnails/chacko2025lifting.jpg
publication_date: '2025-01-31T21:30:59+00:00'
date_source: arxiv
- id: lin2025omniphysgs
title: 'OmniPhysGS: 3D Constitutive Gaussians for General Physics-Based Dynamics
Generation'
authors: Yuchen Lin, Chenguo Lin, Jianjin Xu, Yadong Mu
year: '2025'
abstract: 'Recently, significant advancements have been made in the reconstruction
and generation of 3D assets, including static cases and those with physical interactions.
To recover the physical properties of 3D assets, existing methods typically assume
that all materials belong to a specific predefined category (e.g., elasticity).
However, such assumptions ignore the complex composition of multiple heterogeneous
objects in real scenarios and tend to render less physically plausible animation
given a wider range of objects. We propose OmniPhysGS for synthesizing a physics-based
3D dynamic scene composed of more general objects. A key design of OmniPhysGS
is treating each 3D asset as a collection of constitutive 3D Gaussians. For each
Gaussian, its physical material is represented by an ensemble of 12 physical domain-expert
sub-models (rubber, metal, honey, water, etc.), which greatly enhances the flexibility
of the proposed model. In the implementation, we define a scene by user-specified
prompts and supervise the estimation of material weighting factors via a pretrained
video diffusion model. Comprehensive experiments demonstrate that OmniPhysGS achieves
more general and realistic physical dynamics across a broader spectrum of materials,
including elastic, viscoelastic, plastic, and fluid substances, as well as interactions
between different materials. Our method surpasses existing methods by approximately
3% to 16% in metrics of visual quality and text alignment.
'
project_page: https://wgsxm.github.io/projects/omniphysgs/
paper: https://arxiv.org/pdf/2501.18982.pdf
code: https://github.com/wgsxm/omniphysgs
video: https://wgsxm.github.io/videos/omniphysgs.mp4
tags:
- Code
- Dynamic
- Physics
- Project
- Video
thumbnail: assets/thumbnails/lin2025omniphysgs.jpg
publication_date: '2025-01-31T09:28:07+00:00'
date_source: arxiv
- id: guizilini2025zeroshot
title: Zero-Shot Novel View and Depth Synthesis with Multi-View Geometric Diffusion
authors: Vitor Guizilini, Muhammad Zubair Irshad, Dian Chen, Greg Shakhnarovich,
Rares Ambrus
year: '2025'
abstract: 'Current methods for 3D scene reconstruction from sparse posed images
employ intermediate 3D representations such as neural fields, voxel grids, or
3D Gaussians, to achieve multi-view consistent scene appearance and geometry.
In this paper we introduce MVGD, a diffusion-based architecture capable of direct
pixel-level generation of images and depth maps from novel viewpoints, given an
arbitrary number of input views. Our method uses raymap conditioning to both augment
visual features with spatial information from different viewpoints, as well as
to guide the generation of images and depth maps from novel views. A key aspect
of our approach is the multi-task generation of images and depth maps, using learnable
task embeddings to guide the diffusion process towards specific modalities. We
train this model on a collection of more than 60 million multi-view samples from
publicly available datasets, and propose techniques to enable efficient and consistent
learning in such diverse conditions. We also propose a novel strategy that enables
the efficient training of larger models by incrementally fine-tuning smaller ones,
with promising scaling behavior. Through extensive experiments, we report state-of-the-art
results in multiple novel view synthesis benchmarks, as well as multi-view stereo
and video depth estimation.
'
project_page: https://mvgd.github.io/
paper: https://arxiv.org/pdf/2501.18804.pdf
code: null
video: null
tags:
- 360 degree
- Diffusion
- Feed-Forward
- Large-Scale
- Point Cloud
- Project
thumbnail: assets/thumbnails/guizilini2025zeroshot.jpg
publication_date: '2025-01-30T23:43:06+00:00'
date_source: arxiv
- id: lin2025diffsplat
title: 'DiffSplat: Repurposing Image Diffusion Models for Scalable Gaussian Splat
Generation'
authors: Chenguo Lin, Panwang Pan, Bangbang Yang, Zeming Li, Yadong Mu
year: '2025'
abstract: 'Recent advancements in 3D content generation from text or a single image
struggle with limited high-quality 3D datasets and inconsistency from 2D multi-view
generation. We introduce DiffSplat, a novel 3D generative framework that natively
generates 3D Gaussian splats by taming large-scale text-to-image diffusion models.
It differs from previous 3D generative models by effectively utilizing web-scale
2D priors while maintaining 3D consistency in a unified model. To bootstrap the
training, a lightweight reconstruction model is proposed to instantly produce
multi-view Gaussian splat grids for scalable dataset curation. In conjunction
with the regular diffusion loss on these grids, a 3D rendering loss is introduced
to facilitate 3D coherence across arbitrary views. The compatibility with image
diffusion models enables seamless adaptions of numerous techniques for image generation
to the 3D realm. Extensive experiments reveal the superiority of DiffSplat in
text- and image-conditioned generation tasks and downstream applications. Thorough
ablation studies validate the efficacy of each critical design choice and provide
insights into the underlying mechanism.
'
project_page: https://chenguolin.github.io/projects/DiffSplat/
paper: https://arxiv.org/pdf/2501.16764.pdf
code: https://github.com/chenguolin/DiffSplat
video: null
tags:
- Code
- Diffusion
- Project
thumbnail: assets/thumbnails/lin2025diffsplat.jpg
publication_date: '2025-01-28T07:38:59+00:00'
date_source: arxiv
- id: armagan2025trickgs
title: 'Trick-GS: A Balanced Bag of Tricks for Efficient Gaussian Splatting'
authors: Anil Armagan, Albert Saà-Garriga, Bruno Manganelli, Mateusz Nowak, Mehmet
Kerim Yucel
year: '2025'
abstract: 'Gaussian splatting (GS) for 3D reconstruction has become quite popular
due to their fast training, inference speeds and high quality reconstruction.
However, GS-based reconstructions generally consist of millions of Gaussians,
which makes them hard to use on computationally constrained devices such as smartphones.
In this paper, we first propose a principled analysis of advances in efficient
GS methods. Then, we propose Trick-GS, which is a careful combination of several
strategies including (1) progressive training with resolution, noise and Gaussian
scales, (2) learning to prune and mask primitives and SH bands by their significance,
and (3) accelerated GS training framework. Trick-GS takes a large step towards
resource-constrained GS, where faster run-time, smaller and faster-convergence
of models is of paramount concern. Our results on three datasets show that Trick-GS
achieves up to 2x faster training, 40x smaller disk size and 2x faster rendering
speed compared to vanilla GS, while having comparable accuracy.
'
project_page: null
paper: https://arxiv.org/pdf/2501.14534.pdf
code: null
video: null
tags:
- Acceleration
thumbnail: assets/thumbnails/armagan2025trickgs.jpg
publication_date: '2025-01-24T14:40:40+00:00'
date_source: arxiv
- id: lee2025densesfm
title: 'Dense-SfM: Structure from Motion with Dense Consistent Matching'
authors: JongMin Lee, Sungjoo Yoo
year: '2025'
abstract: 'We present Dense-SfM, a novel Structure from Motion (SfM) framework designed
for dense and accurate 3D reconstruction from multi-view images. Sparse keypoint
matching, which traditional SfM methods often rely on, limits both accuracy and
point density, especially in texture-less areas. Dense-SfM addresses this limitation
by integrating dense matching with a Gaussian Splatting (GS) based track extension
which gives more consistent, longer feature tracks. To further improve reconstruction
accuracy, Dense-SfM is equipped with a multi-view kernelized matching module leveraging
transformer and Gaussian Process architectures, for robust track refinement across
multi-views. Evaluations on the ETH3D and Texture-Poor SfM datasets show that
Dense-SfM offers significant improvements in accuracy and density over state-of-the-art
methods.
'
project_page: null
paper: https://arxiv.org/pdf/2501.14277.pdf
code: null
video: null
tags:
- Point Cloud
- Poses
thumbnail: assets/thumbnails/lee2025densesfm.jpg
publication_date: '2025-01-24T06:45:12+00:00'
date_source: arxiv
- id: li2025micromacro
title: Micro-macro Wavelet-based Gaussian Splatting for 3D Reconstruction from Unconstrained
Images
authors: Yihui Li, Chengxin Lv, Hongyu Yang, Di Huang
year: '2025'
abstract: '3D reconstruction from unconstrained image collections presents substantial
challenges due to varying appearances and transient occlusions. In this paper,
we introduce Micro-macro Wavelet-based Gaussian Splatting (MW-GS), a novel approach
designed to enhance 3D reconstruction by disentangling scene representations into
global, refined, and intrinsic components. The proposed method features two key
innovations: Micro-macro Projection, which allows Gaussian points to capture details
from feature maps across multiple scales with enhanced diversity; and Wavelet-based
Sampling, which leverages frequency domain information to refine feature representations
and significantly improve the modeling of scene appearances. Additionally, we
incorporate a Hierarchical Residual Fusion Network to seamlessly integrate these
features. Extensive experiments demonstrate that MW-GS delivers state-of-the-art
rendering performance, surpassing existing methods.
'
project_page: null
paper: https://arxiv.org/pdf/2501.14231.pdf
code: null
video: null
tags:
- In the Wild
thumbnail: assets/thumbnails/li2025micromacro.jpg
publication_date: '2025-01-24T04:37:57+00:00'
date_source: arxiv
- id: yu2025hammer
title: 'HAMMER: Heterogeneous, Multi-Robot Semantic Gaussian Splatting'
authors: Javier Yu, Timothy Chen, Mac Schwager
year: '2025'
abstract: '3D Gaussian Splatting offers expressive scene reconstruction, modeling
a broad range of visual, geometric, and semantic information. However, efficient
real-time map reconstruction with data streamed from multiple robots and devices
remains a challenge. To that end, we propose HAMMER, a server-based collaborative
Gaussian Splatting method that leverages widely available ROS communication infrastructure
to generate 3D, metric-semantic maps from asynchronous robot data-streams with
no prior knowledge of initial robot positions and varying on-device pose estimators.
HAMMER consists of (i) a frame alignment module that transforms local SLAM poses
and image data into a global frame and requires no prior relative pose knowledge,
and (ii) an online module for training semantic 3DGS maps from streaming data.
HAMMER handles mixed perception modes, adjusts automatically for variations in
image pre-processing among different devices, and distills CLIP semantic codes
into the 3D scene for open-vocabulary language queries. In our real-world experiments,
HAMMER creates higher-fidelity maps (2x) compared to competing baselines and is
useful for downstream tasks, such as semantic goal-conditioned navigation (e.g.,
"go to the couch"). Accompanying content available at hammer-project.github.io.
'
project_page: https://hammer-project.github.io/
paper: https://arxiv.org/pdf/2501.14147.pdf
code: null
video: null
tags:
- Project
- Robotics
- SLAM
thumbnail: assets/thumbnails/yu2025hammer.jpg
publication_date: '2025-01-24T00:21:10+00:00'
date_source: arxiv
- id: yang2025fast3r
title: 'Fast3R: Towards 3D Reconstruction of 1000+ Images in One Forward Pass'
authors: Jianing Yang, Alexander Sax, Kevin J. Liang, Mikael Henaff, Hao Tang, Ang
Cao, Joyce Chai, Franziska Meier, Matt Feiszli
year: '2025'
abstract: 'Multi-view 3D reconstruction remains a core challenge in computer vision,
particularly in applications requiring accurate and scalable representations across
diverse perspectives. Current leading methods such as DUSt3R employ a fundamentally
pairwise approach, processing images in pairs and necessitating costly global
alignment procedures to reconstruct from multiple views. In this work, we propose
Fast 3D Reconstruction (Fast3R), a novel multi-view generalization to DUSt3R that
achieves efficient and scalable 3D reconstruction by processing many views in
parallel. Fast3R''s Transformer-based architecture forwards N images in a single
forward pass, bypassing the need for iterative alignment. Through extensive experiments
on camera pose estimation and 3D reconstruction, Fast3R demonstrates state-of-the-art
performance, with significant improvements in inference speed and reduced error
accumulation. These results establish Fast3R as a robust alternative for multi-view
applications, offering enhanced scalability without compromising reconstruction
accuracy.
'
project_page: https://fast3r-3d.github.io/
paper: https://arxiv.org/pdf/2501.13928.pdf
code: null
video: null
tags:
- 3ster-based
- Project
thumbnail: assets/thumbnails/yang2025fast3r.jpg
publication_date: '2025-01-23T18:59:55+00:00'
date_source: arxiv
- id: sario2025gode
title: 'GoDe: Gaussians on Demand for Progressive Level of Detail and Scalable Compression'
authors: Francesco Di Sario, Riccardo Renzulli, Marco Grangetto, Akihiro Sugimoto,
Enzo Tartaglione
year: '2025'
abstract: '3D Gaussian Splatting enhances real-time performance in novel view synthesis
by representing scenes with mixtures of Gaussians and utilizing differentiable
rasterization. However, it typically requires large storage capacity and high
VRAM, demanding the design of effective pruning and compression techniques. Existing
methods, while effective in some scenarios, struggle with scalability and fail
to adapt models based on critical factors such as computing capabilities or bandwidth,
requiring to re-train the model under different configurations. In this work,
we propose a novel, model-agnostic technique that organizes Gaussians into several
hierarchical layers, enabling progressive Level of Detail (LoD) strategy. This
method, combined with a recent approach to 3DGS compression, allows a single
model to instantly scale across several compression ratios, with minimal to no
impact on quality compared to a single non-scalable model and without requiring
re-training. We validate our approach on typical datasets and benchmarks, showcasing
low distortion and substantial gains in terms of scalability and adaptability.
'
project_page: null
paper: https://arxiv.org/pdf/2501.13558.pdf
code: null
video: null
tags:
- Compression
- LoD
thumbnail: assets/thumbnails/sario2025gode.jpg
publication_date: '2025-01-23T11:05:45+00:00'
date_source: arxiv
- id: lan20253dgs2
title: '3DGS$^2$: Near Second-order Converging 3D Gaussian Splatting'
authors: Lei Lan, Tianjia Shao, Zixuan Lu, Yu Zhang, Chenfanfu Jiang, Yin Yang
year: '2025'
abstract: '3D Gaussian Splatting (3DGS) has emerged as a mainstream solution for
novel view synthesis and 3D reconstruction. By explicitly encoding a 3D scene
using a collection of Gaussian kernels, 3DGS achieves high-quality rendering with
superior efficiency. As a learning-based approach, 3DGS training has been dealt
with the standard stochastic gradient descent (SGD) method, which offers at most
linear convergence. Consequently, training often requires tens of minutes, even
with GPU acceleration. This paper introduces a (near) second-order convergent
training algorithm for 3DGS, leveraging its unique properties. Our approach is
inspired by two key observations. First, the attributes of a Gaussian kernel contribute
independently to the image-space loss, which endorses isolated and local optimization
algorithms. We exploit this by splitting the optimization at the level of individual
kernel attributes, analytically constructing small-size Newton systems for each
parameter group, and efficiently solving these systems on GPU threads. This achieves
Newton-like convergence per training image without relying on the global Hessian.
Second, kernels exhibit sparse and structured coupling across input images. This
property allows us to effectively utilize spatial information to mitigate overshoot
during stochastic training. Our method converges an order faster than standard
GPU-based 3DGS training, requiring over $10\times$ fewer iterations while maintaining
or surpassing the quality of SGD-based 3DGS reconstructions.
'
project_page: null
paper: https://arxiv.org/pdf/2501.13975.pdf
code: null
video: null
tags:
- Optimization
thumbnail: assets/thumbnails/lan20253dgs2.jpg
publication_date: '2025-01-22T22:28:11+00:00'
date_source: arxiv
- id: shi2025sketch
title: 'Sketch and Patch: Efficient 3D Gaussian Representation for Man-Made Scenes'
authors: Yuang Shi, Simone Gasparini, Géraldine Morin, Chenggang Yang, Wei Tsang
Ooi
year: '2025'
abstract: '3D Gaussian Splatting (3DGS) has emerged as a promising representation
for photorealistic rendering of 3D scenes. However, its high storage requirements
pose significant challenges for practical applications. We observe that Gaussians
exhibit distinct roles and characteristics that are analogous to traditional artistic
techniques -- Like how artists first sketch outlines before filling in broader
areas with color, some Gaussians capture high-frequency features like edges and
contours; While other Gaussians represent broader, smoother regions, that are
analogous to broader brush strokes that add volume and depth to a painting. Based
on this observation, we propose a novel hybrid representation that categorizes
Gaussians into (i) Sketch Gaussians, which define scene boundaries, and (ii) Patch
Gaussians, which cover smooth regions. Sketch Gaussians are efficiently encoded
using parametric models, leveraging their geometric coherence, while Patch Gaussians
undergo optimized pruning, retraining, and vector quantization to maintain volumetric
consistency and storage efficiency. Our comprehensive evaluation across diverse
indoor and outdoor scenes demonstrates that this structure-aware approach achieves
up to 32.62% improvement in PSNR, 19.12% in SSIM, and 45.41% in LPIPS at equivalent
model sizes, and correspondingly, for an indoor scene, our model maintains the
visual quality with 2.3% of the original model size.
'
project_page: null
paper: https://arxiv.org/pdf/2501.13045.pdf
code: null
video: null
tags:
- Densification
thumbnail: assets/thumbnails/shi2025sketch.jpg
publication_date: '2025-01-22T17:52:45+00:00'
date_source: arxiv
- id: arunan2025darbsplatting
title: 'DARB-Splatting: Generalizing Splatting with Decaying Anisotropic Radial
Basis Functions'
authors: Vishagar Arunan, Saeedha Nazar, Hashiru Pramuditha, Vinasirajan Viruthshaan,
Sameera Ramasinghe, Simon Lucey, Ranga Rodrigo
year: '2025'
abstract: 'Splatting-based 3D reconstruction methods have gained popularity with
the advent of 3D Gaussian Splatting, efficiently synthesizing high-quality novel
views. These methods commonly resort to using exponential family functions, such
as the Gaussian function, as reconstruction kernels due to their anisotropic nature,
ease of projection, and differentiability in rasterization. However, the field
remains restricted to variations within the exponential family, leaving generalized
reconstruction kernels largely underexplored, partly due to the lack of easy integrability
in 3D to 2D projections. In this light, we show that a class of decaying anisotropic
radial basis functions (DARBFs), which are non-negative functions of the Mahalanobis
distance, supports splatting by approximating the Gaussian function''s closed-form
integration advantage. With this fresh perspective, we demonstrate up to 34% faster
convergence during training and a 15% reduction in memory consumption across various
DARB reconstruction kernels, while maintaining comparable PSNR, SSIM, and LPIPS
results. We will make the code available.
'
project_page: https://randomnerds.github.io/darbs.github.io/
paper: https://arxiv.org/pdf/2501.12369.pdf
code: null
video: null
tags:
- Project
- Rendering
thumbnail: assets/thumbnails/arunan2025darbsplatting.jpg
publication_date: '2025-01-21T18:49:06+00:00'
date_source: arxiv
- id: chen2025hac
title: 'HAC++: Towards 100X Compression of 3D Gaussian Splatting'
authors: Yihang Chen, Qianyi Wu, Weiyao Lin, Mehrtash Harandi, Jianfei Cai
year: '2025'
abstract: '3D Gaussian Splatting (3DGS) has emerged as a promising framework for
novel view synthesis, boasting rapid rendering speed with high fidelity. However,
the substantial number of Gaussians and their associated attributes necessitate effective
compression techniques. Nevertheless, the sparse and unorganized nature of the
point cloud of Gaussians (or anchors in our paper) presents challenges for compression.
To achieve a compact size, we propose HAC++, which leverages the relationships
between unorganized anchors and a structured hash grid, utilizing their mutual
information for context modeling. Additionally, HAC++ captures intra-anchor contextual
relationships to further enhance compression performance. To facilitate entropy
coding, we utilize Gaussian distributions to precisely estimate the probability
of each quantized attribute, where an adaptive quantization module is proposed
to enable high-precision quantization of these attributes for improved fidelity
restoration. Moreover, we incorporate an adaptive masking strategy to eliminate
invalid Gaussians and anchors. Overall, HAC++ achieves a remarkable size reduction
of over 100X compared to vanilla 3DGS when averaged on all datasets, while simultaneously
improving fidelity. It also delivers more than 20X size reduction compared to
Scaffold-GS. Our code is available at https://github.com/YihangChen-ee/HAC-plus.
'
project_page: https://yihangchen-ee.github.io/project_hac++/
paper: https://arxiv.org/pdf/2501.12255.pdf
code: https://github.com/YihangChen-ee/HAC-plus
video: null
tags:
- Code
- Compression
- Project
thumbnail: assets/thumbnails/chen2025hac.jpg
publication_date: '2025-01-21T16:23:05+00:00'
date_source: arxiv
- id: li2025cargs
title: 'Car-GS: Addressing Reflective and Transparent Surface Challenges in 3D Car
Reconstruction'
authors: Congcong Li, Jin Wang, Xiaomeng Wang, Xingchen Zhou, Wei Wu, Yuzhi Zhang,
Tongyi Cao
year: '2025'
abstract: '3D car modeling is crucial for applications in autonomous driving systems,
virtual and augmented reality, and gaming. However, due to the distinctive properties
of cars, such as highly reflective and transparent surface materials, existing
methods often struggle to achieve accurate 3D car reconstruction. To address these
limitations, we propose Car-GS, a novel approach designed to mitigate the effects
of specular highlights and the coupling of RGB and geometry in 3D geometric and
shading reconstruction (3DGS). Our method incorporates three key innovations:
First, we introduce view-dependent Gaussian primitives to effectively model surface
reflections. Second, we identify the limitations of using a shared opacity parameter
for both image rendering and geometric attributes when modeling transparent objects.
To overcome this, we assign a learnable geometry-specific opacity to each 2D Gaussian
primitive, dedicated solely to rendering depth and normals. Third, we observe
that reconstruction errors are most prominent when the camera view is nearly orthogonal
to glass surfaces. To address this issue, we develop a quality-aware supervision
module that adaptively leverages normal priors from a pre-trained large-scale
normal model. Experimental results demonstrate that Car-GS achieves precise reconstruction
of car surfaces and significantly outperforms prior methods. The project page
is available at https://lcc815.github.io/Car-GS.
'
project_page: null
paper: https://arxiv.org/pdf/2501.11020.pdf
code: https://lcc815.github.io/Car-GS/
video: null
tags:
- Code
- Meshing
- Rendering
thumbnail: assets/thumbnails/li2025cargs.jpg
publication_date: '2025-01-19T11:49:35+00:00'
date_source: arxiv
- id: zheng2025gstar
title: 'GSTAR: Gaussian Surface Tracking and Reconstruction'
authors: Chengwei Zheng, Lixin Xue, Juan Zarate, Jie Song
year: '2025'
abstract: '3D Gaussian Splatting techniques have enabled efficient photo-realistic
rendering of static scenes. Recent works have extended these approaches to support
surface reconstruction and tracking. However, tracking dynamic surfaces with 3D
Gaussians remains challenging due to complex topology changes, such as surfaces
appearing, disappearing, or splitting. To address these challenges, we propose
GSTAR, a novel method that achieves photo-realistic rendering, accurate surface
reconstruction, and reliable 3D tracking for general dynamic scenes with changing
topology. Given multi-view captures as input, GSTAR binds Gaussians to mesh faces
to represent dynamic objects. For surfaces with consistent topology, GSTAR maintains
the mesh topology and tracks the meshes using Gaussians. In regions where topology
changes, GSTAR adaptively unbinds Gaussians from the mesh, enabling accurate registration
and the generation of new surfaces based on these optimized Gaussians. Additionally,
we introduce a surface-based scene flow method that provides robust initialization
for tracking between frames. Experiments demonstrate that our method effectively
tracks and reconstructs dynamic surfaces, enabling a range of applications. Our
project page with the code release is available at https://eth-ait.github.io/GSTAR/.
'
project_page: https://chengwei-zheng.github.io/GSTAR/
paper: https://arxiv.org/pdf/2501.10283.pdf
code: null
video: https://www.youtube.com/watch?v=Fwby4PrjFeM
tags:
- Avatar
- Dynamic
- Meshing
- Project
- Video
thumbnail: assets/thumbnails/zheng2025gstar.jpg
publication_date: '2025-01-17T16:26:24+00:00'
date_source: arxiv
- id: ma2025cityloc
title: 'CityLoc: 6 DoF Localization of Text Descriptions in Large-Scale Scenes with
Gaussian Representation'
authors: Qi Ma, Runyi Yang, Bin Ren, Ender Konukoglu, Luc Van Gool, Danda Pani Paudel
year: '2025'
abstract: 'Localizing text descriptions in large-scale 3D scenes is inherently an
ambiguous task. This nonetheless arises while describing general concepts, e.g.
all traffic lights in a city. To facilitate reasoning based on such concepts,
text localization in the form of distribution is required. In this paper, we generate
the distribution of the camera poses conditioned upon the textual description.
To facilitate such generation, we propose a diffusion-based architecture that
conditionally diffuses the noisy 6DoF camera poses to their plausible locations.
The conditional signals are derived from the text descriptions, using the pre-trained
text encoders. The connection between text descriptions and pose distribution
is established through a pretrained vision-language model, i.e., CLIP. Furthermore,
we demonstrate that the candidate poses for the distribution can be further refined
by rendering potential poses using 3D Gaussian splatting, guiding incorrectly
posed samples towards locations that better align with the textual description,
through visual reasoning. We demonstrate the effectiveness of our method by
comparing it with both standard retrieval methods and learning-based approaches.
Our proposed method consistently outperforms these baselines across all five large-scale
datasets. Our source code and dataset will be made publicly available.
'
project_page: null
paper: https://arxiv.org/pdf/2501.08982.pdf
code: null
video: null
tags:
- Language Embedding
- Large-Scale
thumbnail: assets/thumbnails/ma2025cityloc.jpg
publication_date: '2025-01-15T17:59:32+00:00'
date_source: arxiv
- id: hong2025gslivo
title: 'GS-LIVO: Real-Time LiDAR, Inertial, and Visual Multi-sensor Fused Odometry
with Gaussian Mapping'
authors: Sheng Hong, Chunran Zheng, Yishu Shen, Changze Li, Fu Zhang, Tong Qin,
Shaojie Shen
year: '2025'
abstract: 'In recent years, 3D Gaussian splatting (3D-GS) has emerged as a novel
scene representation approach. However, existing vision-only 3D-GS methods often
rely on hand-crafted heuristics for point-cloud densification and face challenges
in handling occlusions and high GPU memory and computation consumption. LiDAR-Inertial-Visual
(LIV) sensor configuration has demonstrated superior performance in localization
and dense mapping by leveraging complementary sensing characteristics: rich texture
information from cameras, precise geometric measurements from LiDAR, and high-frequency
motion data from IMU. Inspired by this, we propose a novel real-time Gaussian-based
simultaneous localization and mapping (SLAM) system. Our map system comprises
a global Gaussian map and a sliding window of Gaussians, along with an IESKF-based
odometry. The global Gaussian map consists of hash-indexed voxels organized in
a recursive octree, effectively covering sparse spatial volumes while adapting
to different levels of detail and scales. The Gaussian map is initialized through
multi-sensor fusion and optimized with photometric gradients. Our system incrementally
maintains a sliding window of Gaussians, significantly reducing GPU computation
and memory consumption by only optimizing the map within the sliding window. Moreover,
we implement a tightly coupled multi-sensor fusion odometry with an iterative
error state Kalman filter (IESKF), leveraging real-time updating and rendering
of the Gaussian map. Our system represents the first real-time Gaussian-based
SLAM framework deployable on resource-constrained embedded systems, demonstrated
on the NVIDIA Jetson Orin NX platform. The framework achieves real-time performance
while maintaining robust multi-sensor fusion capabilities. All implementation
algorithms, hardware designs, and CAD models will be publicly available.
'
project_page: null
paper: https://arxiv.org/pdf/2501.08672.pdf
code: null
video: null
tags:
- Large-Scale
- Lidar
thumbnail: assets/thumbnails/hong2025gslivo.jpg
publication_date: '2025-01-15T09:04:56+00:00'
date_source: arxiv
- id: wu2025vingsmono
title: 'VINGS-Mono: Visual-Inertial Gaussian Splatting Monocular SLAM in Large Scenes'
authors: Ke Wu, Zicheng Zhang, Muer Tie, Ziqing Ai, Zhongxue Gan, Wenchao Ding
year: '2025'
abstract: 'VINGS-Mono is a monocular (inertial) Gaussian Splatting (GS) SLAM framework
designed for large scenes. The framework comprises four main components: VIO Front
End, 2D Gaussian Map, NVS Loop Closure, and Dynamic Eraser. In the VIO Front End,
RGB frames are processed through dense bundle adjustment and uncertainty estimation
to extract scene geometry and poses. Based on this output, the mapping module
incrementally constructs and maintains a 2D Gaussian map. Key components of the
2D Gaussian Map include a Sample-based Rasterizer, Score Manager, and Pose Refinement,
which collectively improve mapping speed and localization accuracy. This enables
the SLAM system to handle large-scale urban environments with up to 50 million
Gaussian ellipsoids. To ensure global consistency in large-scale scenes, we design
a Loop Closure module, which innovatively leverages the Novel View Synthesis (NVS)
capabilities of Gaussian Splatting for loop closure detection and correction of
the Gaussian map. Additionally, we propose a Dynamic Eraser to address the inevitable
presence of dynamic objects in real-world outdoor scenes. Extensive evaluations
in indoor and outdoor environments demonstrate that our approach achieves localization
performance on par with Visual-Inertial Odometry while surpassing recent GS/NeRF
SLAM methods. It also significantly outperforms all existing methods in terms
of mapping and rendering quality. Furthermore, we developed a mobile app and verified
that our framework can generate high-quality Gaussian maps in real time using
only a smartphone camera and a low-frequency IMU sensor. To the best of our knowledge,
VINGS-Mono is the first monocular Gaussian SLAM method capable of operating in
outdoor environments and supporting kilometer-scale large scenes.
'
project_page: null
paper: https://arxiv.org/pdf/2501.08286.pdf
code: null
video: null
tags:
- Large-Scale
- Meshing
- SLAM
thumbnail: assets/thumbnails/wu2025vingsmono.jpg
publication_date: '2025-01-14T18:01:15+00:00'
date_source: arxiv
- id: rogge2025objectcentric
title: 'Object-Centric 2D Gaussian Splatting: Background Removal and Occlusion-Aware
Pruning for Compact Object Models'
authors: Marcel Rogge, Didier Stricker
year: '2025'
abstract: 'Current Gaussian Splatting approaches are effective for reconstructing
entire scenes but lack the option to target specific objects, making them computationally
expensive and unsuitable for object-specific applications. We propose a novel
approach that leverages object masks to enable targeted reconstruction, resulting
in object-centric models. Additionally, we introduce an occlusion-aware pruning
strategy to minimize the number of Gaussians without compromising quality. Our
method reconstructs compact object models, yielding object-centric Gaussian and
mesh representations that are up to 96% smaller and up to 71% faster to train
compared to the baseline while retaining competitive quality. These representations
are immediately usable for downstream applications such as appearance editing
and physics simulation without additional processing.
'
project_page: null
paper: https://arxiv.org/pdf/2501.08174.pdf
code: null
video: null
tags:
- Compression
- Densification
- Editing
thumbnail: assets/thumbnails/rogge2025objectcentric.jpg
publication_date: '2025-01-14T14:56:31+00:00'
date_source: arxiv
- id: liu2025uncommon
title: UnCommon Objects in 3D
authors: Xingchen Liu, Piyush Tayal, Jianyuan Wang, Jesus Zarzar, Tom Monnier, Konstantinos
Tertikas, Jiali Duan, Antoine Toisoul, Jason Y. Zhang, Natalia Neverova, Andrea
Vedaldi, Roman Shapovalov, David Novotny
year: '2025'
abstract: 'We introduce Uncommon Objects in 3D (uCO3D), a new object-centric dataset
for 3D deep learning and 3D generative AI. uCO3D is the largest publicly-available
collection of high-resolution videos of objects with 3D annotations that ensures
full-360$^{\circ}$ coverage. uCO3D is significantly more diverse than MVImgNet
and CO3Dv2, covering more than 1,000 object categories. It is also of higher quality,
due to extensive quality checks of both the collected videos and the 3D annotations.
Similar to analogous datasets, uCO3D contains annotations for 3D camera poses,
depth maps and sparse point clouds. In addition, each object is equipped with
a caption and a 3D Gaussian Splat reconstruction. We train several large 3D models
on MVImgNet, CO3Dv2, and uCO3D and obtain superior results using the latter, showing
that uCO3D is better for learning applications.
'
project_page: https://uco3d.github.io/
paper: https://arxiv.org/pdf/2501.07574.pdf
code: https://github.com/facebookresearch/uco3d
video: null
tags:
- Code
- Project
thumbnail: assets/thumbnails/liu2025uncommon.jpg
publication_date: '2025-01-13T18:59:20+00:00'
date_source: arxiv
- id: stuart20253dgstopc
title: '3DGS-to-PC: Convert a 3D Gaussian Splatting Scene into a Dense Point Cloud
or Mesh'
authors: Lewis A G Stuart, Michael P Pound
year: '2025'
abstract: '3D Gaussian Splatting (3DGS) excels at producing highly detailed 3D reconstructions,
but these scenes often require specialised renderers for effective visualisation.
In contrast, point clouds are a widely used 3D representation and are compatible
with most popular 3D processing software, yet converting 3DGS scenes into point
clouds is a complex challenge. In this work we introduce 3DGS-to-PC, a flexible
and highly customisable framework that is capable of transforming 3DGS scenes
into dense, high-accuracy point clouds. We sample points probabilistically from
each Gaussian as a 3D density function. We additionally threshold new points using
the Mahalanobis distance to the Gaussian centre, preventing extreme outliers.
The result is a point cloud that closely represents the shape encoded into the
3D Gaussian scene. Individual Gaussians use spherical harmonics to adapt colours
depending on view, and each point may contribute only subtle colour hints to the
resulting rendered scene. To avoid spurious or incorrect colours that do not fit
with the final point cloud, we recalculate Gaussian colours via a customised image
rendering approach, assigning each Gaussian the colour of the pixel to which it
contributes most across all views. 3DGS-to-PC also supports mesh generation through
Poisson Surface Reconstruction, applied to points sampled from predicted surface
Gaussians. This allows coloured meshes to be generated from 3DGS scenes without
the need for re-training. This package is highly customisable and capable of
simple integration into existing 3DGS pipelines. 3DGS-to-PC provides a powerful
tool for converting 3DGS data into point cloud and surface-based formats.
'
project_page: null
paper: https://arxiv.org/pdf/2501.07478.pdf
code: https://github.com/Lewis-Stuart-11/3DGS-to-PC
video: null
tags:
- Code
- Point Cloud
thumbnail: assets/thumbnails/stuart20253dgstopc.jpg
publication_date: '2025-01-13T16:52:28+00:00'
date_source: arxiv
- id: zhang2025evaluating
title: 'Evaluating Human Perception of Novel View Synthesis: Subjective Quality
Assessment of Gaussian Splatting and NeRF in Dynamic Scenes'
authors: Yuhang Zhang, Joshua Maraval, Zhengyu Zhang, Nicolas Ramin, Shishun Tian,
Lu Zhang
year: '2025'
abstract: 'Gaussian Splatting (GS) and Neural Radiance Fields (NeRF) are two groundbreaking
technologies that have revolutionized the field of Novel View Synthesis (NVS),
enabling immersive photorealistic rendering and user experiences by synthesizing
multiple viewpoints from a set of images of sparse views. The potential applications
of NVS, such as high-quality virtual and augmented reality, detailed 3D modeling,
and realistic medical organ imaging, underscore the importance of quality assessment
of NVS methods from the perspective of human perception. Although some previous
studies have explored subjective quality assessments for NVS technology, they
still face several challenges, especially in NVS methods selection, scenario coverage,
and evaluation methodology. To address these challenges, we conducted two subjective
experiments for the quality assessment of NVS technologies containing both GS-based
and NeRF-based methods, focusing on dynamic and real-world scenes. This study
covers 360$^{\circ}$, front-facing, and single-viewpoint videos while providing a
richer and greater number of real scenes. Meanwhile, this is the first study to explore
the impact of NVS methods in dynamic scenes with moving objects. The two types
of subjective experiments help to fully comprehend the influences of different
viewing paths from a human perception perspective and pave the way for future
development of full-reference and no-reference quality metrics. In addition, we
established a comprehensive benchmark of various state-of-the-art objective metrics
on the proposed database, highlighting that existing methods still struggle to
accurately capture subjective quality. The results give us some insights into
the limitations of existing NVS methods and may promote the development of new
NVS methods.
'
project_page: null
paper: https://arxiv.org/pdf/2501.08072.pdf
code: null
video: null
tags:
- Dynamic
thumbnail: assets/thumbnails/zhang2025evaluating.jpg
publication_date: '2025-01-13T10:01:27+00:00'
date_source: arxiv
- id: peng2025rmavatar
title: 'RMAvatar: Photorealistic Human Avatar Reconstruction from Monocular Video
Based on Rectified Mesh-embedded Gaussians'
authors: Sen Peng, Weixing Xie, Zilong Wang, Xiaohu Guo, Zhonggui Chen, Baorong
Yang, Xiao Dong
year: '2025'
abstract: 'We introduce RMAvatar, a novel human avatar representation with Gaussian
splatting embedded on a mesh to learn a clothed avatar from a monocular video. We
utilize the explicit mesh geometry to represent motion and shape of a virtual
human and implicit appearance rendering with Gaussian Splatting. Our method consists
of two main modules: Gaussian initialization module and Gaussian rectification
module. We embed Gaussians into triangular faces and control their motion through
the mesh, which ensures low-frequency motion and surface deformation of the avatar.
Due to the limitations of the LBS formulation, it is hard for the human skeleton to control
complex non-rigid transformations. We therefore design a pose-related Gaussian rectification
module to learn fine-detailed non-rigid deformations, further improving the realism
and expressiveness of the avatar. We conduct extensive experiments on public datasets,
where RMAvatar shows state-of-the-art performance on both rendering quality and quantitative
evaluations. Please see our project page at https://rm-avatar.github.io.
'
project_page: https://rm-avatar.github.io/
paper: https://arxiv.org/pdf/2501.07104.pdf
code: https://github.com/RMAvatar/RMAvatar
video: null
tags:
- Avatar
- Code
- Dynamic
- Meshing
- Monocular
- Project
thumbnail: assets/thumbnails/peng2025rmavatar.jpg
publication_date: '2025-01-13T07:32:44+00:00'
date_source: arxiv
- id: zielonka2025synthetic
title: Synthetic Prior for Few-Shot Drivable Head Avatar Inversion
authors: Wojciech Zielonka, Stephan J. Garbin, Alexandros Lattas, George Kopanas,
Paulo Gotardo, Thabo Beeler, Justus Thies, Timo Bolkart
year: '2025'
abstract: 'We present SynShot, a novel method for the few-shot inversion of a drivable
head avatar based on a synthetic prior. We tackle two major challenges. First,
training a controllable 3D generative network requires a large number of diverse
sequences, for which pairs of images and high-quality tracked meshes are not always
available. Second, state-of-the-art monocular avatar models struggle to generalize
to new views and expressions, lacking a strong prior and often overfitting to
a specific viewpoint distribution. Inspired by machine learning models trained
solely on synthetic data, we propose a method that learns a prior model from a
large dataset of synthetic heads with diverse identities, expressions, and viewpoints.
With few input images, SynShot fine-tunes the pretrained synthetic prior to bridge
the domain gap, modeling a photorealistic head avatar that generalizes to novel
expressions and viewpoints. We model the head avatar using 3D Gaussian splatting
and a convolutional encoder-decoder that outputs Gaussian parameters in UV texture
space. To account for the different modeling complexities over parts of the head
(e.g., skin vs hair), we embed the prior with explicit control for upsampling
the number of per-part primitives. Compared to state-of-the-art monocular methods
that require thousands of real training images, SynShot significantly improves
novel view and expression synthesis.
'
project_page: https://zielon.github.io/synshot/
paper: https://arxiv.org/pdf/2501.06903.pdf
code: null
video: https://www.youtube.com/watch?v=4KQQatkaSgc
tags:
- Avatar
- Dynamic
- Project
- Sparse
- Video
thumbnail: assets/thumbnails/zielonka2025synthetic.jpg
publication_date: '2025-01-12T19:01:05+00:00'
date_source: arxiv
- id: chen2025generalized
title: Generalized and Efficient 2D Gaussian Splatting for Arbitrary-scale Super-Resolution
authors: Du Chen, Liyi Chen, Zhengqiang Zhang, Lei Zhang
year: '2025'
abstract: 'Equipped with the continuous representation capability of Multi-Layer
Perceptron (MLP), Implicit Neural Representation (INR) has been successfully employed
for Arbitrary-scale Super-Resolution (ASR). However, the limited receptive field
of the linear layers in MLP restricts the representation capability of INR, while
it is computationally expensive to query the MLP numerous times to render each
pixel. Recently, Gaussian Splatting (GS) has shown its advantages over INR in
both visual quality and rendering speed in 3D tasks, which motivates us to explore
whether GS can be employed for the ASR task. However, directly applying GS to
ASR is exceptionally challenging because the original GS is an optimization-based
method through overfitting each single scene, while in ASR we aim to learn a single
model that can generalize to different images and scaling factors. We overcome
these challenges by developing two novel techniques. Firstly, to generalize GS
for ASR, we elaborately design an architecture to predict the corresponding image-conditioned
Gaussians of the input low-resolution image in a feed-forward manner. Secondly,
we implement an efficient differentiable 2D GPU/CUDA-based scale-aware rasterization
to render super-resolved images by sampling discrete RGB values from the predicted
contiguous Gaussians. Via end-to-end training, our optimized network, namely GSASR,
can perform ASR for any image and unseen scaling factors. Extensive experiments
validate the effectiveness of our proposed method. The project page can be found
at https://mt-cly.github.io/GSASR.github.io/.
'
project_page: https://mt-cly.github.io/GSASR.github.io/
paper: https://arxiv.org/pdf/2501.06838.pdf
code: null
video: null
tags:
- Project
- Super Resolution
thumbnail: assets/thumbnails/chen2025generalized.jpg
publication_date: '2025-01-12T15:14:58+00:00'
date_source: arxiv
- id: wang2025f3dgaus
title: 'F3D-Gaus: Feed-forward 3D-aware Generation on ImageNet with Cycle-Consistent
Gaussian Splatting'
authors: Yuxin Wang, Qianyi Wu, Dan Xu
year: '2025'
abstract: 'This paper tackles the problem of generalizable 3D-aware generation from
monocular datasets, e.g., ImageNet. The key challenge of this task is learning
a robust 3D-aware representation without multi-view or dynamic data, while ensuring
consistent texture and geometry across different viewpoints. Although some baseline
methods are capable of 3D-aware generation, the quality of the generated images
still lags behind state-of-the-art 2D generation approaches, which excel in producing
high-quality, detailed images. To address this severe limitation, we propose a
novel feed-forward pipeline based on pixel-aligned Gaussian Splatting, coined
as F3D-Gaus, which can produce more realistic and reliable 3D renderings from
monocular inputs. In addition, we introduce a self-supervised cycle-consistent
constraint to enforce cross-view consistency in the learned 3D representation.
This training strategy naturally allows aggregation of multiple aligned Gaussian
primitives and significantly alleviates the interpolation limitations inherent
in single-view pixel-aligned Gaussian Splatting. Furthermore, we incorporate video
model priors to perform geometry-aware refinement, enhancing the generation of
fine details in wide-viewpoint scenarios and improving the model''s capability
to capture intricate 3D textures. Extensive experiments demonstrate that our approach
not only achieves high-quality, multi-view consistent 3D-aware generation from
monocular datasets, but also significantly improves training and inference efficiency.
'
project_page: https://arxiv.org/abs/2501.06714
paper: https://arxiv.org/pdf/2501.06714.pdf
code: https://github.com/W-Ted/F3D-Gaus
video: null
tags:
- Code
- Feed-Forward
- Monocular
- Project
thumbnail: assets/thumbnails/wang2025f3dgaus.jpg
publication_date: '2025-01-12T04:44:44+00:00'
date_source: arxiv
- id: asim2025met3r
title: 'MEt3R: Measuring Multi-View Consistency in Generated Images'
authors: Mohammad Asim, Christopher Wewer, Thomas Wimmer, Bernt Schiele, Jan Eric
Lenssen
year: '2025'
abstract: 'We introduce MEt3R, a metric for multi-view consistency in generated
images. Large-scale generative models for multi-view image generation are rapidly
advancing the field of 3D inference from sparse observations. However, due to
the nature of generative modeling, traditional reconstruction metrics are not
suitable to measure the quality of generated outputs and metrics that are independent
of the sampling procedure are desperately needed. In this work, we specifically
address the aspect of consistency between generated multi-view images, which can
be evaluated independently of the specific scene. Our approach uses DUSt3R to
obtain dense 3D reconstructions from image pairs in a feed-forward manner, which
are used to warp image contents from one view into the other. Then, feature maps
of these images are compared to obtain a similarity score that is invariant to
view-dependent effects. Using MEt3R, we evaluate the consistency of a large set
of previous methods for novel view and video generation, including our open, multi-view
latent diffusion model.
'
project_page: https://geometric-rl.mpi-inf.mpg.de/met3r/
paper: https://arxiv.org/pdf/2501.06336.pdf
code: https://github.com/mohammadasim98/MEt3R
video: https://geometric-rl.mpi-inf.mpg.de/met3r/static/videos/teaser.mp4
tags:
- 3ster-based
- Code
- Diffusion
- Project
- Video
thumbnail: assets/thumbnails/asim2025met3r.jpg
publication_date: '2025-01-10T20:43:33+00:00'
date_source: arxiv
- id: shin2025localityaware
title: Locality-aware Gaussian Compression for Fast and High-quality Rendering
authors: Seungjoo Shin, Jaesik Park, Sunghyun Cho
year: '2025'
abstract: 'We present LocoGS, a locality-aware 3D Gaussian Splatting (3DGS) framework
that exploits the spatial coherence of 3D Gaussians for compact modeling of volumetric
scenes. To this end, we first analyze the local coherence of 3D Gaussian attributes,
and propose a novel locality-aware 3D Gaussian representation that effectively
encodes locally-coherent Gaussian attributes using a neural field representation
with a minimal storage requirement. On top of the novel representation, LocoGS
is carefully designed with additional components such as dense initialization,
an adaptive spherical harmonics bandwidth scheme and different encoding schemes
for different Gaussian attributes to maximize compression performance. Experimental
results demonstrate that our approach outperforms the rendering quality of existing
compact Gaussian representations for representative real-world 3D datasets while
achieving 54.6$\times$ to 96.6$\times$ smaller storage size and 2.1$\times$
to 2.4$\times$ higher rendering speed than 3DGS. Our approach also demonstrates
an average of 2.4$\times$ higher rendering speed than the state-of-the-art compression
method with comparable compression performance.
'
project_page: null
paper: https://arxiv.org/pdf/2501.05757.pdf
code: null
video: null
tags:
- Compression
thumbnail: assets/thumbnails/shin2025localityaware.jpg
publication_date: '2025-01-10T07:19:41+00:00'
date_source: arxiv
- id: yan2025consistent
title: Consistent Flow Distillation for Text-to-3D Generation
authors: Runjie Yan, Yinbo Chen, Xiaolong Wang
year: '2025'
abstract: 'Score Distillation Sampling (SDS) has made significant strides in distilling
image-generative models for 3D generation. However, its maximum-likelihood-seeking
behavior often leads to degraded visual quality and diversity, limiting its effectiveness
in 3D applications. In this work, we propose Consistent Flow Distillation (CFD),
which addresses these limitations. We begin by leveraging the gradient of the
diffusion ODE or SDE sampling process to guide the 3D generation. From the gradient-based
sampling perspective, we find that the consistency of 2D image flows across different
viewpoints is important for high-quality 3D generation. To achieve this, we introduce
multi-view consistent Gaussian noise on the 3D object, which can be rendered from
various viewpoints to compute the flow gradient. Our experiments demonstrate that
CFD, through consistent flows, significantly outperforms previous methods in text-to-3D
generation.
'
project_page: https://runjie-yan.github.io/cfd/
paper: https://arxiv.org/pdf/2501.05445.pdf
code: https://github.com/runjie-yan/ConsistentFlowDistillation
video: null
tags:
- Code
- Diffusion
- Project
thumbnail: assets/thumbnails/yan2025consistent.jpg
publication_date: '2025-01-09T18:56:05+00:00'
date_source: arxiv
- id: meng2025zero1tog
title: 'Zero-1-to-G: Taming Pretrained 2D Diffusion Model for Direct 3D Generation'
authors: Xuyi Meng, Chen Wang, Jiahui Lei, Kostas Daniilidis, Jiatao Gu, Lingjie
Liu
year: '2025'
abstract: 'Recent advances in 2D image generation have achieved remarkable quality, largely
driven by the capacity of diffusion models and the availability of large-scale
datasets. However, direct 3D generation is still constrained by the scarcity and
lower fidelity of 3D datasets. In this paper, we introduce Zero-1-to-G, a novel
approach that addresses this problem by enabling direct single-view generation
on Gaussian splats using pretrained 2D diffusion models. Our key insight is that
Gaussian splats, a 3D representation, can be decomposed into multi-view images
encoding different attributes. This reframes the challenging task of direct 3D
generation within a 2D diffusion framework, allowing us to leverage the rich priors
of pretrained 2D diffusion models. To incorporate 3D awareness, we introduce cross-view
and cross-attribute attention layers, which capture complex correlations and enforce
3D consistency across generated splats. This makes Zero-1-to-G the first direct
image-to-3D generative model to effectively utilize pretrained 2D diffusion priors,
enabling efficient training and improved generalization to unseen objects. Extensive
experiments on both synthetic and in-the-wild datasets demonstrate superior performance
in 3D object generation, offering a new approach to high-quality 3D generation.
'
project_page: https://mengxuyigit.github.io/projects/zero-1-to-G/
paper: https://arxiv.org/pdf/2501.05427.pdf
code: null
video: null
tags:
- Diffusion
- Project
thumbnail: assets/thumbnails/meng2025zero1tog.jpg
publication_date: '2025-01-09T18:37:35+00:00'
date_source: arxiv
- id: gerogiannis2025arc2avatar
title: 'Arc2Avatar: Generating Expressive 3D Avatars from a Single Image via ID
Guidance'
authors: Dimitrios Gerogiannis, Foivos Paraperas Papantoniou, Rolandos Alexandros
Potamias, Alexandros Lattas, Stefanos Zafeiriou
year: '2025'
abstract: 'Inspired by the effectiveness of 3D Gaussian Splatting (3DGS) in reconstructing
detailed 3D scenes within multi-view setups and the emergence of large 2D human
foundation models, we introduce Arc2Avatar, the first SDS-based method utilizing
a human face foundation model as guidance with just a single image as input. To
achieve that, we extend such a model for diverse-view human head generation by
fine-tuning on synthetic data and modifying its conditioning. Our avatars maintain
a dense correspondence with a human face mesh template, allowing blendshape-based
expression generation. This is achieved through a modified 3DGS approach, connectivity
regularizers, and a strategic initialization tailored for our task. Additionally,
we propose an optional efficient SDS-based correction step to refine the blendshape
expressions, enhancing realism and diversity. Experiments demonstrate that Arc2Avatar
achieves state-of-the-art realism and identity preservation, effectively addressing
color issues by allowing the use of very low guidance, enabled by our strong identity
prior and initialization strategy, without compromising detail.
'
project_page: null
paper: https://arxiv.org/pdf/2501.05379.pdf
code: null
video: null
tags:
- Avatar
- Diffusion
thumbnail: assets/thumbnails/gerogiannis2025arc2avatar.jpg
publication_date: '2025-01-09T17:04:33+00:00'
date_source: arxiv
- id: tianci2025scaffoldslam
title: 'Scaffold-SLAM: Structured 3D Gaussians for Simultaneous Localization and
Photorealistic Mapping'
authors: Wen Tianci, Liu Zhiang, Lu Biao, Fang Yongchun
year: '2025'
abstract: '3D Gaussian Splatting (3DGS) has recently revolutionized novel view synthesis
in the Simultaneous Localization and Mapping (SLAM). However, existing SLAM methods
utilizing 3DGS have failed to provide high-quality novel view rendering for monocular,
stereo, and RGB-D cameras simultaneously. Notably, some methods perform well for
RGB-D cameras but suffer significant degradation in rendering quality for monocular
cameras. In this paper, we present Scaffold-SLAM, which delivers simultaneous
localization and high-quality photorealistic mapping across monocular, stereo,
and RGB-D cameras. We introduce two key innovations to achieve this state-of-the-art
visual quality. First, we propose Appearance-from-Motion embedding, enabling 3D
Gaussians to better model image appearance variations across different camera
poses. Second, we introduce a frequency regularization pyramid to guide the distribution
of Gaussians, allowing the model to effectively capture finer details in the scene.
Extensive experiments on monocular, stereo, and RGB-D datasets demonstrate that
Scaffold-SLAM significantly outperforms state-of-the-art methods in photorealistic
mapping quality, e.g., PSNR is 16.76% higher in the TUM RGB-D datasets for monocular
cameras.
'
project_page: null
paper: https://arxiv.org/pdf/2501.05242.pdf
code: null
video: null
tags:
- SLAM
thumbnail: assets/thumbnails/tianci2025scaffoldslam.jpg
publication_date: '2025-01-09T13:50:26+00:00'
date_source: arxiv
- id: bond2025gaussianvideo
title: 'GaussianVideo: Efficient Video Representation via Hierarchical Gaussian
Splatting'
authors: Andrew Bond, Jui-Hsien Wang, Long Mai, Erkut Erdem, Aykut Erdem
year: '2025'
abstract: 'Efficient neural representations for dynamic video scenes are critical
for applications ranging from video compression to interactive simulations. Yet,
existing methods often face challenges related to high memory usage, lengthy training
times, and temporal consistency. To address these issues, we introduce a novel
neural video representation that combines 3D Gaussian splatting with continuous
camera motion modeling. By leveraging Neural ODEs, our approach learns smooth
camera trajectories while maintaining an explicit 3D scene representation through
Gaussians. Additionally, we introduce a spatiotemporal hierarchical learning strategy,
progressively refining spatial and temporal features to enhance reconstruction
quality and accelerate convergence. This memory-efficient approach achieves high-quality
rendering at impressive speeds. Experimental results show that our hierarchical
learning, combined with robust camera motion modeling, captures complex dynamic
scenes with strong temporal consistency, achieving state-of-the-art performance
across diverse video datasets in both high- and low-motion scenarios.
'
project_page: https://cyberiada.github.io/GaussianVideo/
paper: https://arxiv.org/pdf/2501.04782.pdf
code: null
video: null
tags:
- Gaussian Video
- Project
- Video
thumbnail: assets/thumbnails/bond2025gaussianvideo.jpg
publication_date: '2025-01-08T19:01:12+00:00'
date_source: arxiv
- id: huang2025fatesgs
title: 'FatesGS: Fast and Accurate Sparse-View Surface Reconstruction using Gaussian
Splatting with Depth-Feature Consistency'
authors: Han Huang, Yulun Wu, Chao Deng, Ge Gao, Ming Gu, Yu-Shen Liu
year: '2025'
abstract: 'Recently, Gaussian Splatting has sparked a new trend in the field of
computer vision. Apart from novel view synthesis, it has also been extended to
the area of multi-view reconstruction. The latest methods facilitate complete,
detailed surface reconstruction while ensuring fast training speed. However, these
methods still require dense input views, and their output quality significantly
degrades with sparse views. We observed that the Gaussian primitives tend to overfit
the few training views, leading to noisy floaters and incomplete reconstruction
surfaces. In this paper, we present an innovative sparse-view reconstruction framework
that leverages intra-view depth and multi-view feature consistency to achieve
remarkably accurate surface reconstruction. Specifically, we utilize monocular
depth ranking information to supervise the consistency of depth distribution within
patches and employ a smoothness loss to enhance the continuity of the distribution.
To achieve finer surface reconstruction, we optimize the absolute position of
depth through multi-view projection features. Extensive experiments on DTU and
BlendedMVS demonstrate that our method outperforms state-of-the-art methods with
a speedup of 60x to 200x, achieving swift and fine-grained mesh reconstruction
without the need for costly pre-training.
'
project_page: https://alvin528.github.io/FatesGS/
paper: https://arxiv.org/pdf/2501.04628.pdf
code: null
video: null
tags:
- Meshing
- Project
- Sparse
thumbnail: assets/thumbnails/huang2025fatesgs.jpg
publication_date: '2025-01-08T17:19:35+00:00'
date_source: arxiv
- id: kwak2025modecgs
title: 'MoDec-GS: Global-to-Local Motion Decomposition and Temporal Interval Adjustment
for Compact Dynamic 3D Gaussian Splatting'
authors: Sangwoon Kwak, Joonsoo Kim, Jun Young Jeong, Won-Sik Cheong, Jihyong Oh,
Munchurl Kim
year: '2025'
abstract: '3D Gaussian Splatting (3DGS) has made significant strides in scene representation
and neural rendering, with intense efforts focused on adapting it for dynamic
scenes. Despite delivering remarkable rendering quality and speed, existing methods
struggle with storage demands and representing complex real-world motions. To
tackle these issues, we propose MoDec-GS, a memory-efficient Gaussian splatting
framework designed for reconstructing novel views in challenging scenarios with
complex motions. We introduce Global-to-Local Motion Decomposition (GLMD) to effectively
capture dynamic motions in a coarse-to-fine manner. This approach leverages Global
Canonical Scaffolds (Global CS) and Local Canonical Scaffolds (Local CS), extending
static Scaffold representation to dynamic video reconstruction. For Global CS,
we propose Global Anchor Deformation (GAD) to efficiently represent global dynamics
along complex motions, by directly deforming the implicit Scaffold attributes
which are anchor position, offset, and local context features. Next, we finely
adjust local motions via the Local Gaussian Deformation (LGD) of Local CS explicitly.
Additionally, we introduce Temporal Interval Adjustment (TIA) to automatically
control the temporal coverage of each Local CS during training, allowing MoDec-GS
to find optimal interval assignments based on the specified number of temporal
segments. Extensive evaluations demonstrate that MoDec-GS achieves an average 70%
reduction in model size over state-of-the-art methods for dynamic 3D Gaussians from
real-world dynamic videos while maintaining or even improving rendering quality.
'
project_page: 'MoDec-GS: Global-to-Local Motion Decomposition and Temporal Interval
Adjustment for Compact Dynamic 3D Gaussian Splatting'
paper: https://arxiv.org/pdf/2501.03714.pdf
code: null
video: https://youtu.be/5L6gzc5-cw8?si=L6v6XLZFQrYK50iV
tags:
- Compression
- Dynamic
- Project
- Video
thumbnail: assets/thumbnails/kwak2025modecgs.jpg
publication_date: '2025-01-07T11:43:13+00:00'
date_source: arxiv
- id: yu2025dehazegs
title: 'DehazeGS: Seeing Through Fog with 3D Gaussian Splatting'
authors: Jinze Yu, Yiqun Wang, Zhengda Lu, Jianwei Guo, Yong Li, Hongxing Qin, Xiaopeng
Zhang
year: '2025'
abstract: 'Current novel view synthesis tasks primarily rely on high-quality and
clear images. However, in foggy scenes, scattering and attenuation can significantly
degrade the reconstruction and rendering quality. Although NeRF-based dehazing
reconstruction algorithms have been developed, their use of deep fully connected
neural networks and per-ray sampling strategies leads to high computational costs.
Moreover, NeRF''s implicit representation struggles to recover fine details from
hazy scenes. In contrast, recent advancements in 3D Gaussian Splatting achieve
high-quality 3D scene reconstruction by explicitly modeling point clouds into
3D Gaussians. In this paper, we propose leveraging the explicit Gaussian representation
to explain the foggy image formation process through a physically accurate forward
rendering process. We introduce DehazeGS, a method capable of decomposing and
rendering a fog-free background from participating media using only multi-view
foggy images as input. We model the transmission within each Gaussian distribution
to simulate the formation of fog. During this process, we jointly learn the atmospheric
light and scattering coefficient while optimizing the Gaussian representation
of the hazy scene. In the inference stage, we eliminate the effects of scattering
and attenuation on the Gaussians and directly project them onto a 2D plane to
obtain a clear view. Experiments on both synthetic and real-world foggy datasets
demonstrate that DehazeGS achieves state-of-the-art performance in terms of both
rendering quality and computational efficiency.
'
project_page: null
paper: https://arxiv.org/pdf/2501.03659.pdf
code: null
video: null
tags:
- In the Wild
- Rendering
thumbnail: assets/thumbnails/yu2025dehazegs.jpg
publication_date: '2025-01-07T09:47:46+00:00'
date_source: arxiv
- id: lee2025compression
title: Compression of 3D Gaussian Splatting with Optimized Feature Planes and Standard
Video Codecs
authors: Soonbin Lee, Fangwen Shu, Yago Sanchez, Thomas Schierl, Cornelius Hellge
year: '2025'
abstract: '3D Gaussian Splatting is a recognized method for 3D scene representation,
known for its high rendering quality and speed. However, its substantial data
requirements present challenges for practical applications. In this paper, we
introduce an efficient compression technique that significantly reduces storage
overhead by using compact representation. We propose a unified architecture that
combines point cloud data and feature planes through a progressive tri-plane structure.
Our method utilizes 2D feature planes, enabling continuous spatial representation.
To further optimize these representations, we incorporate entropy modeling in
the frequency domain, specifically designed for standard video codecs. We also
propose channel-wise bit allocation to achieve a better trade-off between bitrate
consumption and feature plane representation. Consequently, our model effectively
leverages spatial correlations within the feature planes to enhance rate-distortion
performance using standard, non-differentiable video codecs. Experimental results
demonstrate that our method outperforms existing methods in data compactness while
maintaining high rendering quality. Our project page is available at https://fraunhoferhhi.github.io/CodecGS
'
project_page: null
paper: https://arxiv.org/pdf/2501.03399.pdf
code: null
video: null
tags:
- Compression
thumbnail: assets/thumbnails/lee2025compression.jpg
publication_date: '2025-01-06T21:37:30+00:00'
date_source: arxiv
- id: rajasegaran2025gaussian
title: Gaussian Masked Autoencoders
authors: Jathushan Rajasegaran, Xinlei Chen, Ruilong Li, Christoph Feichtenhofer,
Jitendra Malik, Shiry Ginosar
year: '2025'
abstract: 'This paper explores Masked Autoencoders (MAE) with Gaussian Splatting.
While reconstructive self-supervised learning frameworks such as MAE learn good
semantic abstractions, they are not trained for explicit spatial awareness. Our approach,
named Gaussian Masked Autoencoder, or GMAE, aims to learn semantic abstractions
and spatial understanding jointly. Like MAE, it reconstructs the image end-to-end
in the pixel space, but beyond MAE, it also introduces an intermediate, 3D Gaussian-based
representation and renders images via splatting. We show that GMAE can enable
various zero-shot learning capabilities of spatial understanding (e.g., figure-ground
segmentation, image layering, edge detection, etc.) while preserving the high-level
semantics of self-supervised representation quality from MAE. To our knowledge,
we are the first to employ Gaussian primitives in an image representation learning
framework beyond optimization-based single-scene reconstructions. We believe GMAE
will inspire further research in this direction and contribute to developing next-generation
techniques for modeling high-fidelity visual data. More details at https://brjathu.github.io/gmae
'
project_page: null
paper: https://arxiv.org/pdf/2501.03229.pdf
code: https://github.com/darshanmakwana412/gaussian-mae
video: null
tags:
- Code
- Transformer
thumbnail: assets/thumbnails/rajasegaran2025gaussian.jpg
publication_date: '2025-01-06T18:59:57+00:00'
date_source: arxiv
- id: nguyen2025pointmapconditioned
title: Pointmap-Conditioned Diffusion for Consistent Novel View Synthesis
authors: Thang-Anh-Quan Nguyen, Nathan Piasco, Luis Roldão, Moussab Bennehar, Dzmitry
Tsishkou, Laurent Caraffa, Jean-Philippe Tarel, Roland Brémond
year: '2025'
abstract: 'In this paper, we present PointmapDiffusion, a novel framework for single-image
novel view synthesis (NVS) that utilizes pre-trained 2D diffusion models. Our
method is the first to leverage pointmaps (i.e. rasterized 3D scene coordinates)
as a conditioning signal, capturing geometric prior from the reference images
to guide the diffusion process. By embedding reference attention blocks and a
ControlNet for pointmap features, our model balances between generative capability
and geometric consistency, enabling accurate view synthesis across varying viewpoints.
Extensive experiments on diverse real-world datasets demonstrate that PointmapDiffusion
achieves high-quality, multi-view consistent results with significantly fewer
trainable parameters compared to other baselines for single-image NVS tasks.
'
project_page: null
paper: https://arxiv.org/pdf/2501.02913.pdf
code: null
video: null
tags:
- Diffusion
thumbnail: assets/thumbnails/nguyen2025pointmapconditioned.jpg
publication_date: '2025-01-06T10:48:31+00:00'
date_source: arxiv
- id: bian2025gsdit
title: 'GS-DiT: Advancing Video Generation with Pseudo 4D Gaussian Fields through
Efficient Dense 3D Point Tracking'
authors: Weikang Bian, Zhaoyang Huang, Xiaoyu Shi, Yijin Li, Fu-Yun Wang, Hongsheng
Li
year: '2025'
abstract: '4D video control is essential in video generation as it enables the use
of sophisticated lens techniques, such as multi-camera shooting and dolly zoom,
which are currently unsupported by existing methods. Training a video Diffusion
Transformer (DiT) directly to control 4D content requires expensive multi-view
videos. Inspired by Monocular Dynamic novel View Synthesis (MDVS) that optimizes
a 4D representation and renders videos according to different 4D elements, such
as camera pose and object motion editing, we bring pseudo 4D Gaussian fields to
video generation. Specifically, we propose a novel framework that constructs a
pseudo 4D Gaussian field with dense 3D point tracking and renders the Gaussian
field for all video frames. Then we finetune a pretrained DiT to generate videos
following the guidance of the rendered video, dubbed as GS-DiT. To boost the training
of the GS-DiT, we also propose an efficient Dense 3D Point Tracking (D3D-PT) method
for the pseudo 4D Gaussian field construction. Our D3D-PT outperforms SpatialTracker,
the state-of-the-art sparse 3D point tracking method, in accuracy and accelerates
the inference speed by two orders of magnitude. During the inference stage, GS-DiT
can generate videos with the same dynamic content while adhering to different
camera parameters, addressing a significant limitation of current video generation
models. GS-DiT demonstrates strong generalization capabilities and extends the
4D controllability of Gaussian splatting to video generation beyond just camera
poses. It supports advanced cinematic effects through the manipulation of the
Gaussian field and camera intrinsics, making it a powerful tool for creative video
production. Demos are available at https://wkbian.github.io/Projects/GS-DiT/.
'
project_page: null
paper: https://arxiv.org/pdf/2501.02690.pdf
code: null
video: null
tags:
- Year 2025
thumbnail: assets/thumbnails/bian2025gsdit.jpg
publication_date: '2025-01-05T23:55:33+00:00'
date_source: arxiv
- id: cong2025videolifter
title: 'VideoLifter: Lifting Videos to 3D with Fast Hierarchical Stereo Alignment'
authors: Wenyan Cong, Kevin Wang, Jiahui Lei, Colton Stearns, Yuanhao Cai, Dilin
Wang, Rakesh Ranjan, Matt Feiszli, Leonidas Guibas, Zhangyang Wang, Weiyao Wang,
Zhiwen Fan
year: '2025'
abstract: 'Efficiently reconstructing accurate 3D models from monocular video is
a key challenge in computer vision, critical for advancing applications in virtual
reality, robotics, and scene understanding. Existing approaches typically require
pre-computed camera parameters and frame-by-frame reconstruction pipelines, which
are prone to error accumulation and entail significant computational overhead.
To address these limitations, we introduce VideoLifter, a novel framework that
leverages geometric priors from a learnable model to incrementally optimize a
globally sparse to dense 3D representation directly from video sequences. VideoLifter
segments the video sequence into local windows, where it matches and registers
frames, constructs consistent fragments, and aligns them hierarchically to produce
a unified 3D model. By tracking and propagating sparse point correspondences across
frames and fragments, VideoLifter incrementally refines camera poses and 3D structure,
minimizing reprojection error for improved accuracy and robustness. This approach
significantly accelerates the reconstruction process, reducing training time by
over 82% while surpassing current state-of-the-art methods in visual fidelity
and computational efficiency.
'
project_page: https://videolifter.github.io/
paper: https://arxiv.org/pdf/2501.01949.pdf
code: https://github.com/VITA-Group/VideoLifter
video: null
tags:
- Acceleration
- Code
- Diffusion
- Project
thumbnail: assets/thumbnails/cong2025videolifter.jpg
publication_date: '2025-01-03T18:52:36+00:00'
date_source: arxiv
- id: huang2025enerverse
title: 'EnerVerse: Envisioning Embodied Future Space for Robotics Manipulation'
authors: Siyuan Huang, Liliang Chen, Pengfei Zhou, Shengcong Chen, Zhengkai Jiang,
Yue Hu, Peng Gao, Hongsheng Li, Maoqing Yao, Guanghui Ren
year: '2025'
abstract: 'We introduce EnerVerse, a comprehensive framework for embodied future
space generation specifically designed for robotic manipulation tasks. EnerVerse
seamlessly integrates convolutional and bidirectional attention mechanisms for
inner-chunk space modeling, ensuring low-level consistency and continuity. Recognizing
the inherent redundancy in video data, we propose a sparse memory context combined
with a chunkwise unidirectional generative paradigm to enable the generation of
infinitely long sequences. To further augment robotic capabilities, we introduce
the Free Anchor View (FAV) space, which provides flexible perspectives to enhance
observation and analysis. The FAV space mitigates motion modeling ambiguity, removes
physical constraints in confined environments, and significantly improves the
robot''s generalization and adaptability across various tasks and settings. To
address the prohibitive costs and labor intensity of acquiring multi-camera observations,
we present a data engine pipeline that integrates a generative model with 4D Gaussian
Splatting (4DGS). This pipeline leverages the generative model''s robust generalization
capabilities and the spatial constraints provided by 4DGS, enabling an iterative
enhancement of data quality and diversity, thus creating a data flywheel effect
that effectively narrows the sim-to-real gap. Finally, our experiments demonstrate
that the embodied future space generation prior substantially enhances policy
predictive capabilities, resulting in improved overall performance, particularly
in long-range robotic manipulation tasks.
'
project_page: https://sites.google.com/view/enerverse
paper: https://arxiv.org/pdf/2501.01895.pdf
code: null
video: null
tags:
- Dynamic
- Project
- Robotics
thumbnail: assets/thumbnails/huang2025enerverse.jpg
publication_date: '2025-01-03T17:00:33+00:00'
- id: longhini2024clothsplatting
title: 'Cloth-Splatting: 3D Cloth State Estimation from RGB Supervision'
authors: Alberta Longhini, Marcel Büsching, Bardienus Pieter Duisterhof, Jens Lundell,
Jeffrey Ichnowski, Mårten Björkman, Danica Kragic
year: '2024'
abstract: We introduce Cloth-Splatting, a method for estimating 3D states
of cloth from RGB images through a prediction-update framework. Cloth-Splatting
leverages an action-conditioned dynamics model for predicting future states and
uses 3D Gaussian Splatting to update the predicted states. Our key insight is
that coupling a 3D mesh-based representation with Gaussian Splatting allows us
to define a differentiable map between the cloth's state space and the image space.
This enables the use of gradient-based optimization techniques to refine inaccurate
state estimates using only RGB supervision. Our experiments demonstrate that Cloth-Splatting
not only improves state estimation accuracy over current baselines but also reduces
convergence time by ~85%.
project_page: https://kth-rpl.github.io/cloth-splatting/
paper: https://arxiv.org/pdf/2501.01715.pdf
code: https://github.com/KTH-RPL/cloth-splatting
video: null
tags:
- Code
- Meshing
- Project
- Rendering
thumbnail: assets/thumbnails/longhini2024clothsplatting.jpg
publication_date: '2025-01-03T09:17:30+00:00'
date_source: arxiv
- id: zhang2025crossviewgs
title: 'CrossView-GS: Cross-view Gaussian Splatting For Large-scale Scene Reconstruction'
authors: Chenhao Zhang, Yuanping Cao, Lei Zhang
year: '2025'
abstract: 3D Gaussian Splatting (3DGS) has emerged as a prominent method for scene
representation and reconstruction, leveraging densely distributed Gaussian primitives
to enable real-time rendering of high-resolution images. While existing 3DGS methods
perform well in scenes with minor view variation, large view changes in cross-view
scenes pose optimization challenges for these methods. To address these issues,
we propose a novel cross-view Gaussian Splatting method for large-scale scene
reconstruction, based on dual-branch fusion. Our method independently reconstructs
models from aerial and ground views as two independent branches to establish the
baselines of Gaussian distribution, providing reliable priors for cross-view reconstruction
during both initialization and densification. Specifically, a gradient-aware regularization
strategy is introduced to mitigate smoothing issues caused by significant view
disparities. Additionally, a unique Gaussian supplementation strategy is utilized
to incorporate complementary information of dual-branch into the cross-view model.
Extensive experiments on benchmark datasets demonstrate that our method achieves
superior performance in novel view synthesis compared to state-of-the-art methods.
project_page: null
paper: https://arxiv.org/pdf/2501.01695.pdf
code: null
video: null
tags:
- Large-Scale
- Optimization
thumbnail: assets/thumbnails/zhang2025crossviewgs.jpg
publication_date: '2025-01-03T08:24:59+00:00'
- id: wang2025pgsag
title: 'PG-SAG: Parallel Gaussian Splatting for Fine-Grained Large-Scale Urban Buildings
Reconstruction via Semantic-Aware Grouping'
authors: Tengfei Wang, Xin Wang, Yongmao Hou, Yiwei Xu, Wendi Zhang, Zongqian Zhan
year: '2025'
abstract: 3D Gaussian Splatting (3DGS) has emerged as a transformative method in
the field of real-time novel view synthesis. Based on 3DGS, recent advancements cope
with large-scale scenes via spatial-based partition strategy to reduce video memory
and optimization time costs. In this work, we introduce a parallel Gaussian splatting
method, termed PG-SAG, which fully exploits semantic cues for both partitioning
and Gaussian kernel optimization, enabling fine-grained building surface reconstruction
of large-scale urban areas without downsampling the original image resolution.
First, the Cross-modal model - Language Segment Anything is leveraged to segment
building masks. Then, the segmented building regions are grouped into sub-regions
according to the visibility check across registered images. The Gaussian kernels
for these sub-regions are optimized in parallel with masked pixels. In addition,
the normal loss is re-formulated for the detected edges of masks to alleviate
the ambiguities in normal vectors on edges. Finally, to improve the optimization
of 3D Gaussians, we introduce a gradient-constrained balance-load loss that accounts
for the complexity of the corresponding scenes, effectively minimizing the thread
waiting time in the pixel-parallel rendering stage as well as the reconstruction
loss. Extensive experiments on various urban datasets demonstrate
the superior performance of our PG-SAG on building surface reconstruction,
compared to several state-of-the-art 3DGS-based methods.
project_page: null
paper: https://arxiv.org/pdf/2501.01677.pdf
code: null
video: null
tags:
- Large-Scale
- Meshing
- Optimization
thumbnail: assets/thumbnails/wang2025pgsag.jpg
publication_date: '2025-01-03T07:40:16+00:00'
- id: gao2025easysplat
title: 'EasySplat: View-Adaptive Learning makes 3D Gaussian Splatting Easy'
authors: Ao Gao, Luosong Guo, Tao Chen, Zhao Wang, Ying Tai, Jian Yang, Zhenyu Zhang
year: '2025'
abstract: 3D Gaussian Splatting (3DGS) techniques have achieved satisfactory 3D
scene representation. Despite their impressive performance, they confront challenges
due to the limitation of structure-from-motion (SfM) methods on acquiring accurate
scene initialization, or the inefficiency of densification strategy. In this paper,
we introduce a novel framework EasySplat to achieve high-quality 3DGS modeling.
Instead of using SfM for scene initialization, we employ a novel method to release
the power of large-scale pointmap approaches. Specifically, we propose an efficient
grouping strategy based on view similarity, and use robust pointmap priors to
obtain high-quality point clouds and camera poses for 3D scene initialization.
After obtaining a reliable scene structure, we propose a novel densification approach
that adaptively splits Gaussian primitives based on the average shape of neighboring
Gaussian ellipsoids, utilizing a KNN scheme. In this way, the proposed method tackles
the limitation on initialization and optimization, leading to efficient and
accurate 3DGS modeling. Extensive experiments demonstrate that EasySplat outperforms
the current state-of-the-art (SOTA) in handling novel view synthesis.
project_page: null
paper: https://arxiv.org/pdf/2501.01003.pdf
code: null
video: null
tags:
- 3ster-based
- Acceleration
- Densification
- Rendering
thumbnail: assets/thumbnails/gao2025easysplat.jpg
publication_date: '2025-01-02T01:56:58+00:00'
- id: yang2024storm
title: 'STORM: Spatio-Temporal Reconstruction Model for Large-Scale Outdoor Scenes'
authors: Jiawei Yang, Jiahui Huang, Yuxiao Chen, Yan Wang, Boyi Li, Yurong You,
Apoorva Sharma, Maximilian Igl, Peter Karkus, Danfei Xu, Boris Ivanovic, Yue Wang,
Marco Pavone
year: '2024'
abstract: We present STORM, a spatio-temporal reconstruction model designed for
reconstructing dynamic outdoor scenes from sparse observations. Existing dynamic
reconstruction methods often rely on per-scene optimization, dense observations
across space and time, and strong motion supervision, resulting in lengthy optimization
times, limited generalization to novel views or scenes, and degenerated quality
caused by noisy pseudo-labels for dynamics. To address these challenges, STORM
leverages a data-driven Transformer architecture that directly infers dynamic
3D scene representations--parameterized by 3D Gaussians and their velocities--in
a single forward pass. Our key design is to aggregate 3D Gaussians from all frames
using self-supervised scene flows, transforming them to the target timestep to
enable complete (i.e., "amodal") reconstructions from arbitrary viewpoints at
any moment in time. As an emergent property, STORM automatically captures dynamic
instances and generates high-quality masks using only reconstruction losses. Extensive
experiments on public datasets show that STORM achieves precise dynamic scene
reconstruction, surpassing state-of-the-art per-scene optimization methods (+4.3
to 6.6 PSNR) and existing feed-forward approaches (+2.1 to 4.7 PSNR) in dynamic
regions. STORM reconstructs large-scale outdoor scenes in 200ms, supports real-time
rendering, and outperforms competitors in scene flow estimation, improving 3D
EPE by 0.422m and Acc5 by 28.02%. Beyond reconstruction, we showcase four additional
applications of our model, illustrating the potential of self-supervised learning
for broader dynamic scene understanding.
project_page: null
paper: https://arxiv.org/pdf/2501.00602.pdf
code: null
video: https://jiawei-yang.github.io/STORM/
tags:
- Autonomous Driving
- Dynamic
- Large-Scale
- Video
thumbnail: assets/thumbnails/yang2024storm.jpg
publication_date: '2024-12-31T18:59:58+00:00'
- id: mao2024dreamdrive
title: 'DreamDrive: Generative 4D Scene Modeling from Street View Images'
authors: Jiageng Mao, Boyi Li, Boris Ivanovic, Yuxiao Chen, Yan Wang, Yurong You,
Chaowei Xiao, Danfei Xu, Marco Pavone, Yue Wang
year: '2024'
abstract: Synthesizing photo-realistic visual observations from an ego vehicle's
driving trajectory is a critical step towards scalable training of self-driving
models. Reconstruction-based methods create 3D scenes from driving logs and synthesize
geometry-consistent driving videos through neural rendering, but their dependence
on costly object annotations limits their ability to generalize to in-the-wild
driving scenarios. On the other hand, generative models can synthesize action-conditioned
driving videos in a more generalizable way but often struggle with maintaining
3D visual consistency. In this paper, we present DreamDrive, a 4D spatial-temporal
scene generation approach that combines the merits of generation and reconstruction,
to synthesize generalizable 4D driving scenes and dynamic driving videos with
3D consistency. Specifically, we leverage the generative power of video diffusion
models to synthesize a sequence of visual references and further elevate them
to 4D with a novel hybrid Gaussian representation. Given a driving trajectory,
we then render 3D-consistent driving videos via Gaussian splatting. The use of
generative priors allows our method to produce high-quality 4D scenes from in-the-wild
driving data, while neural rendering ensures 3D-consistent video generation from
the 4D scenes. Extensive experiments on nuScenes and street view images demonstrate
that DreamDrive can generate controllable and generalizable 4D driving scenes,
synthesize novel views of driving videos with high fidelity and 3D consistency,
decompose static and dynamic elements in a self-supervised manner, and enhance
perception and planning tasks for autonomous driving.
project_page: null
paper: https://arxiv.org/pdf/2501.00601.pdf
code: null
video: null
tags:
- Autonomous Driving
- Dynamic
- Feed-Forward
thumbnail: assets/thumbnails/mao2024dreamdrive.jpg
publication_date: '2024-12-31T18:59:57+00:00'
- id: wang2024sgsplatting
title: 'SG-Splatting: Accelerating 3D Gaussian Splatting with Spherical Gaussians'
authors: Yiwen Wang, Siyuan Chen, Ran Yi
year: '2024'
abstract: '3D Gaussian Splatting is emerging as a state-of-the-art technique in
novel view synthesis, recognized for its impressive balance between visual quality,
speed, and rendering efficiency. However, reliance on third-degree spherical harmonics
for color representation introduces significant storage demands and computational
overhead, resulting in a large memory footprint and slower rendering speed. We
introduce SG-Splatting with Spherical Gaussians based color representation, a
novel approach to enhance rendering speed and quality in novel view synthesis.
Our method first represents view-dependent color using Spherical Gaussians, instead
of third-degree spherical harmonics, which largely reduces the number of parameters
used for color representation, and significantly accelerates the rendering process.
We then develop an efficient strategy for organizing multiple Spherical Gaussians,
optimizing their arrangement to achieve a balanced and accurate scene representation.
To further improve rendering quality, we propose a mixed representation that combines
Spherical Gaussians with low-degree spherical harmonics, capturing both high-
and low-frequency color information effectively. SG-Splatting also has plug-and-play
capability, allowing it to be easily integrated into existing systems. This approach
improves computational efficiency and overall visual fidelity, making it a practical
solution for real-time applications.
'
project_page: null
paper: https://arxiv.org/pdf/2501.00342.pdf
code: null
video: null
tags:
- Acceleration
thumbnail: assets/thumbnails/wang2024sgsplatting.jpg
publication_date: '2024-12-31T08:31:52+00:00'
- id: cha2024perse
title: 'PERSE: Personalized 3D Generative Avatars from A Single Portrait'
authors: Hyunsoo Cha, Inhee Lee, Hanbyul Joo
year: '2024'
abstract: We present PERSE, a method for building an animatable personalized generative
avatar from a reference portrait. Our avatar model enables facial attribute editing
in a continuous and disentangled latent space to control each facial attribute,
while preserving the individual's identity. To achieve this, our method begins
by synthesizing large-scale synthetic 2D video datasets, where each video contains
consistent changes in the facial expression and viewpoint, combined with a variation
in a specific facial attribute from the original input. We propose a novel pipeline
to produce high-quality, photorealistic 2D videos with facial attribute editing.
Leveraging this synthetic attribute dataset, we present a personalized avatar
creation method based on the 3D Gaussian Splatting, learning a continuous and
disentangled latent space for intuitive facial attribute manipulation. To enforce
smooth transitions in this latent space, we introduce a latent space regularization
technique by using interpolated 2D faces as supervision. Compared to previous
approaches, we demonstrate that PERSE generates high-quality avatars with interpolated
attributes while preserving the identity of the reference person.
project_page: https://hyunsoocha.github.io/perse/
paper: https://arxiv.org/pdf/2412.21206v1.pdf
code: null
video: https://youtu.be/zX881Zx03o4
tags:
- Avatar
- GAN
- Project
- Video
thumbnail: assets/thumbnails/cha2024perse.jpg
publication_date: '2024-12-30T18:59:58+00:00'
- id: yang20244d
title: '4D Gaussian Splatting: Modeling Dynamic Scenes with Native 4D Primitives'
authors: Zeyu Yang, Zijie Pan, Xiatian Zhu, Li Zhang, Yu-Gang Jiang, Philip H. S.
Torr
year: '2024'
abstract: Dynamic 3D scene representation and novel view synthesis from captured
videos are crucial for enabling immersive experiences required by AR/VR and metaverse
applications. However, this task is challenging due to the complexity of unconstrained
real-world scenes and their temporal dynamics. In this paper, we frame dynamic
scenes as a spatio-temporal 4D volume learning problem, offering a native explicit
reformulation with minimal assumptions about motion, which serves as a versatile
dynamic scene learning framework. Specifically, we represent a target dynamic
scene using a collection of 4D Gaussian primitives with explicit geometry and
appearance features, dubbed as 4D Gaussian splatting (4DGS). This approach can
capture relevant information in space and time by fitting the underlying spatio-temporal
volume. Modeling the spacetime as a whole with 4D Gaussians parameterized by anisotropic
ellipses that can rotate arbitrarily in space and time, our model can naturally
learn view-dependent and time-evolved appearance with 4D spherindrical harmonics.
Notably, our 4DGS model is the first solution that supports real-time rendering
of high-resolution, photorealistic novel views for complex dynamic scenes. To
enhance efficiency, we derive several compact variants that effectively reduce
memory footprint and mitigate the risk of overfitting. Extensive experiments validate
the superiority of 4DGS in terms of visual quality and efficiency across a range
of dynamic scene-related tasks (e.g., novel view synthesis, 4D generation, scene
understanding) and scenarios (e.g., single object, indoor scenes, driving environments,
synthetic and real data).
project_page: null
paper: https://arxiv.org/pdf/2412.20720v1.pdf
code: null
video: null
tags:
- Compression
- Dynamic
- Large-Scale
thumbnail: assets/thumbnails/yang20244d.jpg
publication_date: '2024-12-30T05:30:26+00:00'
- id: liu2024maskgaussian
title: 'MaskGaussian: Adaptive 3D Gaussian Representation from Probabilistic Masks'
authors: Yifei Liu, Zhihang Zhong, Yifan Zhan, Sheng Xu, Xiao Sun
year: '2024'
abstract: 'While 3D Gaussian Splatting (3DGS) has demonstrated remarkable performance
in novel view synthesis and real-time rendering, the high memory consumption due
to the use of millions of Gaussians limits its practicality. To mitigate this
issue, improvements have been made by pruning unnecessary Gaussians, either through
a hand-crafted criterion or by using learned masks. However, these methods deterministically
remove Gaussians based on a snapshot of the pruning moment, leading to sub-optimal
reconstruction performance from a long-term perspective. To address this issue,
we introduce MaskGaussian, which models Gaussians as probabilistic entities rather
than permanently removing them, and utilizes them according to their probability
of existence. To achieve this, we propose a masked-rasterization technique that
enables unused yet probabilistically existing Gaussians to receive gradients,
allowing for dynamic assessment of their contribution to the evolving scene and
adjustment of their probability of existence. Hence, the importance of Gaussians
iteratively changes and the pruned Gaussians are selected diversely. Extensive
experiments demonstrate the superiority of the proposed method in achieving better
rendering quality with fewer Gaussians than previous pruning methods, pruning
over 60% of Gaussians on average with only a 0.02 PSNR decline. Our code can be
found at: https://github.com/kaikai23/MaskGaussian
'
project_page: null
paper: https://arxiv.org/pdf/2412.20522.pdf
code: https://github.com/kaikai23/MaskGaussian
video: null
tags:
- Code
- Compression
- Densification
thumbnail: assets/thumbnails/liu2024maskgaussian.jpg
publication_date: '2024-12-29T17:12:16+00:00'
date_source: arxiv
- id: zeller2024gsplatloc
title: 'GSplatLoc: Ultra-Precise Camera Localization via 3D Gaussian Splatting'
authors: Atticus J. Zeller
year: '2024'
abstract: 'We present GSplatLoc, a camera localization method that leverages the
differentiable rendering capabilities of 3D Gaussian splatting for ultra-precise
pose estimation. By formulating pose estimation as a gradient-based optimization
problem that minimizes discrepancies between rendered depth maps from a pre-existing
3D Gaussian scene and observed depth images, GSplatLoc achieves translational
errors within 0.01 cm and near-zero rotational errors on the Replica dataset -
significantly outperforming existing methods. Evaluations on the Replica and TUM
RGB-D datasets demonstrate the method''s robustness in challenging indoor environments
with complex camera motions. GSplatLoc sets a new benchmark for localization in
dense mapping, with important implications for applications requiring accurate
real-time localization, such as robotics and augmented reality.
'
project_page: null
paper: https://arxiv.org/pdf/2412.20056.pdf
code: https://github.com/AtticusZeller/GsplatLoc
video: null
tags:
- Code
- In the Wild
- Point Cloud
- Poses
- Robotics
- SLAM
thumbnail: assets/thumbnails/zeller2024gsplatloc.jpg
publication_date: '2024-12-28T07:14:14+00:00'
date_source: arxiv
- id: xu2024das3r
title: 'DAS3R: Dynamics-Aware Gaussian Splatting for Static Scene Reconstruction'
authors: Kai Xu, Tze Ho Elden Tse, Jizong Peng, Angela Yao
year: '2024'
abstract: 'We propose a novel framework for scene decomposition and static background
reconstruction from everyday videos. By integrating the trained motion masks and
modeling the static scene as Gaussian splats with dynamics-aware optimization,
our method achieves more accurate background reconstruction results than previous
works. Our proposed method is termed DAS3R, an abbreviation for Dynamics-Aware
Gaussian Splatting for Static Scene Reconstruction. Compared to existing methods,
DAS3R is more robust in complex motion scenarios, capable of handling videos where
dynamic objects occupy a significant portion of the scene, and does not require
camera pose inputs or point cloud data from SLAM-based methods. We compared DAS3R
against recent distractor-free approaches on the DAVIS and Sintel datasets; DAS3R
demonstrates enhanced performance and robustness with a margin of more than 2
dB in PSNR. The project''s webpage can be accessed via https://kai422.github.io/DAS3R/
'
project_page: https://kai422.github.io/DAS3R/
paper: https://arxiv.org/pdf/2412.19584.pdf
code: https://github.com/kai422/das3r
video: https://kai422.github.io/DAS3R/assets/davis.gif
tags:
- Code
- Project
- Video
thumbnail: assets/thumbnails/xu2024das3r.jpg
publication_date: '2024-12-27T10:59:46+00:00'
date_source: arxiv
- id: cai2024dust
title: 'Dust to Tower: Coarse-to-Fine Photo-Realistic Scene Reconstruction from
Sparse Uncalibrated Images'
authors: Xudong Cai, Yongcai Wang, Zhaoxin Fan, Deng Haoran, Shuo Wang, Wanting
Li, Deying Li, Lun Luo, Minhang Wang, Jintao Xu
year: '2024'
abstract: Photo-realistic scene reconstruction from sparse-view, uncalibrated images
is in high demand in practice. Although some progress has been made, existing
methods are either Sparse-View but require accurate camera parameters (i.e., intrinsic
and extrinsic), or SfM-free but need densely captured images. To combine the advantages
of both methods while addressing their respective weaknesses, we propose Dust
to Tower (D2T), an accurate and efficient coarse-to-fine framework to optimize
3DGS and image poses simultaneously from sparse and uncalibrated images. Our key
idea is to first construct a coarse model efficiently and subsequently refine
it using warped and inpainted images at novel viewpoints. To do this, we first
introduce a Coarse Construction Module (CCM) which exploits a fast Multi-View
Stereo model to initialize a 3D Gaussian Splatting (3DGS) and recover initial
camera poses. To refine the 3D model at novel viewpoints, we propose a Confidence
Aware Depth Alignment (CADA) module to refine the coarse depth maps by aligning
their confident parts with estimated depths by a Mono-depth model. Then, a Warped
Image-Guided Inpainting (WIGI) module is proposed to warp the training images
to novel viewpoints by the refined depth maps, and inpainting is applied to fill
the "holes" in the warped images caused by view-direction changes, providing
high-quality supervision to further optimize the 3D model and the camera poses.
Extensive experiments and ablation studies demonstrate the validity of D2T and
its design choices, achieving state-of-the-art performance in both tasks of novel
view synthesis and pose estimation while keeping high efficiency. Codes will be
publicly available.
project_page: null
paper: https://arxiv.org/pdf/2412.19518.pdf
code: null
video: null
tags:
- Inpainting
- Poses
- Sparse
thumbnail: assets/thumbnails/cai2024dust.jpg
publication_date: '2024-12-27T08:19:34+00:00'
- id: yao2024reflective
title: Reflective Gaussian Splatting
authors: Yuxuan Yao, Zixuan Zeng, Chun Gu, Xiatian Zhu, Li Zhang
year: '2024'
abstract: 'Novel view synthesis has experienced significant advancements owing to
increasingly capable NeRF- and 3DGS-based methods. However, reflective object
reconstruction remains challenging, lacking a proper solution to achieve real-time,
high-quality rendering while accommodating inter-reflection. To fill this gap,
we introduce a Reflective Gaussian splatting (Ref-Gaussian) framework
characterized by two components: (I) Physically based deferred rendering
that empowers the rendering equation with pixel-level material properties via
formulating split-sum approximation; (II) Gaussian-grounded inter-reflection
that realizes the desired inter-reflection function within a Gaussian splatting
paradigm for the first time. To enhance geometry modeling, we further introduce
material-aware normal propagation and an initial per-Gaussian shading stage, along
with 2D Gaussian primitives. Extensive experiments on standard datasets demonstrate
that Ref-Gaussian surpasses existing approaches in terms of quantitative metrics,
visual quality, and compute efficiency. Further, we show that our method serves
as a unified solution for both reflective and non-reflective scenes, going beyond
the previous alternatives focusing on only reflective scenes. Also, we illustrate
that Ref-Gaussian supports more applications such as relighting and editing.
'
project_page: https://fudan-zvg.github.io/ref-gaussian/
paper: https://arxiv.org/pdf/2412.19282.pdf
code: null
video: null
tags:
- Meshing
- Project
- Ray Tracing
- Relight
thumbnail: assets/thumbnails/yao2024reflective.jpg
publication_date: '2024-12-26T16:58:35+00:00'
- id: qian2024weathergs
title: 'WeatherGS: 3D Scene Reconstruction in Adverse Weather Conditions via Gaussian
Splatting'
authors: Chenghao Qian, Yuhu Guo, Wenjing Li, Gustav Markkula
year: '2024'
abstract: 3D Gaussian Splatting (3DGS) has gained significant attention for 3D scene
reconstruction, but still struggles in complex outdoor environments, especially
under adverse weather. This is because 3DGS treats the artifacts caused by adverse
weather as part of the scene and will directly reconstruct them, largely reducing
the clarity of the reconstructed scene. To address this challenge, we propose
WeatherGS, a 3DGS-based framework for reconstructing clear scenes from multi-view
images under different weather conditions. Specifically, we explicitly categorize
the multi-weather artifacts into dense particles and lens occlusions, which
have very different characteristics; the former are caused by snowflakes and
raindrops in the air, and the latter are caused by precipitation on the camera
lens. In light of this, we propose a dense-to-sparse preprocess strategy, which
sequentially removes the dense particles by an Atmospheric Effect Filter (AEF)
and then extracts the relatively sparse occlusion masks with a Lens Effect Detector
(LED). Finally, we train a set of 3D Gaussians by the processed images and generated
masks for excluding occluded areas, and accurately recover the underlying clear
scene by Gaussian splatting. We construct a diverse and challenging benchmark to
facilitate the evaluation of 3D reconstruction under complex weather scenarios.
Extensive experiments on this benchmark demonstrate that our WeatherGS consistently
produces high-quality, clean scenes across various weather scenarios, outperforming
existing state-of-the-art methods.
project_page: null
paper: https://arxiv.org/pdf/2412.18862.pdf
code: https://github.com/Jumponthemoon/WeatherGS
video: null
tags:
- Code
- In the Wild
thumbnail: assets/thumbnails/qian2024weathergs.jpg
publication_date: '2024-12-25T10:16:57+00:00'
- id: lyu2024facelift
title: 'FaceLift: Single Image to 3D Head with View Generation and GS-LRM'
authors: Weijie Lyu, Yi Zhou, Ming-Hsuan Yang, Zhixin Shu
year: '2024'
abstract: We present FaceLift, a feed-forward approach for rapid, high-quality,
360-degree head reconstruction from a single image. Our pipeline begins by employing
a multi-view latent diffusion model that generates consistent side and back views
of the head from a single facial input. These generated views then serve as input
to a GS-LRM reconstructor, which produces a comprehensive 3D representation using
Gaussian splats. To train our system, we develop a dataset of multi-view renderings
using synthetic 3D human head assets. The diffusion-based multi-view generator
is trained exclusively on synthetic head images, while the GS-LRM reconstructor
undergoes initial training on Objaverse followed by fine-tuning on synthetic head
data. FaceLift excels at preserving identity and maintaining view consistency
across views. Despite being trained solely on synthetic data, FaceLift demonstrates
remarkable generalization to real-world images. Through extensive qualitative
and quantitative evaluations, we show that FaceLift outperforms state-of-the-art
methods in 3D head reconstruction, highlighting its practical applicability and
robust performance on real-world images. In addition to single image reconstruction,
FaceLift supports video inputs for 4D novel view synthesis and seamlessly integrates
with 2D reanimation techniques to enable 3D facial animation.
project_page: https://www.wlyu.me/FaceLift/
paper: https://arxiv.org/pdf/2412.17812.pdf
code: null
video: https://huggingface.co/wlyu/FaceLift/resolve/main/videos/website_video.mp4
tags:
- Avatar
- Feed-Forward
- Project
- Video
thumbnail: assets/thumbnails/lyu2024facelift.jpg
publication_date: '2024-12-23T18:59:49+00:00'
- id: shao2024gausim
title: 'GauSim: Registering Elastic Objects into Digital World by Gaussian Simulator'
authors: Yidi Shao, Mu Huang, Chen Change Loy, Bo Dai
year: '2024'
abstract: In this work, we introduce GauSim, a novel neural network-based simulator
designed to capture the dynamic behaviors of real-world elastic objects represented
through Gaussian kernels. Unlike traditional methods that treat kernels as particles
within particle-based simulations, we leverage continuum mechanics, modeling each
kernel as a continuous piece of matter to account for realistic deformations without
idealized assumptions. To improve computational efficiency and fidelity, we employ
a hierarchical structure that organizes kernels into Center of Mass Systems (CMS)
with explicit formulations, enabling a coarse-to-fine simulation approach. This
structure significantly reduces computational overhead while preserving detailed
dynamics. In addition, GauSim incorporates explicit physics constraints, such
as mass and momentum conservation, ensuring interpretable results and robust,
physically plausible simulations. To validate our approach, we present a new dataset,
READY, containing multi-view videos of real-world elastic deformations. Experimental
results demonstrate that GauSim achieves superior performance compared to existing
physics-driven baselines, offering a practical and accurate solution for simulating
complex dynamic behaviors. Code and model will be released.
project_page: https://www.mmlab-ntu.com/project/gausim/index.html
paper: https://arxiv.org/pdf/2412.17804.pdf
code: null
video: null
tags:
- Dynamic
- Physics
- Project
thumbnail: assets/thumbnails/shao2024gausim.jpg
publication_date: '2024-12-23T18:58:17+00:00'
- id: jin2024activegs
title: 'ActiveGS: Active Scene Reconstruction using Gaussian Splatting'
authors: Liren Jin, Xingguang Zhong, Yue Pan, Jens Behley, Cyrill Stachniss, Marija
Popović
year: '2024'
abstract: 'Robotics applications often rely on scene reconstructions to enable downstream
tasks. In this work, we tackle the challenge of actively building an accurate
map of an unknown scene using an on-board RGB-D camera. We propose a hybrid map
representation that combines a Gaussian splatting map with a coarse voxel map,
leveraging the strengths of both representations: the high-fidelity scene reconstruction
capabilities of Gaussian splatting and the spatial modelling strengths of the
voxel map. The core of our framework is an effective confidence modelling technique
for the Gaussian splatting map to identify under-reconstructed areas, while utilising
spatial information from the voxel map to target unexplored areas and assist in
collision-free path planning. By actively collecting scene information in under-reconstructed
and unexplored areas for map updates, our approach achieves superior Gaussian
splatting reconstruction results compared to state-of-the-art approaches. Additionally,
we demonstrate the applicability of our active scene reconstruction framework
in the real world using an unmanned aerial vehicle.
'
project_page: null
paper: https://arxiv.org/pdf/2412.17769.pdf
code: null
video: null
tags:
- Meshing
- Robotics
- SLAM
thumbnail: assets/thumbnails/jin2024activegs.jpg
publication_date: '2024-12-23T18:29:03+00:00'
date_source: arxiv
- id: gao2024cosurfgscollaborative
title: 'CoSurfGS: Collaborative 3D Surface Gaussian Splatting with Distributed Learning
for Large Scene Reconstruction'
authors: Yuanyuan Gao, Yalun Dai, Hao Li, Weicai Ye, Junyi Chen, Danpeng Chen, Dingwen
Zhang, Tong He, Guofeng Zhang, Junwei Han
year: '2024'
abstract: 3D Gaussian Splatting (3DGS) has demonstrated impressive performance in
scene reconstruction. However, most existing GS-based surface reconstruction methods
focus on 3D objects or limited scenes. Directly applying these methods to large-scale
scene reconstruction will pose challenges such as high memory costs, excessive
time consumption, and lack of geometric detail, which makes it difficult to implement
in practical applications. To address these issues, we propose a multi-agent collaborative
fast 3DGS surface reconstruction framework based on distributed learning for large-scale
surface reconstruction. Specifically, we develop local model compression (LMC)
and model aggregation schemes (MAS) to achieve high-quality surface representation
of large scenes while reducing GPU memory consumption. Extensive experiments on
Urban3d, MegaNeRF, and BlendedMVS demonstrate that our proposed method can achieve
fast and scalable high-fidelity surface reconstruction and photorealistic rendering.
project_page: https://gyy456.github.io/CoSurfGS/
paper: https://arxiv.org/pdf/2412.17612.pdf
code: null
video: null
tags:
- Distributed
- Large-Scale
- Meshing
- Project
thumbnail: assets/thumbnails/gao2024cosurfgscollaborative.jpg
publication_date: '2024-12-23T14:31:15+00:00'
- id: gui2024balanced
title: 'Balanced 3DGS: Gaussian-wise Parallelism Rendering with Fine-Grained Tiling'
authors: Hao Gui, Lin Hu, Rui Chen, Mingxiao Huang, Yuxin Yin, Jin Yang, Yong Wu
year: '2024'
abstract: '3D Gaussian Splatting (3DGS) is increasingly attracting attention in
both academia and industry owing to its superior visual quality and rendering
speed. However, training a 3DGS model remains a time-intensive task, especially
in load imbalance scenarios where workload diversity among pixels and Gaussian
spheres causes poor renderCUDA kernel performance. We introduce Balanced 3DGS,
a Gaussian-wise parallelism rendering with fine-grained tiling approach in 3DGS
training process, perfectly solving load-imbalance issues. First, we innovatively
introduce the inter-block dynamic workload distribution technique to map workloads
to Streaming Multiprocessor (SM) resources within a single GPU dynamically, which
constitutes the foundation of load balancing. Second, we are the first to propose
the Gaussian-wise parallel rendering technique to significantly reduce workload
divergence inside a warp, which serves as a critical component in addressing load
imbalance. Based on the above two methods, we further creatively put forward the
fine-grained combined load balancing technique to uniformly distribute workload
across all SMs, which boosts the forward renderCUDA kernel performance by up to
7.52x. Besides, we present a self-adaptive render kernel selection strategy during
the 3DGS training process based on different load-balance situations, which effectively
improves training efficiency.
'
project_page: null
paper: https://arxiv.org/pdf/2412.17378.pdf
code: null
video: null
tags:
- Acceleration
thumbnail: assets/thumbnails/gui2024balanced.jpg
publication_date: '2024-12-23T08:26:30+00:00'
- id: jambon2024interactive
title: Interactive Scene Authoring with Specialized Generative Primitives
authors: Clément Jambon, Changwoon Choi, Dongsu Zhang, Olga Sorkine-Hornung, Young
Min Kim
year: '2024'
abstract: 'Generating high-quality 3D digital assets often requires expert knowledge
of complex design tools. We introduce Specialized Generative Primitives, a generative
framework that allows non-expert users to author high-quality 3D scenes in a seamless,
lightweight, and controllable manner. Each primitive is an efficient generative
model that captures the distribution of a single exemplar from the real world.
With our framework, users capture a video of an environment, which we turn into
a high-quality and explicit appearance model thanks to 3D Gaussian Splatting.
Users then select regions of interest guided by semantically-aware features. To
create a generative primitive, we adapt Generative Cellular Automata to single-exemplar
training and controllable generation. We decouple the generative task from the
appearance model by operating on sparse voxels and we recover a high-quality output
with a subsequent sparse patch consistency step. Each primitive can be trained
within 10 minutes and used to author new scenes interactively in a fully compositional
manner. We showcase interactive sessions where various primitives are extracted
from real-world scenes and controlled to create 3D assets and scenes in a few
minutes. We also demonstrate additional capabilities of our primitives: handling
various 3D representations to control generation, transferring appearances, and
editing geometries.
'
project_page: null
paper: https://arxiv.org/pdf/2412.16253.pdf
code: null
video: null
tags:
- Editing
- World Generation
thumbnail: assets/thumbnails/jambon2024interactive.jpg
publication_date: '2024-12-20T04:39:50+00:00'
- id: shen2024solidgs
title: 'SolidGS: Consolidating Gaussian Surfel Splatting for Sparse-View Surface
Reconstruction'
authors: Zhuowen Shen, Yuan Liu, Zhang Chen, Zhong Li, Jiepeng Wang, Yongqing Liang,
Zhengming Yu, Jingdong Zhang, Yi Xu, Scott Schaefer, Xin Li, Wenping Wang
year: '2024'
abstract: 'Gaussian splatting has achieved impressive improvements for both novel-view
synthesis and surface reconstruction from multi-view images. However, current
methods still struggle to reconstruct high-quality surfaces from only sparse view
input images using Gaussian splatting. In this paper, we propose a novel method
called SolidGS to address this problem. We observed that the reconstructed geometry
can be severely inconsistent across multi-views, due to the property of Gaussian
function in geometry rendering. This motivates us to consolidate all Gaussians
by adopting a more solid kernel function, which effectively improves the surface
reconstruction quality. With the additional help of geometrical regularization
and monocular normal estimation, our method achieves superior performance on the
sparse view surface reconstruction than all the Gaussian splatting methods and
neural field methods on the widely used DTU, Tanks-and-Temples, and LLFF datasets.
'
project_page: https://mickshen7558.github.io/projects/SolidGS/
paper: https://arxiv.org/pdf/2412.15400.pdf
code: null
video: null
tags:
- Meshing
- Project
- Sparse
thumbnail: assets/thumbnails/shen2024solidgs.jpg
publication_date: '2024-12-19T21:04:43+00:00'
date_source: arxiv
- id: xie2024envgs
title: 'EnvGS: Modeling View-Dependent Appearance with Environment Gaussian'
authors: Tao Xie, Xi Chen, Zhen Xu, Yiman Xie, Yudong Jin, Yujun Shen, Sida Peng,
Hujun Bao, Xiaowei Zhou
year: '2024'
abstract: 'Reconstructing complex reflections in real-world scenes from 2D images
is essential for achieving photorealistic novel view synthesis. Existing methods
that utilize environment maps to model reflections from distant lighting often
struggle with high-frequency reflection details and fail to account for near-field
reflections. In this work, we introduce EnvGS, a novel approach that employs a
set of Gaussian primitives as an explicit 3D representation for capturing reflections
of environments. These environment Gaussian primitives are incorporated with base
Gaussian primitives to model the appearance of the whole scene. To efficiently
render these environment Gaussian primitives, we developed a ray-tracing-based
renderer that leverages the GPU''s RT core for fast rendering. This allows us
to jointly optimize our model for high-quality reconstruction while maintaining
real-time rendering speeds. Results from multiple real-world and synthetic datasets
demonstrate that our method produces significantly more detailed reflections,
achieving the best rendering quality in real-time novel view synthesis.
'
project_page: https://zju3dv.github.io/envgs/
paper: https://arxiv.org/pdf/2412.15215.pdf
code: null
video: https://raw.githubusercontent.com/xbillowy/assets/refs/heads/main/envgs.github.io.assets/teaser.mp4
tags:
- Project
- Ray Tracing
- Rendering
- Video
thumbnail: assets/thumbnails/xie2024envgs.jpg
publication_date: '2024-12-19T18:59:57+00:00'
date_source: arxiv
- id: saito2024squeezeme
title: 'SqueezeMe: Efficient Gaussian Avatars for VR'
authors: Shunsuke Saito, Stanislav Pidhorskyi, Igor Santesteban, Forrest Iandola,
Divam Gupta, Anuj Pahuja, Nemanja Bartolovic, Frank Yu, Emanuel Garbin, Tomas
Simon
year: '2024'
abstract: "Gaussian Splatting has enabled real-time 3D human avatars with unprecedented\
\ levels of visual quality. While previous methods require a desktop GPU for real-time\
\ inference of a single avatar, we aim to squeeze multiple Gaussian avatars onto\
\ a portable virtual reality headset with real-time drivable inference. We begin\
\ by training a previous work, Animatable Gaussians, on a high quality dataset\
\ captured with 512 cameras. The Gaussians are animated by controlling a base set\
\ of Gaussians with linear blend skinning (LBS) motion and then further adjusting\
\ the Gaussians with a neural network decoder to correct their appearance. When\
\ deploying the model on a Meta Quest 3 VR headset, we find two major computational\
\ bottlenecks: the decoder and the rendering. To accelerate the decoder, we train\
\ the Gaussians in UV-space instead of pixel-space, and we distill the decoder\
\ to a single neural network layer. Further, we discover that neighborhoods of\
\ Gaussians can share a single corrective from the decoder, which provides an\
\ additional speedup. To accelerate the rendering, we develop a custom pipeline\
\ in Vulkan that runs on the mobile GPU. Putting it all together, we run 3 Gaussian\
\ avatars concurrently at 72 FPS on a VR headset. \n"
project_page: https://forresti.github.io/squeezeme
paper: https://arxiv.org/pdf/2412.15171.pdf
code: null
video: null
tags:
- Avatar
- Dynamic
- Project
thumbnail: assets/thumbnails/saito2024squeezeme.jpg
publication_date: '2024-12-19T18:46:55+00:00'
date_source: arxiv
- id: peng2024gags
title: 'GAGS: Granularity-Aware Feature Distillation for Language Gaussian Splatting'
authors: Yuning Peng, Haiping Wang, Yuan Liu, Chenglu Wen, Zhen Dong, Bisheng Yang
year: '2024'
abstract: '3D open-vocabulary scene understanding, which accurately perceives complex
semantic properties of objects in space, has gained significant attention in recent
years. In this paper, we propose GAGS, a framework that distills 2D CLIP features
into 3D Gaussian splatting, enabling open-vocabulary queries for renderings on
arbitrary viewpoints. The main challenge of distilling 2D features for 3D fields
lies in the multiview inconsistency of extracted 2D features, which provides unstable
supervision for the 3D feature field. GAGS addresses this challenge with two novel
strategies. First, GAGS associates the prompt point density of SAM with the camera
distances, which significantly improves the multiview consistency of segmentation
results. Second, GAGS further decodes a granularity factor to guide the distillation
process and this granularity factor can be learned in an unsupervised manner to
only select the multiview consistent 2D features in the distillation process.
Experimental results on two datasets demonstrate significant performance and stability
improvements of GAGS in visual grounding and semantic segmentation, with an inference
speed 2$\times$ faster than baseline methods. The code and additional results
are available at https://pz0826.github.io/GAGS-Webpage/ .
'
project_page: https://pz0826.github.io/GAGS-Webpage/
paper: https://arxiv.org/pdf/2412.13654.pdf
code: https://github.com/WHU-USI3DV/GAGS
video: null
tags:
- Code
- Language Embedding
- Project
- Segmentation
thumbnail: assets/thumbnails/peng2024gags.jpg
publication_date: '2024-12-18T09:33:20+00:00'
date_source: arxiv
- id: lu2024turbogs
title: 'Turbo-GS: Accelerating 3D Gaussian Fitting for High-Quality Radiance Fields'
authors: Tao Lu, Ankit Dhiman, R Srinath, Emre Arslan, Angela Xing, Yuanbo Xiangli,
R Venkatesh Babu, Srinath Sridhar
year: '2024'
abstract: 'Novel-view synthesis is an important problem in computer vision with
applications in 3D reconstruction, mixed reality, and robotics. Recent methods
like 3D Gaussian Splatting (3DGS) have become the preferred method for this task,
providing high-quality novel views in real time. However, the training time of
a 3DGS model is slow, often taking 30 minutes for a scene with 200 views. In contrast,
our goal is to reduce the optimization time by training for fewer steps while
maintaining high rendering quality. Specifically, we combine the guidance from
both the position error and the appearance error to achieve a more effective densification.
To balance the rate between adding new Gaussians and fitting old Gaussians, we
develop a convergence-aware budget control mechanism. Moreover, to make the densification
process more reliable, we selectively add new Gaussians from mostly visited regions.
With these designs, we reduce the Gaussian optimization steps to one-third of
the previous approach while achieving a comparable or even better novel view rendering
quality. To further facilitate the rapid fitting of 4K resolution images, we introduce
a dilation-based rendering technique. Our method, Turbo-GS, speeds up optimization
for typical scenes and scales well to high-resolution (4K) scenarios on standard
datasets. Through extensive experiments, we show that our method is significantly
faster in optimization than other methods while retaining quality. Project page:
https://ivl.cs.brown.edu/research/turbo-gs.
'
project_page: https://ivl.cs.brown.edu/research/turbo-gs
paper: https://arxiv.org/pdf/2412.13547v1.pdf
code: null
video: null
tags:
- Acceleration
- Densification
- Project
thumbnail: assets/thumbnails/lu2024turbogs.jpg
publication_date: '2024-12-18T06:46:40+00:00'
date_source: arxiv
- id: jiang2024gausstr
title: 'GaussTR: Foundation Model-Aligned Gaussian Transformer for Self-Supervised
3D Spatial Understanding'
authors: Haoyi Jiang, Liu Liu, Tianheng Cheng, Xinjie Wang, Tianwei Lin, Zhizhong
Su, Wenyu Liu, Xinggang Wang
year: '2024'
abstract: '3D Semantic Occupancy Prediction is fundamental for spatial understanding
as it provides a comprehensive semantic cognition of surrounding environments.
However, prevalent approaches primarily rely on extensive labeled data and computationally
intensive voxel-based modeling, restricting the scalability and generalizability
of 3D representation learning. In this paper, we introduce GaussTR, a novel Gaussian
Transformer that leverages alignment with foundation models to advance self-supervised
3D spatial understanding. GaussTR adopts a Transformer architecture to predict
sparse sets of 3D Gaussians that represent scenes in a feed-forward manner. Through
aligning rendered Gaussian features with diverse knowledge from pre-trained foundation
models, GaussTR facilitates the learning of versatile 3D representations and enables
open-vocabulary occupancy prediction without explicit annotations. Empirical evaluations
on the Occ3D-nuScenes dataset showcase GaussTR''s state-of-the-art zero-shot performance,
achieving 11.70 mIoU while reducing training duration by approximately 50%. These
experimental results highlight the significant potential of GaussTR for scalable
and holistic 3D spatial understanding, with promising implications for autonomous
driving and embodied agents. Code is available at https://github.com/hustvl/GaussTR.
'
project_page: https://hustvl.github.io/GaussTR/
paper: https://arxiv.org/pdf/2412.13193.pdf
code: https://github.com/hustvl/GaussTR
video: null
tags:
- Code
- Project
thumbnail: assets/thumbnails/jiang2024gausstr.jpg
publication_date: '2024-12-17T18:59:46+00:00'
date_source: arxiv
- id: sun2024realtime
title: Real-time Free-view Human Rendering from Sparse-view RGB Videos using Double
Unprojected Textures
authors: Guoxing Sun, Rishabh Dabral, Heming Zhu, Pascal Fua, Christian Theobalt,
Marc Habermann
year: '2024'
abstract: 'Real-time free-view human rendering from sparse-view RGB inputs is a
challenging task due to the sensor scarcity and the tight time budget. To ensure
efficiency, recent methods leverage 2D CNNs operating in texture space to learn
rendering primitives. However, they either jointly learn geometry and appearance,
or completely ignore sparse image information for geometry estimation, significantly
harming visual quality and robustness to unseen body poses. To address these issues,
we present Double Unprojected Textures, which at the core disentangles coarse
geometric deformation estimation from appearance synthesis, enabling robust and
photorealistic 4K rendering in real-time. Specifically, we first introduce a novel
image-conditioned template deformation network, which estimates the coarse deformation
of the human template from a first unprojected texture. This updated geometry
is then used to apply a second and more accurate texture unprojection. The resulting
texture map has fewer artifacts and better alignment with input views, which benefits
our learning of finer-level geometry and appearance represented by Gaussian splats.
We validate the effectiveness and efficiency of the proposed method in quantitative
and qualitative experiments, which significantly surpasses other state-of-the-art
methods.
'
project_page: https://vcai.mpi-inf.mpg.de/projects/DUT/
paper: https://arxiv.org/pdf/2412.13183v1.pdf
code: null
video: https://vcai.mpi-inf.mpg.de/projects/DUT/videos/main_video.mp4
tags:
- Avatar
- Project
- Sparse
- Texturing
- Video
thumbnail: assets/thumbnails/sun2024realtime.jpg
publication_date: '2024-12-17T18:57:38+00:00'
date_source: arxiv
- id: weiss2024gaussian
title: 'Gaussian Billboards: Expressive 2D Gaussian Splatting with Textures'
authors: Sebastian Weiss, Derek Bradley
year: '2024'
abstract: 'Gaussian Splatting has recently emerged as the go-to representation for
reconstructing and rendering 3D scenes. The transition from 3D to 2D Gaussian
primitives has further improved multi-view consistency and surface reconstruction
accuracy. In this work we highlight the similarity between 2D Gaussian Splatting
(2DGS) and billboards from traditional computer graphics. Both use flat semi-transparent
2D geometry that is positioned, oriented and scaled in 3D space. However, 2DGS
uses a solid color per splat and an opacity modulated by a Gaussian distribution,
whereas billboards are more expressive, modulating the color with a uv-parameterized
texture. We propose to unify these concepts by presenting Gaussian Billboards,
a modification of 2DGS to add spatially-varying color achieved using per-splat
texture interpolation. The result is a mixture of the two representations, which
benefits from both the robust scene optimization power of 2DGS and the expressiveness
of texture mapping. We show that our method can improve the sharpness and quality
of the scene representation in a wide range of qualitative and quantitative evaluations
compared to the original 2DGS implementation.
'
project_page: null
paper: https://arxiv.org/pdf/2412.12734v1.pdf
code: null
video: null
tags:
- 2DGS
- Texturing
thumbnail: assets/thumbnails/weiss2024gaussian.jpg
publication_date: '2024-12-17T09:57:04+00:00'
date_source: arxiv
- id: wu20243dgut
title: '3DGUT: Enabling Distorted Cameras and Secondary Rays in Gaussian Splatting'
authors: Qi Wu, Janick Martinez Esturo, Ashkan Mirzaei, Nicolas Moenne-Loccoz, Zan
Gojcic
year: '2024'
abstract: '3D Gaussian Splatting (3DGS) has shown great potential for efficient
reconstruction and high-fidelity real-time rendering of complex scenes on consumer
hardware. However, due to its rasterization-based formulation, 3DGS is constrained
to ideal pinhole cameras and lacks support for secondary lighting effects. Recent
methods address these limitations by tracing volumetric particles instead; however,
this comes at the cost of significantly slower rendering speeds. In this work,
we propose 3D Gaussian Unscented Transform (3DGUT), replacing the EWA splatting
formulation in 3DGS with the Unscented Transform that approximates the particles
through sigma points, which can be projected exactly under any nonlinear projection
function. This modification enables trivial support of distorted cameras with
time dependent effects such as rolling shutter, while retaining the efficiency
of rasterization. Additionally, we align our rendering formulation with that of
tracing-based methods, enabling secondary ray tracing required to represent phenomena
such as reflections and refraction within the same 3D representation.
'
project_page: https://research.nvidia.com/labs/toronto-ai/3DGUT/
paper: https://arxiv.org/pdf/2412.12507.pdf
code: null
video: https://research.nvidia.com/labs/toronto-ai/3DGUT/res/3DGUT_ready_compressed.mp4
tags:
- Perspective-correct
- Project
- Video
thumbnail: assets/thumbnails/wu20243dgut.jpg
publication_date: '2024-12-17T03:21:25+00:00'
date_source: arxiv
- id: murai2024mast3rslam
title: 'MASt3R-SLAM: Real-Time Dense SLAM with 3D Reconstruction Priors'
authors: Riku Murai, Eric Dexheimer, Andrew J. Davison
year: '2024'
abstract: 'We present a real-time monocular dense SLAM system designed bottom-up
from MASt3R, a two-view 3D reconstruction and matching prior. Equipped with this
strong prior, our system is robust on in-the-wild video sequences despite making
no assumption on a fixed or parametric camera model beyond a unique camera centre.
We introduce efficient methods for pointmap matching, camera tracking and local
fusion, graph construction and loop closure, and second-order global optimisation.
With known calibration, a simple modification to the system achieves state-of-the-art
performance across various benchmarks. Altogether, we propose a plug-and-play
monocular SLAM system capable of producing globally-consistent poses and dense
geometry while operating at 15 FPS.
'
project_page: null
paper: https://arxiv.org/pdf/2412.12392.pdf
code: null
video: https://www.youtube.com/watch?v=wozt71NBFTQ
tags:
- 3ster-based
- SLAM
- Video
thumbnail: assets/thumbnails/murai2024mast3rslam.jpg
publication_date: '2024-12-16T23:00:05+00:00'
date_source: arxiv
- id: zhang2024pansplat
title: 'PanSplat: 4K Panorama Synthesis with Feed-Forward Gaussian Splatting'
authors: Cheng Zhang, Haofei Xu, Qianyi Wu, Camilo Cruz Gambardella, Dinh Phung,
Jianfei Cai
year: '2024'
abstract: 'With the advent of portable 360° cameras, panorama has gained significant
attention in applications like virtual reality (VR), virtual tours, robotics,
and autonomous driving. As a result, wide-baseline panorama view synthesis has
emerged as a vital task, where high resolution, fast inference, and memory efficiency
are essential. Nevertheless, existing methods are typically constrained to lower
resolutions (512 $\times$ 1024) due to demanding memory and computational requirements.
In this paper, we present PanSplat, a generalizable, feed-forward approach that
efficiently supports resolution up to 4K (2048 $\times$ 4096). Our approach features
a tailored spherical 3D Gaussian pyramid with a Fibonacci lattice arrangement,
enhancing image quality while reducing information redundancy. To accommodate
the demands of high resolution, we propose a pipeline that integrates a hierarchical
spherical cost volume and Gaussian heads with local operations, enabling two-step
deferred backpropagation for memory-efficient training on a single A100 GPU. Experiments
demonstrate that PanSplat achieves state-of-the-art results with superior efficiency
and image quality across both synthetic and real-world datasets. Code will be
available at https://github.com/chengzhag/PanSplat.
'
project_page: https://chengzhag.github.io/publication/pansplat/
paper: https://arxiv.org/pdf/2412.12096v1.pdf
code: https://github.com/chengzhag/PanSplat
video: https://youtu.be/R3qIzL77ZSc
tags:
- 360 degree
- Code
- Feed-Forward
- Project
- Video
- World Generation
thumbnail: assets/thumbnails/zhang2024pansplat.jpg
publication_date: '2024-12-16T18:59:45+00:00'
date_source: arxiv
- id: taubner2024cap4d
title: 'CAP4D: Creating Animatable 4D Portrait Avatars with Morphable Multi-View
Diffusion Models'
authors: Felix Taubner, Ruihang Zhang, Mathieu Tuli, David B. Lindell
year: '2024'
abstract: 'Reconstructing photorealistic and dynamic portrait avatars from images
is essential to many applications including advertising, visual effects, and virtual
reality. Depending on the application, avatar reconstruction involves different
capture setups and constraints; for example, visual effects studios use camera
arrays to capture hundreds of reference images, while content creators may seek
to animate a single portrait image downloaded from the internet. As such, there
is a large and heterogeneous ecosystem of methods for avatar reconstruction. Techniques
based on multi-view stereo or neural rendering achieve the highest quality results,
but require hundreds of reference images. Recent generative models produce convincing
avatars from a single reference image, but visual fidelity still lags behind multi-view
techniques. Here, we present CAP4D: an approach that uses a morphable multi-view
diffusion model to reconstruct photoreal 4D (dynamic 3D) portrait avatars from
any number of reference images (i.e., one to 100) and animate and render them
in real time. Our approach demonstrates state-of-the-art performance for single-,
few-, and multi-image 4D portrait avatar reconstruction, and takes steps to bridge
the gap in visual fidelity between single-image and multi-view reconstruction
techniques.'
project_page: https://felixtaubner.github.io/cap4d/
paper: https://arxiv.org/pdf/2412.12093
code: null
video: null
tags:
- Avatar
- Project
thumbnail: assets/thumbnails/taubner2024cap4d.jpg
publication_date: '2024-12-16T18:58:51+00:00'
- id: liang2024wonderland
title: 'Wonderland: Navigating 3D Scenes from a Single Image'
authors: Hanwen Liang, Junli Cao, Vidit Goel, Guocheng Qian, Sergei Korolev, Demetri
Terzopoulos, Konstantinos N. Plataniotis, Sergey Tulyakov, Jian Ren
year: '2024'
abstract: 'This paper addresses a challenging question: How can we efficiently create
high-quality, wide-scope 3D scenes from a single arbitrary image? Existing methods
face several constraints, such as requiring multi-view data, time-consuming per-scene
optimization, low visual quality in backgrounds, and distorted reconstructions
in unseen areas. We propose a novel pipeline to overcome these limitations. Specifically,
we introduce a large-scale reconstruction model that uses latents from a video
diffusion model to predict 3D Gaussian Splattings for the scenes in a feed-forward
manner. The video diffusion model is designed to create videos precisely following
specified camera trajectories, allowing it to generate compressed video latents
that contain multi-view information while maintaining 3D consistency. We train
the 3D reconstruction model to operate on the video latent space with a progressive
training strategy, enabling the efficient generation of high-quality, wide-scope,
and generic 3D scenes. Extensive evaluations across various datasets demonstrate
that our model significantly outperforms existing methods for single-view 3D scene
generation, particularly with out-of-domain images. For the first time, we demonstrate
that a 3D reconstruction model can be effectively built upon the latent space
of a diffusion model to realize efficient 3D scene generation.
'
project_page: https://snap-research.github.io/wonderland/
paper: https://arxiv.org/pdf/2412.12091v1.pdf
code: null
video: null
tags:
- Feed-Forward
- Project
- Sparse
- World Generation
thumbnail: assets/thumbnails/liang2024wonderland.jpg
publication_date: '2024-12-16T18:58:17+00:00'
date_source: arxiv
- id: huang2024deformable
title: Deformable Radial Kernel Splatting
authors: Yi-Hua Huang, Ming-Xian Lin, Yang-Tian Sun, Ziyi Yang, Xiaoyang Lyu, Yan-Pei
Cao, Xiaojuan Qi
year: '2024'
abstract: 'Recently, Gaussian splatting has emerged as a robust technique for representing
3D scenes, enabling real-time rasterization and high-fidelity rendering. However,
Gaussians'' inherent radial symmetry and smoothness constraints limit their ability
to represent complex shapes, often requiring thousands of primitives to approximate
detailed geometry. We introduce Deformable Radial Kernel (DRK), which extends
Gaussian splatting into a more general and flexible framework. Through learnable
radial bases with adjustable angles and scales, DRK efficiently models diverse
shape primitives while enabling precise control over edge sharpness and boundary
curvature. Given DRK''s planar nature, we further develop accurate ray-primitive
intersection computation for depth sorting and introduce efficient kernel culling
strategies for improved rasterization efficiency. Extensive experiments demonstrate
that DRK outperforms existing methods in both representation efficiency and rendering
quality, achieving state-of-the-art performance while dramatically reducing primitive
count.
'
project_page: https://yihua7.github.io/DRK-web/
paper: https://arxiv.org/pdf/2412.11752v1.pdf
code: null
video: null
tags:
- Optimization
- Project
- Rendering
thumbnail: assets/thumbnails/huang2024deformable.jpg
publication_date: '2024-12-16T13:11:02+00:00'
date_source: arxiv
- id: xu2024gaussianproperty
title: 'GaussianProperty: Integrating Physical Properties to 3D Gaussians with LMMs'
authors: Xinli Xu, Wenhang Ge, Dicong Qiu, ZhiFei Chen, Dongyu Yan, Zhuoyun Liu,
Haoyu Zhao, Hanfeng Zhao, Shunsi Zhang, Junwei Liang, Ying-Cong Chen
year: '2024'
abstract: 'Estimating physical properties for visual data is a crucial task in computer
vision, graphics, and robotics, underpinning applications such as augmented reality,
physical simulation, and robotic grasping. However, this area remains under-explored
due to the inherent ambiguities in physical property estimation. To address these
challenges, we introduce GaussianProperty, a training-free framework that assigns
physical properties of materials to 3D Gaussians. Specifically, we integrate the
segmentation capability of SAM with the recognition capability of GPT-4V(ision)
to formulate a global-local physical property reasoning module for 2D images.
Then we project the physical properties from multi-view 2D images to 3D Gaussians
using a voting strategy. We demonstrate that 3D Gaussians with physical property
annotations enable applications in physics-based dynamic simulation and robotic
grasping. For physics-based dynamic simulation, we leverage the Material Point
Method (MPM) for realistic dynamic simulation. For robot grasping, we develop
a grasping force prediction strategy that estimates a safe force range required
for object grasping based on the estimated physical properties. Extensive experiments
on material segmentation, physics-based dynamic simulation, and robotic grasping
validate the effectiveness of our proposed method, highlighting its crucial role
in understanding physical properties from visual data. Online demo, code, more
cases and annotated datasets are available at https://Gaussian-Property.github.io.
'
project_page: https://gaussian-property.github.io/
paper: https://arxiv.org/pdf/2412.11258.pdf
code: https://github.com/xxlbigbrother/Gaussian-Property
video: null
tags:
- Code
- Language Embedding
- Project
- Robotics
thumbnail: assets/thumbnails/xu2024gaussianproperty.jpg
publication_date: '2024-12-15T17:44:10+00:00'
date_source: arxiv
- id: liang2024supergseg
title: 'SuperGSeg: Open-Vocabulary 3D Segmentation with Structured Super-Gaussians'
authors: Siyun Liang, Sen Wang, Kunyi Li, Michael Niemeyer, Stefano Gasperini, Nassir
Navab, Federico Tombari
year: '2024'
abstract: '3D Gaussian Splatting has recently gained traction for its efficient
training and real-time rendering. While the vanilla Gaussian Splatting representation
is mainly designed for view synthesis, more recent works investigated how to extend
it with scene understanding and language features. However, existing methods lack
a detailed comprehension of scenes, limiting their ability to segment and interpret
complex structures. To this end, we introduce SuperGSeg, a novel approach that
fosters cohesive, context-aware scene representation by disentangling segmentation
and language field distillation. SuperGSeg first employs neural Gaussians to learn
instance and hierarchical segmentation features from multi-view images with the
aid of off-the-shelf 2D masks. These features are then leveraged to create a sparse
set of what we call Super-Gaussians. Super-Gaussians facilitate the distillation
of 2D language features into 3D space. Through Super-Gaussians, our method enables
high-dimensional language feature rendering without extreme increases in GPU memory.
Extensive experiments demonstrate that SuperGSeg outperforms prior works on both
open-vocabulary object localization and semantic segmentation tasks.
'
project_page: https://supergseg.github.io/
paper: https://arxiv.org/pdf/2412.10231.pdf
code: null
video: null
tags:
- Language Embedding
- Project
- Segmentation
thumbnail: assets/thumbnails/liang2024supergseg.jpg
publication_date: '2024-12-13T16:01:19+00:00'
date_source: arxiv
- id: tang2024gaf
title: 'GAF: Gaussian Avatar Reconstruction from Monocular Videos via Multi-view
Diffusion'
authors: Jiapeng Tang, Davide Davoli, Tobias Kirschstein, Liam Schoneveld, Matthias
Nießner
year: '2024'
abstract: We propose a novel approach for reconstructing animatable 3D Gaussian
avatars from monocular videos captured by commodity devices like smartphones.
Photorealistic 3D head avatar reconstruction from such recordings is challenging
due to limited observations, which leaves unobserved regions under-constrained
and can lead to artifacts in novel views. To address this problem, we introduce
a multi-view head diffusion model, leveraging its priors to fill in missing regions
and ensure view consistency in Gaussian splatting renderings. To enable precise
viewpoint control, we use normal maps rendered from FLAME-based head reconstruction,
which provides pixel-aligned inductive biases. We also condition the diffusion
model on VAE features extracted from the input image to preserve details of facial
identity and appearance. For Gaussian avatar reconstruction, we distill multi-view
diffusion priors by using iteratively denoised images as pseudo-ground truths,
effectively mitigating over-saturation issues. To further improve photorealism,
we apply latent upsampling to refine the denoised latent before decoding it into
an image. We evaluate our method on the NeRSemble dataset, showing that GAF outperforms
the previous state-of-the-art methods in novel view synthesis and novel expression
animation. Furthermore, we demonstrate higher-fidelity avatar reconstructions
from monocular videos captured on commodity devices.
project_page: https://tangjiapeng.github.io/projects/GAF/
paper: https://arxiv.org/pdf/2412.10209
code: null
video: https://www.youtube.com/embed/QuIYTljvhygE
tags:
- Avatar
- Project
- Video
thumbnail: assets/thumbnails/tang2024gaf.jpg
publication_date: '2024-12-13T15:31:22+00:00'
- id: park2024splinegs
title: 'SplineGS: Robust Motion-Adaptive Spline for Real-Time Dynamic 3D Gaussians
from Monocular Video'
authors: Jongmin Park, Minh-Quan Viet Bui, Juan Luis Gonzalez Bello, Jaeho Moon,
Jihyong Oh, Munchurl Kim
year: '2024'
abstract: 'Synthesizing novel views from in-the-wild monocular videos is challenging
due to scene dynamics and the lack of multi-view cues. To address this, we propose
SplineGS, a COLMAP-free dynamic 3D Gaussian Splatting (3DGS) framework for high-quality
reconstruction and fast rendering from monocular videos. At its core is a novel
Motion-Adaptive Spline (MAS) method, which represents continuous dynamic 3D Gaussian
trajectories using cubic Hermite splines with a
SYMBOL INDEX (106 symbols across 18 files)
FILE: src/arxiv_integration.py
class ArxivIntegration (line 8) | class ArxivIntegration:
method __init__ (line 9) | def __init__(self):
method extract_arxiv_id (line 12) | def extract_arxiv_id(self, url_or_id: str) -> str:
method get_paper (line 29) | def get_paper(self, url_or_id: str) -> Optional[Dict[str, Any]]:
method append_to_yaml (line 61) | def append_to_yaml(self, entry: Dict[str, Any], filename: str = "aweso...
method format_yaml_entry (line 81) | def format_yaml_entry(entry: Dict[str, Any]) -> str:
function clean_and_quote (line 107) | def clean_and_quote(text: str) -> str:
function format_optional_field (line 117) | def format_optional_field(value) -> str:
FILE: src/components/dialogs.py
class ArxivAddDialog (line 12) | class ArxivAddDialog(QDialog):
method __init__ (line 13) | def __init__(self, parent=None):
method setup_ui (line 21) | def setup_ui(self):
method generate_thumbnail (line 43) | def generate_thumbnail(self, entry):
method add_paper (line 67) | def add_paper(self):
FILE: src/components/thumbnail.py
class ThumbnailGenerator (line 10) | class ThumbnailGenerator:
method __init__ (line 11) | def __init__(self, output_dir: str = "assets/thumbnails"):
method download_pdf (line 18) | def download_pdf(self, url: str) -> bytes:
method create_thumbnail (line 29) | def create_thumbnail(self, pdf_content: bytes, paper_id: str) -> bool:
FILE: src/components/widgets.py
class TagButton (line 3) | class TagButton(QPushButton):
method __init__ (line 4) | def __init__(self, text, active=False):
class URLWidget (line 25) | class URLWidget(QWidget):
method __init__ (line 26) | def __init__(self, label_text):
method set_text (line 40) | def set_text(self, value):
FILE: src/fix_date.py
class YAMLUpdater (line 9) | class YAMLUpdater:
method __init__ (line 10) | def __init__(self):
method extract_year_from_id (line 14) | def extract_year_from_id(self, paper_id: str) -> Optional[int]:
method extract_arxiv_id (line 21) | def extract_arxiv_id(self, url: str) -> Optional[str]:
method get_fallback_date (line 32) | def get_fallback_date(self, entry: Dict[str, Any]) -> Optional[str]:
method process_paper (line 51) | def process_paper(self, entry: Dict[str, Any]) -> Tuple[Dict[str, Any]...
method safe_sort_key (line 90) | def safe_sort_key(self, x: Dict[str, Any]) -> tuple:
method update_yaml_with_dates (line 112) | def update_yaml_with_dates(self, filename: str = "awesome_3dgs_papers....
FILE: src/generate.py
function generate_html (line 9) | def generate_html(entries: List[Dict[str, Any]], output_file: str) -> None:
function main (line 38) | def main():
FILE: src/helper.py
function generate_year_options (line 10) | def generate_year_options(entries: List[Dict[str, Any]]) -> str:
function generate_tag_filters (line 15) | def generate_tag_filters(entries: List[Dict[str, Any]]) -> str:
function generate_paper_cards (line 21) | def generate_paper_cards(entries: List[Dict[str, Any]]) -> str:
function format_publication_date (line 45) | def format_publication_date(date_str: str, date_source: str) -> str:
FILE: src/paper_generator.py
class PaperCardGenerator (line 7) | class PaperCardGenerator:
method __init__ (line 10) | def __init__(self, templates_dir: Path):
method _generate_link (line 14) | def _generate_link(self, url: str, icon: str, text: str, emoji: str = ...
method _generate_links (line 21) | def _generate_links(self, paper: Paper) -> str:
method _generate_tags (line 44) | def _generate_tags(self, paper: Paper) -> str:
method _generate_abstract (line 49) | def _generate_abstract(self, paper: Paper) -> str:
method generate_card (line 58) | def generate_card(self, paper: Paper) -> str:
method generate_cards (line 75) | def generate_cards(self, papers: List[Paper]) -> str:
FILE: src/paper_schema.py
class Paper (line 6) | class Paper:
method from_dict (line 23) | def from_dict(cls, data: dict) -> 'Paper':
method to_dict (line 74) | def to_dict(self) -> dict:
FILE: src/static/js/filters.js
function filterPapers (line 1) | function filterPapers() {
function clearSearch (line 41) | function clearSearch() {
function initializeFilters (line 46) | function initializeFilters() {
FILE: src/static/js/navigation.js
function scrollToTop (line 2) | function scrollToTop() {
function scrollToBottom (line 9) | function scrollToBottom() {
function updateScrollProgress (line 17) | function updateScrollProgress() {
function updateFilterStatus (line 25) | function updateFilterStatus() {
function createFilterTag (line 77) | function createFilterTag(type, title, info) {
function clearAllFilters (line 94) | function clearAllFilters() {
FILE: src/static/js/selection.js
function toggleSelectedOnly (line 1) | function toggleSelectedOnly() {
function toggleSelectionMode (line 24) | function toggleSelectionMode() {
function clearSelection (line 54) | function clearSelection() {
function togglePaperSelection (line 80) | function togglePaperSelection(paperId, checkbox) {
function removeFromSelection (line 119) | function removeFromSelection(paperId) {
function updateSelectionCount (line 143) | function updateSelectionCount() {
function handleCheckboxClick (line 148) | function handleCheckboxClick(ev, paperId, checkbox) {
function scrollToPaper (line 153) | function scrollToPaper(paperId) {
FILE: src/static/js/sharing.js
function showShareModal (line 1) | function showShareModal() {
function hideShareModal (line 17) | function hideShareModal() {
function copyShareLink (line 21) | async function copyShareLink() {
function copyBitcoinAddress (line 36) | function copyBitcoinAddress() {
function applyURLParams (line 48) | function applyURLParams() {
FILE: src/static/js/utils.js
function debounce (line 1) | function debounce(fn, delay) {
function updateURL (line 9) | function updateURL() {
function updatePaperNumbers (line 37) | function updatePaperNumbers() {
FILE: src/template_engine.py
class TemplateEngine (line 5) | class TemplateEngine:
method __init__ (line 6) | def __init__(self, template_path: Path):
method render (line 10) | def render(self, context: Dict[str, Any]) -> str:
FILE: src/utils.py
function read_files (line 4) | def read_files(base_dir: Path, file_paths: List[str]) -> List[str]:
function write_output (line 12) | def write_output(output_file: str, content: str) -> None:
FILE: src/validate_yaml.py
function validate_url (line 39) | def validate_url(url, required=False):
function get_changed_entries (line 66) | def get_changed_entries():
function validate_entries (line 101) | def validate_entries(entries):
function main (line 153) | def main():
FILE: src/yaml_editor.py
class YAMLEditor (line 17) | class YAMLEditor(QMainWindow):
method __init__ (line 18) | def __init__(self):
method safe_sort_key (line 53) | def safe_sort_key(self, x: Dict[str, Any]) -> tuple:
method load_yaml (line 84) | def load_yaml(self):
method setup_status_bar (line 102) | def setup_status_bar(self):
method show_save_feedback (line 107) | def show_save_feedback(self, success=True):
method clear_save_indicator (line 116) | def clear_save_indicator(self):
method setup_ui (line 120) | def setup_ui(self):
method auto_save (line 253) | def auto_save(self):
method handle_url_change (line 303) | def handle_url_change(self):
method get_entry_state (line 307) | def get_entry_state(self, entry):
method update_tags (line 314) | def update_tags(self):
method update_automatic_tags (line 320) | def update_automatic_tags(self):
method clear_search_results (line 341) | def clear_search_results(self):
method show_current_entry (line 346) | def show_current_entry(self):
method search_entry (line 378) | def search_entry(self):
method open_url (line 414) | def open_url(self, field):
method go_to_page (line 419) | def go_to_page(self):
method prev_entry (line 430) | def prev_entry(self):
method next_entry (line 436) | def next_entry(self):
method delete_current_entry (line 442) | def delete_current_entry(self):
method add_arxiv_button (line 479) | def add_arxiv_button(self):
method refresh_ui (line 484) | def refresh_ui(self):
method show_arxiv_dialog (line 505) | def show_arxiv_dialog(self):
function main (line 551) | def main():
Condensed preview: 40 files, each showing path, character count, and a content snippet.
[
{
"path": ".gitattributes",
"chars": 43,
"preview": "assets filter=lfs diff=lfs merge=lfs -text\n"
},
{
"path": ".github/CODEOWNERS",
"chars": 43,
"preview": "# Require review for all changes\n* @MrNeRF\n"
},
{
"path": ".github/FUNDING.yml",
"chars": 0,
"preview": ""
},
{
"path": ".github/workflows/generate-html.yml",
"chars": 1984,
"preview": "name: Generate HTML\non:\n pull_request:\n types: [closed]\n push:\n branches: [main]\n paths:\n - 'awesome_3dg"
},
{
"path": ".github/workflows/validate-pr.yml",
"chars": 805,
"preview": "name: Validate PR Changes\n\non:\n pull_request:\n branches: [ main ]\n paths:\n - 'awesome_3dgs_papers.yaml'\n\njob"
},
{
"path": ".gitignore",
"chars": 26,
"preview": ".venv\n__pycache__/\n*.pyc\n\n"
},
{
"path": "CONTRIBUTING.md",
"chars": 1775,
"preview": "# Contributing Guide\n\nThank you for your interest in contributing to the Awesome 3D Gaussian Splatting repository! This "
},
{
"path": "LICENSE",
"chars": 1064,
"preview": "MIT License\n\nCopyright (c) 2023 janusch\n\nPermission is hereby granted, free of charge, to any person obtaining a copy\nof"
},
{
"path": "README.md",
"chars": 16910,
"preview": "# Awesome 3D Gaussian Splatting\n\n<div align=\"center\">\n A curated collection of resources focused on 3D Gaussian Splatti"
},
{
"path": "awesome_3dgs_papers.yaml",
"chars": 921032,
"preview": "- id: ren2025fastgs\n title: 'FastGS: Training 3D Gaussian Splatting in 100 Seconds'\n authors: Shiwei Ren, Tianci Wen, "
},
{
"path": "editor.py",
"chars": 72,
"preview": "from src.yaml_editor import main\n\nif __name__ == '__main__':\n main()\n"
},
{
"path": "index.html",
"chars": 1504672,
"preview": "<!DOCTYPE HTML>\n<html lang=\"en\">\n<head>\n <meta charset=\"UTF-8\">\n <meta name=\"viewport\" content=\"width=device-width"
},
{
"path": "requirements.txt",
"chars": 386,
"preview": "arxiv==2.1.3\ncertifi==2024.12.14\ncffi==1.17.1\ncharset-normalizer==3.4.1\ncryptography==44.0.0\nDeprecated==1.2.15\nfeedpars"
},
{
"path": "src/__init__.py",
"chars": 0,
"preview": ""
},
{
"path": "src/arxiv_integration.py",
"chars": 4717,
"preview": "import arxiv\nimport yaml\nimport re\nfrom urllib.parse import urlparse\nfrom typing import Optional, Dict, Any\n\n\nclass Arxi"
},
{
"path": "src/components/__init__.py",
"chars": 0,
"preview": ""
},
{
"path": "src/components/dialogs.py",
"chars": 4571,
"preview": "import arxiv\nfrom PyQt6.QtWidgets import (QApplication, QMainWindow, QWidget, QVBoxLayout, \n Q"
},
{
"path": "src/components/thumbnail.py",
"chars": 2426,
"preview": "from pathlib import Path\nimport requests\nfrom pdf2image import convert_from_bytes\nfrom PIL import Image\nimport logging\n\n"
},
{
"path": "src/components/widgets.py",
"chars": 1304,
"preview": "from PyQt6.QtWidgets import QPushButton, QWidget, QHBoxLayout, QLabel, QLineEdit\n\nclass TagButton(QPushButton):\n def "
},
{
"path": "src/fix_date.py",
"chars": 6114,
"preview": "import yaml\nimport arxiv\nimport time\nfrom datetime import datetime\nfrom typing import Dict, Any, Optional, List, Tuple\ni"
},
{
"path": "src/generate.py",
"chars": 1897,
"preview": "import yaml\nimport sys\nfrom pathlib import Path\nfrom typing import List, Dict, Any\nfrom helper import generate_year_opti"
},
{
"path": "src/helper.py",
"chars": 2386,
"preview": "import datetime\nfrom typing import List, Dict, Any\nfrom paper_schema import Paper\nfrom pathlib import Path\nfrom paper_ge"
},
{
"path": "src/paper_generator.py",
"chars": 3718,
"preview": "from pathlib import Path\nimport json\nfrom typing import List\nfrom paper_schema import Paper\nfrom template_engine import "
},
{
"path": "src/paper_schema.py",
"chars": 3619,
"preview": "from dataclasses import dataclass\nfrom typing import List, Optional\nfrom datetime import datetime\n\n@dataclass\nclass Pape"
},
{
"path": "src/static/css/base.css",
"chars": 9237,
"preview": "/* Root Variables */\n:root {\n --primary-color: #1772d0;\n --hover-color: #f09228;\n --bg-color: #ffffff;\n --ca"
},
{
"path": "src/static/css/components.css",
"chars": 10183,
"preview": ".selected-only-mode-toggle {\n position: fixed;\n bottom: 6rem;\n right: 2rem;\n background: var(--primary-color"
},
{
"path": "src/static/css/responsive.css",
"chars": 1138,
"preview": "@media (max-width: 1024px) {\n .container {\n padding: 0 1rem;\n }\n\n .selection-preview {\n display: "
},
{
"path": "src/static/js/filters.js",
"chars": 2774,
"preview": "function filterPapers() {\n // Show/hide non-paper elements regardless of filter state\n document.querySelectorAll('"
},
{
"path": "src/static/js/main.js",
"chars": 2623,
"preview": "document.addEventListener('DOMContentLoaded', function() {\n // Initialize variables\n window.paperCards = document."
},
{
"path": "src/static/js/navigation.js",
"chars": 4383,
"preview": "// Navigation controls\nfunction scrollToTop() {\n window.scrollTo({\n top: 0,\n behavior: 'smooth'\n });"
},
{
"path": "src/static/js/selection.js",
"chars": 5891,
"preview": "function toggleSelectedOnly() {\n state.onlyShowSelected = !state.onlyShowSelected;\n const button = document.queryS"
},
{
"path": "src/static/js/sharing.js",
"chars": 4025,
"preview": "function showShareModal() {\n if (state.selectedPapers.size === 0) {\n alert('Please select at least one paper t"
},
{
"path": "src/static/js/state.js",
"chars": 161,
"preview": "const state = {\n selectedPapers: new Set(),\n isSelectionMode: false,\n includeTags: new Set(),\n excludeTags: "
},
{
"path": "src/static/js/utils.js",
"chars": 1316,
"preview": "function debounce(fn, delay) {\n let timeout;\n return (...args) => {\n if (timeout) clearTimeout(timeout);\n "
},
{
"path": "src/template_engine.py",
"chars": 448,
"preview": "from string import Template as StringTemplate\nfrom typing import Dict, Any\nfrom pathlib import Path\n\nclass TemplateEngin"
},
{
"path": "src/templates/index.html",
"chars": 10054,
"preview": "<!DOCTYPE HTML>\n<html lang=\"en\">\n<head>\n <meta charset=\"UTF-8\">\n <meta name=\"viewport\" content=\"width=device-width"
},
{
"path": "src/templates/paper_card.html",
"chars": 743,
"preview": "<div class=\"paper-row\" data-id=\"$id\" data-title=\"$title\" data-authors=\"$authors\" data-year=\"$year\" data-tags='$tags_json"
},
{
"path": "src/utils.py",
"chars": 542,
"preview": "from pathlib import Path\nfrom typing import List\n\ndef read_files(base_dir: Path, file_paths: List[str]) -> List[str]:\n "
},
{
"path": "src/validate_yaml.py",
"chars": 6230,
"preview": "import yaml\nimport sys\nimport os\nimport requests\nimport time\nfrom urllib3.util.retry import Retry\nfrom requests.adapters"
},
{
"path": "src/yaml_editor.py",
"chars": 22360,
"preview": "import sys\nfrom src.fix_date import YAMLUpdater\nimport yaml\nimport webbrowser\nfrom PyQt6.QtWidgets import (QApplication,"
}
]
About this extraction
This page contains the full source code of the MrNeRF/awesome-3D-gaussian-splatting GitHub repository, extracted and formatted as plain text. The extraction includes 40 files (2.4 MB), approximately 642.1k tokens, and a symbol index with 106 extracted functions, classes, methods, constants, and types.