Full Code of MrNeRF/awesome-3D-gaussian-splatting for AI

Repository: MrNeRF/awesome-3D-gaussian-splatting
Branch: main
Commit: c692febe4ec8
Files: 40
Total size: 2.4 MB

Directory structure:
gitextract__e_uw1ef/

├── .gitattributes
├── .github/
│   ├── CODEOWNERS
│   ├── FUNDING.yml
│   └── workflows/
│       ├── generate-html.yml
│       └── validate-pr.yml
├── .gitignore
├── CONTRIBUTING.md
├── LICENSE
├── README.md
├── awesome_3dgs_papers.yaml
├── editor.py
├── index.html
├── requirements.txt
└── src/
    ├── __init__.py
    ├── arxiv_integration.py
    ├── components/
    │   ├── __init__.py
    │   ├── dialogs.py
    │   ├── thumbnail.py
    │   └── widgets.py
    ├── fix_date.py
    ├── generate.py
    ├── helper.py
    ├── paper_generator.py
    ├── paper_schema.py
    ├── static/
    │   ├── css/
    │   │   ├── base.css
    │   │   ├── components.css
    │   │   └── responsive.css
    │   └── js/
    │       ├── filters.js
    │       ├── main.js
    │       ├── navigation.js
    │       ├── selection.js
    │       ├── sharing.js
    │       ├── state.js
    │       └── utils.js
    ├── template_engine.py
    ├── templates/
    │   ├── index.html
    │   └── paper_card.html
    ├── utils.py
    ├── validate_yaml.py
    └── yaml_editor.py

================================================
FILE CONTENTS
================================================

================================================
FILE: .gitattributes
================================================
assets filter=lfs diff=lfs merge=lfs -text


================================================
FILE: .github/CODEOWNERS
================================================
# Require review for all changes
* @MrNeRF


================================================
FILE: .github/FUNDING.yml
================================================


================================================
FILE: .github/workflows/generate-html.yml
================================================
name: Generate HTML
on:
  pull_request:
    types: [closed]
  push:
    branches: [main]
    paths:
      - 'awesome_3dgs_papers.yaml'

jobs:
  build:
    if: github.event.pull_request.merged == true || github.event_name == 'push'
    runs-on: ubuntu-latest
    permissions:
      contents: write
      pull-requests: write
    
    steps:
    - uses: actions/checkout@v3
      with:
        fetch-depth: 0
        ref: main
    
    - name: Set up Python
      uses: actions/setup-python@v4
      with:
        python-version: '3.10'
    
    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip
        pip install -r requirements.txt

    - name: Setup Git
      run: |
        git config --global user.name 'github-actions[bot]'
        git config --global user.email '41898282+github-actions[bot]@users.noreply.github.com'

    - name: Update HTML
      run: |
        git fetch origin update-html || true
        git checkout -B update-html
        python src/generate.py awesome_3dgs_papers.yaml index.html
        git add index.html
        if ! git diff --staged --quiet; then
          git commit -m "Update generated HTML"
          git push --force origin update-html
        fi

    - name: Create Pull Request
      uses: actions/github-script@v6
      with:
        script: |
          const { data: pulls } = await github.rest.pulls.list({
            owner: context.repo.owner,
            repo: context.repo.repo,
            head: `${context.repo.owner}:update-html`,
            base: 'main',
            state: 'open'
          });
          
          if (pulls.length === 0) {
            await github.rest.pulls.create({
              owner: context.repo.owner,
              repo: context.repo.repo,
              head: 'update-html',
              base: 'main',
              title: 'Update generated HTML',
              body: 'Auto-generated PR to update the HTML based on recent YAML changes.'
            });
          }


================================================
FILE: .github/workflows/validate-pr.yml
================================================
name: Validate PR Changes

on:
  pull_request:
    branches: [ main ]
    paths:
      - 'awesome_3dgs_papers.yaml'

jobs:
  validate:
    runs-on: ubuntu-latest
    permissions:
      pull-requests: read
      contents: read
    
    steps:
    - uses: actions/checkout@v3
      with:
        fetch-depth: 0
    
    - name: Set up Python
      uses: actions/setup-python@v4
      with:
        python-version: '3.10'
    
    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip
        pip install -r requirements.txt

    - name: Validate Changed YAML entries
      env:
        GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        PR_NUMBER: ${{ github.event.pull_request.number }}
        REPO: ${{ github.repository }}
      run: |
        python src/validate_yaml.py


================================================
FILE: .gitignore
================================================
.venv
__pycache__/
*.pyc



================================================
FILE: CONTRIBUTING.md
================================================
# Contributing Guide

Thank you for your interest in contributing to the Awesome 3D Gaussian Splatting repository! This document will guide you through the contribution process.

## Adding Papers

We use a custom YAML editor to maintain the paper database. To add or edit papers:

1. Clone the repository:
```bash
git clone https://github.com/MrNeRF/awesome-3D-gaussian-splatting.git
cd awesome-3D-gaussian-splatting
```

2. Install dependencies:
```bash
pip install -r requirements.txt
```

3. Install Poppler (required for PDF processing):
   - **Ubuntu/Debian:**
     ```bash
     sudo apt-get install poppler-utils
     ```
   - **macOS:**
     ```bash
     brew install poppler
     ```
   - **Windows:**
     - Download and install from: https://github.com/oschwartz10612/poppler-windows/releases/
     - Add the `bin` directory to your system PATH

4. Run the YAML editor:
```bash
python src/yaml_editor.py
```

5. Use the editor to:
   - Add new papers using the "Add from arXiv" button
   - Edit existing entries
   - Add tags, links, and other metadata
   - Preview thumbnails

6. The editor will automatically save changes to `awesome_3dgs_papers.yaml`
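
Before opening a pull request, it can help to sanity-check a new entry by hand. The sketch below is not the repository's `src/validate_yaml.py`. It is a minimal, hypothetical check built only from the field names visible in `awesome_3dgs_papers.yaml` (`id`, `title`, `authors`, `year`, `paper`, `tags`, `thumbnail`, `publication_date`). Which fields are actually mandatory is an assumption, and `REQUIRED_FIELDS` and `check_entry` are illustrative names.

```python
from datetime import datetime

# Field names observed in awesome_3dgs_papers.yaml entries. Treating all of
# them as mandatory is an assumption of this sketch, not a documented rule.
REQUIRED_FIELDS = ("id", "title", "authors", "year", "paper", "tags",
                   "thumbnail", "publication_date")


def check_entry(entry: dict) -> list[str]:
    """Return a list of human-readable problems found in one paper entry."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not entry.get(field):
            problems.append(f"missing or empty field: {field}")

    # Existing entries store the year as a quoted four-digit string.
    year = entry.get("year")
    if year and not (isinstance(year, str) and year.isdigit() and len(year) == 4):
        problems.append("year should be a four-digit string such as '2025'")

    tags = entry.get("tags")
    if tags and not isinstance(tags, list):
        problems.append("tags should be a list of strings")

    # Existing entries use ISO 8601 timestamps with a UTC offset.
    date = entry.get("publication_date")
    if date:
        try:
            datetime.fromisoformat(date)
        except (TypeError, ValueError):
            problems.append("publication_date should be an ISO 8601 timestamp")
    return problems


# Example entry, abbreviated from the FastGS record in the database.
example = {
    "id": "ren2025fastgs",
    "title": "FastGS: Training 3D Gaussian Splatting in 100 Seconds",
    "authors": "Shiwei Ren, Tianci Wen, Yongchun Fang, Biao Lu",
    "year": "2025",
    "paper": "https://arxiv.org/pdf/2511.04283.pdf",
    "tags": ["Acceleration", "Code"],
    "thumbnail": "assets/thumbnails/ren2025fastgs.jpg",
    "publication_date": "2025-11-06T11:21:16+00:00",
}
print(check_entry(example))
```

To run this against the real file, the entries could be loaded with PyYAML's `yaml.safe_load` (assuming PyYAML is available in your environment) and passed to `check_entry` one at a time.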

## Adding Other Resources

For adding other resources (implementations, tools, tutorials, etc.):

1. Fork the repository
2. Create a new branch (`git checkout -b feature/new-resource`)
3. Edit the README.md file
4. Commit your changes (`git commit -m 'Add new resource'`)
5. Push to your fork (`git push origin feature/new-resource`)
6. Open a Pull Request

Please ensure your additions:
- Are related to 3D Gaussian Splatting
- Have working links
- Are placed in the appropriate section
- Follow the existing formatting

---

By contributing to this repository, you agree to abide by its terms and conditions.

================================================
FILE: LICENSE
================================================
MIT License

Copyright (c) 2023 janusch

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.


================================================
FILE: README.md
================================================
# Awesome 3D Gaussian Splatting

<div align="center">
  A curated collection of resources focused on 3D Gaussian Splatting (3DGS) and related technologies.

  [**Browse the Paper List**](https://mrnerf.github.io/awesome-3D-gaussian-splatting/) | [**Contribute**](CONTRIBUTING.md) | [**MrNeRF**](https://www.mrnerf.com)

</div>

## Contents

- [Papers & Documentation](#papers--documentation)
- [Implementations](#implementations)
- [Viewers & Game Engine Support](#viewers--game-engine-support)
- [Tools & Utilities](#tools--utilities)
- [Learning Resources](#learning-resources)
- [Sponsors](#sponsors)

## Papers & Documentation

### Papers Database

Visit our comprehensive, searchable database of 3D Gaussian Splatting papers:
[Papers Database](https://mrnerf.github.io/awesome-3D-gaussian-splatting/)

### Courses

- [MIT Inverse Rendering Lectures (Module 2)](https://www.scenerepresentations.org/courses/inverse-graphics-23/) - Academic deep dive into inverse rendering

### Datasets

- [NERDS 360 Multi-View dataset](https://zubair-irshad.github.io/projects/neo360.html) - High-quality outdoor scene dataset

## Implementations

### Official Reference

- [Original Gaussian Splatting](https://github.com/graphdeco-inria/gaussian-splatting) - The reference implementation by the original authors

### Community Implementations

| Implementation                                                           | Language    | License    | Description                     |
| ------------------------------------------------------------------------ | ----------- | ---------- | ------------------------------- |
| [LichtFeld-Studio](https://github.com/MrNeRF/LichtFeld-Studio)                   | C++/CUDA    | GPL-3.0    | High-performance implementation |
| [Taichi 3D GS](https://github.com/wanmeihuali/taichi_3d_gaussian_splatting) | Taichi      | Apache-2.0 | Taichi-based implementation     |
| [Nerfstudio gsplat](https://github.com/nerfstudio-project/gsplat)           | Python/CUDA | Apache-2.0 | Integration with Nerfstudio     |
| [OpenSplat](https://github.com/pierotofy/OpenSplat)                         | C++/CPU/GPU | AGPL-3.0   | Cross-platform solution         |
| [Grendel](https://github.com/nyu-systems/Grendel-GS)                        | Python/CUDA | Apache-2.0 | Distributed computing focus     |
| [Warp 3DGS](https://github.com/guoriyue/3dgs-warp-scratch)                  | Warp/Python | AGPL-3.0   | Warp-based implementation       |

### Frameworks

- [Pointrix](https://github.com/pointrix-project/pointrix) - Differentiable point-based rendering
- [GauStudio](https://github.com/GAP-LAB-CUHK-SZ/gaustudio) - Unified framework with multiple implementations
- [DriveStudio](https://github.com/ziyc/drivestudio) - Urban scene reconstruction framework
- [GSCodecStudio](https://github.com/JasonLSC/GSCodec_Studio) - Compression and dynamic splatting

## Viewers & Game Engine Support

### Game Engines

- [Unity Plugin](https://github.com/aras-p/UnityGaussianSplatting)
- [Unity Plugin (gsplat-unity)](https://github.com/wuyize25/gsplat-unity)
- [Unity Plugin (DynGsplat-unity)](https://github.com/HiFi-Human/DynGsplat-unity) - For dynamic splatting
- [Unreal Plugin (MLSLabsGaussianSplattingRenderer-UE)](https://github.com/mlslabs/MLSLabsGaussianSplattingRenderer-UE)
- [Unreal Plugin (XV3DGS-UEPlugin)](https://github.com/xverse-engine/XV3DGS-UEPlugin)
- [PlayCanvas Engine](https://github.com/playcanvas/engine)

### Web Viewers

**WebGL**

- [Splat Viewer](https://github.com/antimatter15/splat)
- [Gauzilla](https://github.com/BladeTransformerLLC/gauzilla)
- [Interactive Viewer](https://github.com/kishimisu/Gaussian-Splatting-WebGL)
- [GaussianSplats3D](https://github.com/mkkellogg/GaussianSplats3D)
- [PlayCanvas Model Viewer](https://github.com/playcanvas/model-viewer)
- [SuperSplat Viewer](https://github.com/playcanvas/supersplat-viewer)

**WebGPU**

- [EPFL Viewer](https://github.com/cvlab-epfl/gaussian-splatting-web)
- [WebGPU Splat](https://github.com/KeKsBoTer/web-splat)

### Desktop Viewers

**Linux**

- [DearGaussianGUI](https://github.com/leviome/DearGaussianGUI)
- [LiteViz-GS](https://github.com/panxkun/liteviz-gs)

### Native Applications

- [Blender Add-on](https://github.com/ReshotAI/gaussian-splatting-blender-addon)
- [Blender Add-on (KIRI)](https://github.com/Kiri-Innovation/3dgs-render-blender-addon)
- [Blender Add-on (404—GEN)](https://github.com/404-Repo/three-gen-blender-plugin)
- [iOS Metal Viewer](https://github.com/laanlabs/metal-splats)
- [OpenGL Viewer](https://github.com/limacv/GaussianSplattingViewer)
- [VR Support (OpenXR)](https://github.com/hyperlogic/splatapult)
- [ROS2 Support](https://github.com/shadygm/ROSplat)

## Tools & Utilities

### Data Processing

- [Kapture](https://github.com/naver/kapture) - Unified data format for visual localization
- [3DGS Converter](https://github.com/francescofugazzi/3dgsconverter) - Format conversion tool
- [Point Cloud Editor](https://github.com/JohannesKrueger/pointcloudeditor) - Web-based point cloud editing
- [SPZ Converter](https://github.com/stytim/spz) - SPZ conversion tool
- [gsbox Converter](https://github.com/gotoeasy/gsbox) - PLY SPLAT SPZ SPX conversion tool
- [SplatTransform](https://github.com/playcanvas/splat-transform) - CLI tool for converting and editing splats
- [GaussForge](https://github.com/3dgscloud/GaussForge) - C++/WASM-based conversion between PLY, SPZ, SPLAT, and KSPLAT.

### Development Tools

- [GSOPs for Houdini](https://github.com/david-rhodes/GSOPs) - Houdini integration tools
- [camorph](https://github.com/Fraunhofer-IIS/camorph) - Camera parameter conversion
- [SuperSplat](https://github.com/playcanvas/supersplat) - Browser-based 3DGS editor

## Learning Resources

### Blog Posts

- [3DGS Introduction](https://huggingface.co/blog/gaussian-splatting) - HuggingFace guide
- [Implementation Details](https://github.com/kwea123/gaussian_splatting_notes) - Technical deep dive
- [Mathematical Foundation](https://github.com/chiehwangs/3d-gaussian-theory) - Theory explanation
- [Capture Guide](https://medium.com/@heyulei/capture-images-for-gaussian-splatting-81d081bbc826) - Image capture tutorial
- [PyTorch Implementation](https://myasincifci.github.io/) - Curated implementation of Vanilla 3DGS in PyTorch

### Talks

- [Gaussian Splats: Ready for Standardization?](https://www.youtube.com/watch?v=0xdPpKSkO3I) - Metaverse Standards Forum 1/28/2025
- [Unity Integration Guide](https://www.youtube.com/watch?v=pM_HV2TU4rU&t=5298s) - Metaverse Standards Forum 5/6/2025

### Video Tutorials

- [Getting Started (Windows)](https://youtu.be/UXtuigy_wYc)
- [Gaussian Splats Town Hall - Part 2](https://youtu.be/5_GaPYBHqOo)
- [Two-Minute Explanation](https://youtu.be/HVv_IQKlafQ)
- [Jupyter Tutorial](https://www.youtube.com/watch?v=OcvA7fmiZYM)

<br>

## Data
- [NERDS 360 Multi-View dataset for Outdoor Scenes](https://zubair-irshad.github.io/projects/neo360.html)

<br>

## Courses
- [MIT Inverse Rendering Lectures (Module 2)](https://www.scenerepresentations.org/courses/inverse-graphics-23/)

<br>

## Open Source Implementations 
### Reference 
- [Gaussian Splatting](https://github.com/graphdeco-inria/gaussian-splatting)

### Unofficial Implementations
|                                                                                             | Language       | License    |
|---------------------------------------------------------------------------------------------|----------------|------------|
| [Taichi 3D Gaussian Splatting](https://github.com/wanmeihuali/taichi_3d_gaussian_splatting) | taichi         | Apache-2.0 |
| [Gaussian Splatting 3D](https://github.com/heheyas/gaussian_splatting_3d)                   | Python/CUDA    |            |
| [3D Gaussian Splatting](https://github.com/WangFeng18/3d-gaussian-splatting)                | Python/CUDA    | MIT        |
| [fast](https://github.com/MrNeRF/gaussian-splatting-cuda)                                   | C++/CUDA       | Inria/MPII |
| [nerfstudio](https://github.com/nerfstudio-project/gsplat)                                  | Python/CUDA    | Apache-2.0 |
| [taichi-splatting](https://github.com/uc-vision/taichi-splatting)                           | taichi/PyTorch | Apache-2.0 |
| [OpenSplat](https://github.com/pierotofy/OpenSplat)                                         | C++/CPU or GPU | AGPL-3.0   |
| [3D Gaussian Splatting](https://github.com/joeyan/gaussian_splatting)                       | Python/CUDA    | MIT        |
| [Grendel Distributed 3DGS](https://github.com/nyu-systems/Grendel-GS)                       | Python/CUDA    | Apache-2.0 |

### 2D Gaussian Splatting
- [jupyter notebook 2D GS splatting](https://github.com/OutofAi/2D-Gaussian-Splatting)

### Gaussian Style Transfer 
- [Direct Gaussian Style Optimization (DGSO): Stylizing 3D Gaussian Splats](https://github.com/An-u-rag/stylized-gaussian-splatting) - Applying style transfer during gaussian optimization to produce stylized gaussian splats of a scene.

### Game Engines 
- [Unity](https://github.com/aras-p/UnityGaussianSplatting)
- [PlayCanvas](https://github.com/playcanvas/engine/tree/main/src/scene/gsplat)
- [Unreal](https://github.com/xverse-engine/XV3DGS-UEPlugin)

### Viewers
- [WebGL Viewer 1](https://github.com/antimatter15/splat)
- [WebGL Viewer 2](https://github.com/kishimisu/Gaussian-Splatting-WebGL)
- [WebGL Viewer 3](https://github.com/BladeTransformerLLC/gauzilla)
- [WebGPU Viewer 1](https://github.com/cvlab-epfl/gaussian-splatting-web)
- [WebGPU Viewer 2](https://github.com/MarcusAndreasSvensson/gaussian-splatting-webgpu)
- [WebGPU Viewer 3](https://github.com/KeKsBoTer/web-splat)
- [Three.js](https://github.com/mkkellogg/GaussianSplats3D)
- [A-Frame](https://github.com/quadjr/aframe-gaussian-splatting)
- [Nerfstudio Unofficial](https://github.com/yzslab/nerfstudio/tree/gaussian_splatting)
- [Nerfstudio Viser](https://github.com/nerfstudio-project/viser)
- [Blender (Editor)](https://github.com/ReshotAI/gaussian-splatting-blender-addon/tree/master)
- [WebRTC viewer](https://github.com/dylanebert/gaussian-viewer)
- [iOS & Metal viewer](https://github.com/laanlabs/metal-splats)
- [jupyter notebook](https://github.com/shumash/gaussian-splatting/blob/mshugrina/interactive/interactive.ipynb)
- [PyOpenGL viewer (also with official CUDA backend)](https://github.com/limacv/GaussianSplattingViewer.git)
- [PlayCanvas Viewer](https://github.com/playcanvas/model-viewer)
- [gsplat.js](https://github.com/dylanebert/gsplat.js)
- [Splatapult](https://github.com/hyperlogic/splatapult) - 3d gaussian splatting renderer in C++ and OpenGL, works with OpenXR for tethered VR
- [3DGS.cpp](https://github.com/shg8/3DGS.cpp) - cross-platform, high performance 3DGS renderer in C++ and Vulkan Compute, supporting Windows, macOS, Linux, iOS, and visionOS
- [vkgs](https://github.com/jaesung-cs/vkgs) - cross-platform, high performance 3DGS renderer in C++ and Vulkan Compute/Graphics
- [splaTV](https://github.com/antimatter15/splaTV) - WebGL Viewer for 4D Gaussians (based on SpaceTime Gaussian) with demo [here](http://antimatter15.com/splaTV/)
- [Taichi Viewer](https://github.com/uc-vision/splat-viewer)
- [uc-vision-splat-viewer](https://github.com/uc-vision/splat-viewer) (3D Gaussian splatting renderer with benchmarking capability)
- [splatviz](https://github.com/Florian-Barthel/splatviz) - Viewer that allows you to edit the rendering code during runtime or to display multiple scenes at once.
- [Houdini Gaussian Splatting Viewport Renderer](https://github.com/rubendhz/houdini-gsplat-renderer) - A HDK/GLSL implementation of Gaussian Splatting in Houdini

### Utilities
- [Kapture](https://github.com/naver/kapture) - A unified data format to facilitate visual localization and structure from motion e.g. for bundler to colmap model conversion
- [Kapture image cropper script](https://gist.github.com/jo-chemla/258e6e40d3d6c2220b29518ff3c17c40) - Undistorted image cropper script to remove black borders with included conversion instructions
- [camorph](https://github.com/Fraunhofer-IIS/camorph) - A toolbox for conversion between camera parameter conventions e.g. Reality Capture to colmap model
- [3DGS Converter](https://github.com/francescofugazzi/3dgsconverter) - A tool for converting 3D Gaussian Splatting .ply files into a format suitable for Cloud Compare and vice-versa
- [SuperSplat](https://github.com/playcanvas/super-splat) - Open source browser-based tool to clean/filter, reorient and compress .ply/.splat files
- [SpectacularAI](https://github.com/SpectacularAI/point-cloud-tools) - Conversion scripts for different 3DGS conventions
- [GSOPs](https://github.com/david-rhodes/GSOPs) - GSOPs (Gaussian Splat Operators) for SideFX Houdini. Import, edit, and export models, or generate synthetic training data
- [Point Cloud Editor](https://github.com/JohannesKrueger/pointcloudeditor) - Clean and edit point clouds in COLMAP sparse format in a browser to improve reconstruction results

### Tutorial
- [Tutorial from the authors of 3DGS](https://3dgstutorial.github.io/)

### Framework
- [Pointrix](https://github.com/pointrix-project/pointrix) - A differentiable point-based rendering framework.
- [msplat](https://github.com/pointrix-project/msplat) - A modular differential gaussian rasterization library.
- [GauStudio](https://github.com/GAP-LAB-CUHK-SZ/gaustudio) - Unified framework with different paper implementations
- [DriveStudio](https://github.com/ziyc/drivestudio) - A 3DGS framework for omni urban scene reconstruction and simulation.
- [gaussian-splatting-lightning](https://github.com/yzslab/gaussian-splatting-lightning) - A 3D Gaussian Splatting framework with various derived algorithms and an interactive web viewer

### Other
- [My-exp-Gaussians](https://github.com/ingra14m/My-exp-Gaussians) - Enhancing the ability of 3D Gaussians to model complex scenes
- [360-gaussian-splatting](https://github.com/inuex35/360-gaussian-splatting) - Generate gaussian splatting directly from 360 images


## Blog Posts

1. [Gaussian Splatting is pretty cool](https://aras-p.info/blog/2023/09/05/Gaussian-Splatting-is-pretty-cool/)
2. [Making Gaussian Splats smaller](https://aras-p.info/blog/2023/09/13/Making-Gaussian-Splats-smaller/)
3. [Making Gaussian Splats more smaller](https://aras-p.info/blog/2023/09/27/Making-Gaussian-Splats-more-smaller/)
4. [Introduction to 3D Gaussian Splatting](https://huggingface.co/blog/gaussian-splatting)
5. [Very good (technical) intro to 3D Gaussian Splatting](https://medium.com/@AriaLeeNotAriel/numbynum-3d-gaussian-splatting-for-real-time-radiance-field-rendering-kerbl-et-al-60c0b25e5544)
6. [Write up on some mathematical details of the 3DGS implementation](https://github.com/kwea123/gaussian_splatting_notes)
7. [Discussion about gs universal format](https://github.com/mkkellogg/GaussianSplats3D/issues/47#issuecomment-1801360116)
8. [Math explanation to understand 3DGS](https://github.com/chiehwangs/3d-gaussian-theory)
9. [Compressing Gaussian Splats](https://blog.playcanvas.com/compressing-gaussian-splats/)
10. [Comprehensive overview of Gaussian Splatting](https://towardsdatascience.com/a-comprehensive-overview-of-gaussian-splatting-e7d570081362)
11. [Gaussian Head Avatars: A Summary](https://towardsdatascience.com/gaussian-head-avatars-a-summary-2bd17bd48500)
12. [NeRFs vs. 3DGS](https://edwardahn.me/writing/NeRFvs3DGS/)
13. [How to capture images for 3DGS](https://medium.com/@heyulei/capture-images-for-gaussian-splatting-81d081bbc826)
14. [Mathematical details of forward and backward passes](https://github.com/joeyan/gaussian_splatting/blob/main/MATH.md)
15. [3D in Geospatial: NeRFs, Gaussian Splatting, and Spatial Computing](https://ckoziol.com/blog/2024/radiance_methods/)

## Tutorial Videos

1. [Getting Started with 3DGS for Windows](https://youtu.be/UXtuigy_wYc?si=j1vfORNspcocSH-b)
2. [How to view 3DGS Scenes in Unity](https://youtu.be/5_GaPYBHqOo?si=6u9j1HqXwF_5WSUL)
3. [Two-minute explanation of 3DGS](https://youtu.be/HVv_IQKlafQ?si=w5c9XKHfKIBuXDLW)
4. [Jupyter notebook tutorial](https://www.youtube.com/watch?v=OcvA7fmiZYM&t=2s)
5. [Intro to gaussian splatting (and Unity plugin)](https://www.xuanprada.com/blog/2023/10/22/intro-to-gaussian-splatting)
6. [Computerphile 3DGS explanation](https://youtu.be/VkIJbpdTujE?si=1GLjzBfF9LCuT22o)

## Credits

- Thanks to [Leonid Keselman](https://github.com/leonidk) for informing me about the release of the paper "Real-time Photorealistic Dynamic Scene Representation and Rendering with 4D Gaussian Splatting".
- Thanks to [Eric Haines](https://github.com/erich666) for suggesting the jupyter notebook viewer, windows tutorial and for fixing text hyphenations and other issues.
- Thanks to [Henry Pearce](https://github.com/henrypearce4D) for maintaining contributions.
- [Yehe Liu](https://x.com/YeheLiu)


================================================
FILE: awesome_3dgs_papers.yaml
================================================
- id: ren2025fastgs
  title: 'FastGS: Training 3D Gaussian Splatting in 100 Seconds'
  authors: Shiwei Ren, Tianci Wen, Yongchun Fang, Biao Lu
  year: '2025'
  abstract: 'The dominant 3D Gaussian splatting (3DGS) acceleration methods fail to
    properly regulate the number of Gaussians during training, causing redundant computational
    time overhead. In this paper, we propose FastGS, a novel, simple, and general
    acceleration framework that fully considers the importance of each Gaussian based
    on multi-view consistency, efficiently solving the trade-off between training
    time and rendering quality. We innovatively design a densification and pruning
    strategy based on multi-view consistency, dispensing with the budgeting mechanism.
    Extensive experiments on Mip-NeRF 360, Tanks & Temples, and Deep Blending datasets
    demonstrate that our method significantly outperforms the state-of-the-art methods
    in training speed, achieving a 3.32$\times$ training acceleration and comparable
    rendering quality compared with DashGaussian on the Mip-NeRF 360 dataset and a
    15.45$\times$ acceleration compared with vanilla 3DGS on the Deep Blending dataset.
    We demonstrate that FastGS exhibits strong generality, delivering 2-7$\times$
    training acceleration across various tasks, including dynamic scene reconstruction,
    surface reconstruction, sparse-view reconstruction, large-scale reconstruction,
    and simultaneous localization and mapping. The project page is available at https://fastgs.github.io/

    '
  project_page: https://fastgs.github.io/
  paper: https://arxiv.org/pdf/2511.04283.pdf
  code: https://github.com/fastgs/FastGS
  video: null
  tags:
  - Acceleration
  - Code
  - Densification
  - Dynamic
  - Project
  - Sparse
  thumbnail: assets/thumbnails/ren2025fastgs.jpg
  publication_date: '2025-11-06T11:21:16+00:00'
  date_source: arxiv
- id: pan2025diff4splat
  title: 'Diff4Splat: Controllable 4D Scene Generation with Latent Dynamic Reconstruction
    Models'
  authors: Panwang Pan, Chenguo Lin, Jingjing Zhao, Chenxin Li, Yuchen Lin, Haopeng
    Li, Honglei Yan, Kairun Wen, Yunlong Lin, Yixuan Yuan, Yadong Mu
  year: '2025'
  abstract: 'We introduce Diff4Splat, a feed-forward method that synthesizes controllable
    and explicit 4D scenes from a single image. Our approach unifies the generative
    priors of video diffusion models with geometry and motion constraints learned
    from large-scale 4D datasets. Given a single input image, a camera trajectory,
    and an optional text prompt, Diff4Splat directly predicts a deformable 3D Gaussian
    field that encodes appearance, geometry, and motion, all in a single forward pass,
    without test-time optimization or post-hoc refinement. At the core of our framework
    lies a video latent transformer, which augments video diffusion models to jointly
    capture spatio-temporal dependencies and predict time-varying 3D Gaussian primitives.
    Training is guided by objectives on appearance fidelity, geometric accuracy, and
    motion consistency, enabling Diff4Splat to synthesize high-quality 4D scenes in
    30 seconds. We demonstrate the effectiveness of Diff4Splat across video generation,
    novel view synthesis, and geometry extraction, where it matches or surpasses optimization-based
    methods for dynamic scene synthesis while being significantly more efficient.

    '
  project_page: https://paulpanwang.github.io/Diff4Splat/
  paper: https://arxiv.org/pdf/2511.00503.pdf
  code: https://github.com/paulpanwang/Diff4Splat
  video: https://www.youtube.com/watch?v=IZKt-pvCLd0
  tags:
  - Code
  - Diffusion
  - Dynamic
  - Feed-Forward
  - Gaussian Video
  - Project
  - Video
  - Virtual Reality
  - World Generation
  thumbnail: assets/thumbnails/pan2025diff4splat.jpg
  publication_date: '2025-11-01T11:16:25+00:00'
  date_source: arxiv
- id: xin2025learning
  title: Learning Unified Representation of 3D Gaussian Splatting
  authors: Yuelin Xin, Yuheng Liu, Xiaohui Xie, Xinke Li
  year: '2025'
  abstract: 'A well-designed vectorized representation is crucial for the learning
    systems natively based on 3D Gaussian Splatting. While 3DGS enables efficient
    and explicit 3D reconstruction, its parameter-based representation remains hard
    to learn as features, especially for neural-network-based models. Directly feeding
    raw Gaussian parameters into learning frameworks fails to address the non-unique
    and heterogeneous nature of the Gaussian parameterization, yielding highly data-dependent
    models. This challenge motivates us to explore a more principled approach to represent
    3D Gaussian Splatting in neural networks that preserves the underlying color and
    geometric structure while enforcing unique mapping and channel homogeneity. In
    this paper, we propose an embedding representation of 3DGS based on continuous
    submanifold fields that encapsulate the intrinsic information of Gaussian primitives,
    thereby benefiting the learning of 3DGS.

    '
  project_page: https://cilix-ai.github.io/gs-embedding-page/
  paper: https://arxiv.org/pdf/2509.22917.pdf
  code: https://github.com/cilix-ai/gs-embedding
  video: null
  tags:
  - Code
  - Compression
  - Feed-Forward
  - Point Cloud
  - Project
  - Segmentation
  thumbnail: assets/thumbnails/xin2025learning.jpg
  publication_date: '2025-09-26T20:44:59+00:00'
  date_source: arxiv
- id: chang2025meshsplat
  title: 'MeshSplat: Generalizable Sparse-View Surface Reconstruction via Gaussian
    Splatting'
  authors: Hanzhi Chang, Ruijie Zhu, Wenjie Chang, Mulin Yu, Yanzhe Liang, Jiahao
    Lu, Zhuoyuan Li, Tianzhu Zhang
  year: '2025'
  abstract: 'Surface reconstruction has been widely studied in computer vision and
    graphics. However, existing surface reconstruction works struggle to recover accurate
    scene geometry when the input views are extremely sparse. To address this issue,
    we propose MeshSplat, a generalizable sparse-view surface reconstruction framework
    via Gaussian Splatting. Our key idea is to leverage 2DGS as a bridge, which connects
    novel view synthesis to learned geometric priors and then transfers these priors
    to achieve surface reconstruction. Specifically, we incorporate a feed-forward
    network to predict per-view pixel-aligned 2DGS, which enables the network to synthesize
    novel view images and thus eliminates the need for direct 3D ground-truth supervision.
    To improve the accuracy of 2DGS position and orientation prediction, we propose
    a Weighted Chamfer Distance Loss to regularize the depth maps, especially in overlapping
    areas of input views, and also a normal prediction network to align the orientation
    of 2DGS with normal vectors predicted by a monocular normal estimator. Extensive
    experiments validate the effectiveness of our proposed improvement, demonstrating
    that our method achieves state-of-the-art performance in generalizable sparse-view
    mesh reconstruction tasks.

    '
  project_page: https://hanzhichang.github.io/meshsplat_web/
  paper: https://arxiv.org/pdf/2508.17811.pdf
  code: null
  video: https://hanzhichang.github.io/meshsplat_web/static/images/meshsplat/demo.mp4
  tags:
  - 2DGS
  - Feed-Forward
  - Meshing
  thumbnail: assets/thumbnails/chang2025meshsplat.jpg
  publication_date: '2025-08-25T09:04:20+00:00'
  date_source: arxiv
- id: zhu2025objectgs
  title: 'ObjectGS: Object-aware Scene Reconstruction and Scene Understanding via
    Gaussian Splatting'
  authors: Ruijie Zhu, Mulin Yu, Linning Xu, Lihan Jiang, Yixuan Li, Tianzhu Zhang,
    Jiangmiao Pang, Bo Dai
  year: '2025'
  abstract: '3D Gaussian Splatting is renowned for its high-fidelity reconstructions
    and real-time novel view synthesis, yet its lack of semantic understanding limits
    object-level perception. In this work, we propose ObjectGS, an object-aware framework
    that unifies 3D scene reconstruction with semantic understanding. Instead of treating
    the scene as a unified whole, ObjectGS models individual objects as local anchors
    that generate neural Gaussians and share object IDs, enabling precise object-level
    reconstruction. During training, we dynamically grow or prune these anchors and
    optimize their features, while a one-hot ID encoding with a classification loss
    enforces clear semantic constraints. We show through extensive experiments that
    ObjectGS not only outperforms state-of-the-art methods on open-vocabulary and
    panoptic segmentation tasks, but also integrates seamlessly with applications
    like mesh extraction and scene editing.

    '
  project_page: https://ruijiezhu94.github.io/ObjectGS_page/
  paper: https://arxiv.org/pdf/2507.15454.pdf
  code: https://github.com/RuijieZhu94/ObjectGS
  video: null
  tags:
  - Code
  - Project
  - Segmentation
  - Language Embedding
  thumbnail: assets/thumbnails/zhu2025objectgs.jpg
  publication_date: '2025-07-21T10:06:23+00:00'
  date_source: arxiv
- id: wang2025adaptive
  title: Adaptive Voxelization for Transform coding of 3D Gaussian splatting data
  authors: Chenjunjie Wang, Shashank N. Sridhara, Eduardo Pavez, Antonio Ortega, Cheng
    Chang
  year: '2025'
  abstract: 'We present a novel compression framework for 3D Gaussian splatting (3DGS)
    data that leverages transform coding tools originally developed for point clouds.
    Contrary to existing 3DGS compression methods, our approach can produce compressed
    3DGS models at multiple bitrates in a computationally efficient way. Point cloud
    voxelization is a discretization technique that point cloud codecs use to improve
    coding efficiency while enabling the use of fast transform coding algorithms.
    We propose an adaptive voxelization algorithm tailored to 3DGS data, to avoid
    the inefficiencies introduced by uniform voxelization used in point cloud codecs.
    We ensure the positions of larger volume Gaussians are represented at high resolution,
    as these significantly impact rendering quality. Meanwhile, a low-resolution representation
    is used for dense regions with smaller Gaussians, which have a relatively lower
    impact on rendering quality. This adaptive voxelization approach significantly
    reduces the number of Gaussians and the bitrate required to encode the 3DGS data.
    After voxelization, many Gaussians are moved or eliminated. Thus, we propose to
    fine-tune/recolor the remaining 3DGS attributes with an initialization that can
    reduce the amount of retraining required. Experimental results on pre-trained
    datasets show that our proposed compression framework outperforms existing methods.

    '
  project_page: null
  paper: https://arxiv.org/pdf/2506.00271.pdf
  code: https://github.com/STAC-USC/3DGS_Compression_Adaptive_Voxelization
  video: https://www.youtube.com/watch?v=o92Bj0k1izA
  tags:
  - Code
  - Compression
  - Video
  thumbnail: assets/thumbnails/wang2025adaptive.jpg
  publication_date: '2025-05-30T22:12:33+00:00'
  date_source: arxiv
- id: howil2025clipgaussian
  title: 'CLIPGaussian: Universal and Multimodal Style Transfer Based on Gaussian
    Splatting'
  authors: Kornel Howil, Joanna Waczyńska, Piotr Borycki, Tadeusz Dziarmaga, Marcin
    Mazur, Przemysław Spurek
  year: '2025'
  abstract: 'Gaussian Splatting (GS) has recently emerged as an efficient representation
    for rendering 3D scenes from 2D images and has been extended to images, videos,
    and dynamic 4D content. However, applying style transfer to GS-based representations,
    especially beyond simple color changes, remains challenging. In this work, we
    introduce CLIPGaussians, the first unified style transfer framework that supports
    text- and image-guided stylization across multiple modalities: 2D images, videos,
    3D objects, and 4D scenes. Our method operates directly on Gaussian primitives
    and integrates into existing GS pipelines as a plug-in module, without requiring
    large generative models or retraining from scratch. The CLIPGaussians approach enables
    joint optimization of color and geometry in 3D and 4D settings, and achieves temporal
    coherence in videos, while preserving the model size. We demonstrate superior style
    fidelity and consistency across all tasks, validating CLIPGaussians as a universal
    and efficient solution for multimodal style transfer.

    '
  project_page: https://kornelhowil.github.io/CLIPGaussian/
  paper: https://arxiv.org/pdf/2505.22854.pdf
  code: https://github.com/kornelhowil/CLIPGaussian
  video: null
  tags:
  - Code
  - Project
  - Style Transfer
  thumbnail: assets/thumbnails/howil2025clipgaussian.jpg
  publication_date: '2025-05-28T20:41:24+00:00'
  date_source: arxiv
- id: gomez2025foci
  title: 'FOCI: Trajectory Optimization on Gaussian Splats'
  authors: Mario Gomez Andreu, Maximum Wilder-Smith, Victor Klemm, Vaishakh Patil,
    Jesus Tordesillas, Marco Hutter
  year: '2025'
  abstract: '3D Gaussian Splatting (3DGS) has recently gained popularity as a faster
    alternative to Neural Radiance Fields (NeRFs) in 3D reconstruction and view synthesis
    methods. Leveraging the spatial information encoded in 3DGS, this work proposes
    FOCI (Field Overlap Collision Integral), an algorithm that is able to optimize
    trajectories directly on the Gaussians themselves. FOCI leverages a novel and
    interpretable collision formulation for 3DGS using the notion of the overlap integral
    between Gaussians. Contrary to other approaches, which represent the robot with
    conservative bounding boxes that underestimate the traversability of the environment,
    we propose to represent the environment and the robot as Gaussian Splats. This
    not only has desirable computational properties, but also allows for orientation-aware
    planning, allowing the robot to pass through very tight and narrow spaces. We
    extensively test our algorithm in both synthetic and real Gaussian Splats, showcasing
    that collision-free trajectories for the ANYmal legged robot can be computed
    in a few seconds, even with hundreds of thousands of Gaussians making up the environment.
    The project page and code are available at https://rffr.leggedrobotics.com/works/foci/

    '
  project_page: null
  paper: https://arxiv.org/pdf/2505.08510.pdf
  code: null
  video: null
  tags:
  - Optimization
  - Robotics
  thumbnail: assets/thumbnails/andreu2025foci.jpg
  publication_date: '2025-05-13T12:40:19+00:00'
  date_source: arxiv
- id: liu2025abcgs
  title: 'ABC-GS: Alignment-Based Controllable Style Transfer for 3D Gaussian Splatting'
  authors: Wenjie Liu, Zhongliang Liu, Xiaoyan Yang, Man Sha, Yang Li
  year: '2025'
  abstract: '3D scene stylization approaches based on Neural Radiance Fields (NeRF)
    achieve promising results by optimizing with Nearest Neighbor Feature Matching
    (NNFM) loss. However, NNFM loss does not consider global style information. In
    addition, the implicit representation of NeRF limits their fine-grained control
    over the resulting scenes. In this paper, we introduce ABC-GS, a novel framework
    based on 3D Gaussian Splatting to achieve high-quality 3D style transfer. To this
    end, a controllable matching stage is designed to achieve precise alignment between
    scene content and style features through segmentation masks. Moreover, a style
    transfer loss function based on feature alignment is proposed to ensure that the
    outcomes of style transfer accurately reflect the global style of the reference
    image. Furthermore, the original geometric information of the scene is preserved
    with the depth loss and Gaussian regularization terms. Extensive experiments show
    that our ABC-GS provides controllability of style transfer and achieves stylization
    results that are more faithfully aligned with the global style of the chosen artistic
    reference. Our homepage is available at https://vpx-ecnu.github.io/ABC-GS-website.

    '
  project_page: https://vpx-ecnu.github.io/ABC-GS-website/
  paper: https://arxiv.org/pdf/2503.22218.pdf
  code: https://github.com/vpx-ecnu/ABC-GS
  video: null
  tags:
  - Code
  - Project
  - Style Transfer
  thumbnail: assets/thumbnails/liu2025abcgs.jpg
  publication_date: '2025-03-28T08:07:57+00:00'
  date_source: arxiv
- id: huang2025stdloc
  title: 'From Sparse to Dense: Camera Relocalization with Scene-Specific Detector
    from Feature Gaussian Splatting'
  authors: Zhiwei Huang, Hailin Yu, Yichun Shentu, Jin Yuan, Guofeng Zhang
  year: '2025'
  abstract: 'This paper presents a novel camera relocalization method, STDLoc, which
    leverages Feature Gaussian as scene representation. STDLoc is a full relocalization
    pipeline that can achieve accurate relocalization without relying on any pose
    prior. Unlike previous coarse-to-fine localization methods that require image
    retrieval first and then feature matching, we propose a novel sparse-to-dense
    localization paradigm. Based on this scene representation, we introduce a novel
    matching-oriented Gaussian sampling strategy and a scene-specific detector to
    achieve efficient and robust initial pose estimation. Furthermore, based on the
    initial localization results, we align the query feature map to the Gaussian feature
    field by dense feature matching to enable accurate localization. The experiments
    on indoor and outdoor datasets show that STDLoc outperforms current state-of-the-art
    localization methods in terms of localization accuracy and recall.

    '
  project_page: https://zju3dv.github.io/STDLoc/
  paper: https://arxiv.org/pdf/2503.19358.pdf
  code: https://github.com/zju3dv/STDLoc
  video: null
  tags:
  - Code
  - Poses
  - Project
  thumbnail: assets/thumbnails/huang2025from.jpg
  publication_date: '2025-03-25T05:18:19+00:00'
  date_source: arxiv
- id: zhang2025motion
  title: Motion Blender Gaussian Splatting for Dynamic Scene Reconstruction
  authors: Xinyu Zhang, Haonan Chang, Yuhan Liu, Abdeslam Boularias
  year: '2025'
  abstract: 'Gaussian splatting has emerged as a powerful tool for high-fidelity reconstruction
    of dynamic scenes. However, existing methods primarily rely on implicit motion
    representations, such as encoding motions into neural networks or per-Gaussian
    parameters, which makes it difficult to further manipulate the reconstructed motions.
    This lack of explicit controllability limits existing methods to replaying recorded
    motions only, which hinders a wider application in robotics. To address this,
    we propose Motion Blender Gaussian Splatting (MBGS), a novel framework that uses
    motion graphs as an explicit and sparse motion representation. The motion of a
    graph''s links is propagated to individual Gaussians via dual quaternion skinning,
    with learnable weight painting functions that determine the influence of each
    link. The motion graphs and 3D Gaussians are jointly optimized from input videos
    via differentiable rendering. Experiments show that MBGS achieves state-of-the-art
    performance on the highly challenging iPhone dataset while being competitive on
    HyperNeRF. We demonstrate the application potential of our method in animating
    novel object poses, synthesizing real robot demonstrations, and predicting robot
    actions through visual planning. The source code, models, and video demonstrations
    can be found at http://mlzxy.github.io/motion-blender-gs.

    '
  project_page: https://mlzxy.github.io/motion-blender-gs/
  paper: https://arxiv.org/pdf/2503.09040.pdf
  code: https://github.com/mlzxy/motion-blender-gs
  video: null
  tags:
  - Code
  - Dynamic
  - Editing
  - Project
  - Robotics
  - Segmentation
  thumbnail: assets/thumbnails/zhang2025motion.jpg
  publication_date: '2025-03-12T03:55:16+00:00'
  date_source: arxiv
- id: yu2025persistent
  title: Persistent Object Gaussian Splat (POGS) for Tracking Human and Robot Manipulation
    of Irregularly Shaped Objects
  authors: Justin Yu, Kush Hari, Karim El-Refai, Arnav Dalal, Justin Kerr, Chung Min
    Kim, Richard Cheng, Muhammad Zubair Irshad, Ken Goldberg
  year: '2025'
  abstract: 'Tracking and manipulating irregularly-shaped, previously unseen objects
    in dynamic environments is important for robotic applications in manufacturing,
    assembly, and logistics. Recently introduced Gaussian Splats efficiently model
    object geometry, but lack persistent state estimation for task-oriented manipulation.
    We present Persistent Object Gaussian Splat (POGS), a system that embeds semantics,
    self-supervised visual features, and object grouping features into a compact representation
    that can be continuously updated to estimate the pose of scanned objects. POGS
    updates object states without requiring expensive rescanning or prior CAD models
    of objects. After an initial multi-view scene capture and training phase, POGS
    uses a single stereo camera to integrate depth estimates along with self-supervised
    vision encoder features for object pose estimation. POGS supports grasping, reorientation,
    and natural language-driven manipulation by refining object pose estimates, facilitating
    sequential object reset operations with human-induced object perturbations and
    tool servoing, where robots recover tool pose despite tool perturbations of up
    to 30°. POGS achieves up to 12 consecutive successful object resets and recovers
    from 80% of in-grasp tool perturbations.

    '
  project_page: https://berkeleyautomation.github.io/POGS/
  paper: https://arxiv.org/pdf/2503.05189.pdf
  code: https://github.com/uynitsuj/pogs
  video: null
  tags:
  - Code
  - Language Embedding
  - Project
  - Robotics
  - Segmentation
  thumbnail: assets/thumbnails/yu2025persistent.jpg
  publication_date: '2025-03-07T07:20:25+00:00'
  date_source: arxiv
- id: chacko2025lifting
  title: 'Lifting by Gaussians: A Simple, Fast and Flexible Method for 3D Instance
    Segmentation'
  authors: Rohan Chacko, Nicolai Haeni, Eldar Khaliullin, Lin Sun, Douglas Lee
  year: '2025'
  abstract: 'We introduce Lifting By Gaussians (LBG), a novel approach for open-world
    instance segmentation of 3D Gaussian Splatted Radiance Fields (3DGS). Recently,
    3DGS Fields have emerged as a highly efficient and explicit alternative to Neural
    Field-based methods for high-quality Novel View Synthesis. Our 3D instance segmentation
    method directly lifts 2D segmentation masks from SAM (alternately FastSAM, etc.),
    together with features from CLIP and DINOv2, directly fusing them onto 3DGS (or
    similar Gaussian radiance fields such as 2DGS). Unlike previous approaches, LBG
    requires no per-scene training, allowing it to operate seamlessly on any existing
    3DGS reconstruction. Our approach is not only an order of magnitude faster and
    simpler than existing approaches; it is also highly modular, enabling 3D semantic
    segmentation of existing 3DGS fields without requiring a specific parametrization
    of the 3D Gaussians. Furthermore, our technique achieves superior semantic segmentation
    for 2D semantic novel view synthesis and 3D asset extraction results while maintaining
    flexibility and efficiency. We further introduce a novel approach to evaluate
    individually segmented 3D assets from 3D radiance field segmentation methods.

    '
  project_page: null
  paper: https://arxiv.org/pdf/2502.00173.pdf
  code: null
  video: null
  tags:
  - Language Embedding
  - Segmentation
  - Virtual Reality
  thumbnail: assets/thumbnails/chacko2025lifting.jpg
  publication_date: '2025-01-31T21:30:59+00:00'
  date_source: arxiv
- id: lin2025omniphysgs
  title: 'OmniPhysGS: 3D Constitutive Gaussians for General Physics-Based Dynamics
    Generation'
  authors: Yuchen Lin, Chenguo Lin, Jianjin Xu, Yadong Mu
  year: '2025'
  abstract: 'Recently, significant advancements have been made in the reconstruction
    and generation of 3D assets, including static cases and those with physical interactions.
    To recover the physical properties of 3D assets, existing methods typically assume
    that all materials belong to a specific predefined category (e.g., elasticity).
    However, such assumptions ignore the complex composition of multiple heterogeneous
    objects in real scenarios and tend to render less physically plausible animation
    given a wider range of objects. We propose OmniPhysGS for synthesizing a physics-based
    3D dynamic scene composed of more general objects. A key design of OmniPhysGS
    is treating each 3D asset as a collection of constitutive 3D Gaussians. For each
    Gaussian, its physical material is represented by an ensemble of 12 physical domain-expert
    sub-models (rubber, metal, honey, water, etc.), which greatly enhances the flexibility
    of the proposed model. In the implementation, we define a scene by user-specified
    prompts and supervise the estimation of material weighting factors via a pretrained
    video diffusion model. Comprehensive experiments demonstrate that OmniPhysGS achieves
    more general and realistic physical dynamics across a broader spectrum of materials,
    including elastic, viscoelastic, plastic, and fluid substances, as well as interactions
    between different materials. Our method surpasses existing methods by approximately
    3% to 16% in metrics of visual quality and text alignment.

    '
  project_page: https://wgsxm.github.io/projects/omniphysgs/
  paper: https://arxiv.org/pdf/2501.18982.pdf
  code: https://github.com/wgsxm/omniphysgs
  video: https://wgsxm.github.io/videos/omniphysgs.mp4
  tags:
  - Code
  - Dynamic
  - Physics
  - Project
  - Video
  thumbnail: assets/thumbnails/lin2025omniphysgs.jpg
  publication_date: '2025-01-31T09:28:07+00:00'
  date_source: arxiv
- id: guizilini2025zeroshot
  title: Zero-Shot Novel View and Depth Synthesis with Multi-View Geometric Diffusion
  authors: Vitor Guizilini, Muhammad Zubair Irshad, Dian Chen, Greg Shakhnarovich,
    Rares Ambrus
  year: '2025'
  abstract: 'Current methods for 3D scene reconstruction from sparse posed images
    employ intermediate 3D representations such as neural fields, voxel grids, or
    3D Gaussians, to achieve multi-view consistent scene appearance and geometry.
    In this paper we introduce MVGD, a diffusion-based architecture capable of direct
    pixel-level generation of images and depth maps from novel viewpoints, given an
    arbitrary number of input views. Our method uses raymap conditioning to both augment
    visual features with spatial information from different viewpoints, as well as
    to guide the generation of images and depth maps from novel views. A key aspect
    of our approach is the multi-task generation of images and depth maps, using learnable
    task embeddings to guide the diffusion process towards specific modalities. We
    train this model on a collection of more than 60 million multi-view samples from
    publicly available datasets, and propose techniques to enable efficient and consistent
    learning in such diverse conditions. We also propose a novel strategy that enables
    the efficient training of larger models by incrementally fine-tuning smaller ones,
    with promising scaling behavior. Through extensive experiments, we report state-of-the-art
    results in multiple novel view synthesis benchmarks, as well as multi-view stereo
    and video depth estimation.

    '
  project_page: https://mvgd.github.io/
  paper: https://arxiv.org/pdf/2501.18804.pdf
  code: null
  video: null
  tags:
  - 360 degree
  - Diffusion
  - Feed-Forward
  - Large-Scale
  - Point Cloud
  - Project
  thumbnail: assets/thumbnails/guizilini2025zeroshot.jpg
  publication_date: '2025-01-30T23:43:06+00:00'
  date_source: arxiv
- id: lin2025diffsplat
  title: 'DiffSplat: Repurposing Image Diffusion Models for Scalable Gaussian Splat
    Generation'
  authors: Chenguo Lin, Panwang Pan, Bangbang Yang, Zeming Li, Yadong Mu
  year: '2025'
  abstract: 'Recent advancements in 3D content generation from text or a single image
    struggle with limited high-quality 3D datasets and inconsistency from 2D multi-view
    generation. We introduce DiffSplat, a novel 3D generative framework that natively
    generates 3D Gaussian splats by taming large-scale text-to-image diffusion models.
    It differs from previous 3D generative models by effectively utilizing web-scale
    2D priors while maintaining 3D consistency in a unified model. To bootstrap the
    training, a lightweight reconstruction model is proposed to instantly produce
    multi-view Gaussian splat grids for scalable dataset curation. In conjunction
    with the regular diffusion loss on these grids, a 3D rendering loss is introduced
    to facilitate 3D coherence across arbitrary views. The compatibility with image
    diffusion models enables seamless adaptions of numerous techniques for image generation
    to the 3D realm. Extensive experiments reveal the superiority of DiffSplat in
    text- and image-conditioned generation tasks and downstream applications. Thorough
    ablation studies validate the efficacy of each critical design choice and provide
    insights into the underlying mechanism.

    '
  project_page: https://chenguolin.github.io/projects/DiffSplat/
  paper: https://arxiv.org/pdf/2501.16764.pdf
  code: https://github.com/chenguolin/DiffSplat
  video: null
  tags:
  - Code
  - Diffusion
  - Project
  thumbnail: assets/thumbnails/lin2025diffsplat.jpg
  publication_date: '2025-01-28T07:38:59+00:00'
  date_source: arxiv
- id: armagan2025trickgs
  title: 'Trick-GS: A Balanced Bag of Tricks for Efficient Gaussian Splatting'
  authors: Anil Armagan, Albert Saà-Garriga, Bruno Manganelli, Mateusz Nowak, Mehmet
    Kerim Yucel
  year: '2025'
  abstract: 'Gaussian splatting (GS) for 3D reconstruction has become quite popular
    due to its fast training, inference speeds and high quality reconstruction.
    However, GS-based reconstructions generally consist of millions of Gaussians,
    which makes them hard to use on computationally constrained devices such as smartphones.
    In this paper, we first propose a principled analysis of advances in efficient
    GS methods. Then, we propose Trick-GS, which is a careful combination of several
    strategies including (1) progressive training with resolution, noise and Gaussian
    scales, (2) learning to prune and mask primitives and SH bands by their significance,
    and (3) accelerated GS training framework. Trick-GS takes a large step towards
    resource-constrained GS, where faster run-time, smaller and faster-convergence
    of models is of paramount concern. Our results on three datasets show that Trick-GS
    achieves up to 2x faster training, 40x smaller disk size and 2x faster rendering
    speed compared to vanilla GS, while having comparable accuracy.

    '
  project_page: null
  paper: https://arxiv.org/pdf/2501.14534.pdf
  code: null
  video: null
  tags:
  - Acceleration
  thumbnail: assets/thumbnails/armagan2025trickgs.jpg
  publication_date: '2025-01-24T14:40:40+00:00'
  date_source: arxiv
- id: lee2025densesfm
  title: 'Dense-SfM: Structure from Motion with Dense Consistent Matching'
  authors: JongMin Lee, Sungjoo Yoo
  year: '2025'
  abstract: 'We present Dense-SfM, a novel Structure from Motion (SfM) framework designed
    for dense and accurate 3D reconstruction from multi-view images. Sparse keypoint
    matching, which traditional SfM methods often rely on, limits both accuracy and
    point density, especially in texture-less areas. Dense-SfM addresses this limitation
    by integrating dense matching with a Gaussian Splatting (GS) based track extension
    which gives more consistent, longer feature tracks. To further improve reconstruction
    accuracy, Dense-SfM is equipped with a multi-view kernelized matching module leveraging
    transformer and Gaussian Process architectures, for robust track refinement across
    multi-views. Evaluations on the ETH3D and Texture-Poor SfM datasets show that
    Dense-SfM offers significant improvements in accuracy and density over state-of-the-art
    methods.

    '
  project_page: null
  paper: https://arxiv.org/pdf/2501.14277.pdf
  code: null
  video: null
  tags:
  - Point Cloud
  - Poses
  thumbnail: assets/thumbnails/lee2025densesfm.jpg
  publication_date: '2025-01-24T06:45:12+00:00'
  date_source: arxiv
- id: li2025micromacro
  title: Micro-macro Wavelet-based Gaussian Splatting for 3D Reconstruction from Unconstrained
    Images
  authors: Yihui Li, Chengxin Lv, Hongyu Yang, Di Huang
  year: '2025'
  abstract: '3D reconstruction from unconstrained image collections presents substantial
    challenges due to varying appearances and transient occlusions. In this paper,
    we introduce Micro-macro Wavelet-based Gaussian Splatting (MW-GS), a novel approach
    designed to enhance 3D reconstruction by disentangling scene representations into
    global, refined, and intrinsic components. The proposed method features two key
    innovations: Micro-macro Projection, which allows Gaussian points to capture details
    from feature maps across multiple scales with enhanced diversity; and Wavelet-based
    Sampling, which leverages frequency domain information to refine feature representations
    and significantly improve the modeling of scene appearances. Additionally, we
    incorporate a Hierarchical Residual Fusion Network to seamlessly integrate these
    features. Extensive experiments demonstrate that MW-GS delivers state-of-the-art
    rendering performance, surpassing existing methods.

    '
  project_page: null
  paper: https://arxiv.org/pdf/2501.14231.pdf
  code: null
  video: null
  tags:
  - In the Wild
  thumbnail: assets/thumbnails/li2025micromacro.jpg
  publication_date: '2025-01-24T04:37:57+00:00'
  date_source: arxiv
- id: yu2025hammer
  title: 'HAMMER: Heterogeneous, Multi-Robot Semantic Gaussian Splatting'
  authors: Javier Yu, Timothy Chen, Mac Schwager
  year: '2025'
  abstract: '3D Gaussian Splatting offers expressive scene reconstruction, modeling
    a broad range of visual, geometric, and semantic information. However, efficient
    real-time map reconstruction with data streamed from multiple robots and devices
    remains a challenge. To that end, we propose HAMMER, a server-based collaborative
    Gaussian Splatting method that leverages widely available ROS communication infrastructure
    to generate 3D, metric-semantic maps from asynchronous robot data-streams with
    no prior knowledge of initial robot positions and varying on-device pose estimators.
    HAMMER consists of (i) a frame alignment module that transforms local SLAM poses
    and image data into a global frame and requires no prior relative pose knowledge,
    and (ii) an online module for training semantic 3DGS maps from streaming data.
    HAMMER handles mixed perception modes, adjusts automatically for variations in
    image pre-processing among different devices, and distills CLIP semantic codes
    into the 3D scene for open-vocabulary language queries. In our real-world experiments,
    HAMMER creates higher-fidelity maps (2x) compared to competing baselines and is
    useful for downstream tasks, such as semantic goal-conditioned navigation (e.g.,
    "go to the couch"). Accompanying content available at hammer-project.github.io.

    '
  project_page: https://hammer-project.github.io/
  paper: https://arxiv.org/pdf/2501.14147.pdf
  code: null
  video: null
  tags:
  - Project
  - Robotics
  - SLAM
  thumbnail: assets/thumbnails/yu2025hammer.jpg
  publication_date: '2025-01-24T00:21:10+00:00'
  date_source: arxiv
- id: yang2025fast3r
  title: 'Fast3R: Towards 3D Reconstruction of 1000+ Images in One Forward Pass'
  authors: Jianing Yang, Alexander Sax, Kevin J. Liang, Mikael Henaff, Hao Tang, Ang
    Cao, Joyce Chai, Franziska Meier, Matt Feiszli
  year: '2025'
  abstract: 'Multi-view 3D reconstruction remains a core challenge in computer vision,
    particularly in applications requiring accurate and scalable representations across
    diverse perspectives. Current leading methods such as DUSt3R employ a fundamentally
    pairwise approach, processing images in pairs and necessitating costly global
    alignment procedures to reconstruct from multiple views. In this work, we propose
    Fast 3D Reconstruction (Fast3R), a novel multi-view generalization to DUSt3R that
    achieves efficient and scalable 3D reconstruction by processing many views in
    parallel. Fast3R''s Transformer-based architecture forwards N images in a single
    forward pass, bypassing the need for iterative alignment. Through extensive experiments
    on camera pose estimation and 3D reconstruction, Fast3R demonstrates state-of-the-art
    performance, with significant improvements in inference speed and reduced error
    accumulation. These results establish Fast3R as a robust alternative for multi-view
    applications, offering enhanced scalability without compromising reconstruction
    accuracy.

    '
  project_page: https://fast3r-3d.github.io/
  paper: https://arxiv.org/pdf/2501.13928.pdf
  code: null
  video: null
  tags:
  - 3ster-based
  - Project
  thumbnail: assets/thumbnails/yang2025fast3r.jpg
  publication_date: '2025-01-23T18:59:55+00:00'
  date_source: arxiv
- id: sario2025gode
  title: 'GoDe: Gaussians on Demand for Progressive Level of Detail and Scalable Compression'
  authors: Francesco Di Sario, Riccardo Renzulli, Marco Grangetto, Akihiro Sugimoto,
    Enzo Tartaglione
  year: '2025'
  abstract: '3D Gaussian Splatting enhances real-time performance in novel view synthesis
    by representing scenes with mixtures of Gaussians and utilizing differentiable
    rasterization. However, it typically requires large storage capacity and high
    VRAM, demanding the design of effective pruning and compression techniques. Existing
    methods, while effective in some scenarios, struggle with scalability and fail
    to adapt models based on critical factors such as computing capabilities or bandwidth,
    requiring the model to be re-trained under different configurations. In this work,
    we propose a novel, model-agnostic technique that organizes Gaussians into several
    hierarchical layers, enabling a progressive Level of Detail (LoD) strategy. This
    method, combined with a recent approach to 3DGS compression, allows a single
    model to instantly scale across several compression ratios, with minimal to no
    impact on quality compared to a single non-scalable model and without requiring
    re-training. We validate our approach on typical datasets and benchmarks, showcasing
    low distortion and substantial gains in terms of scalability and adaptability.

    '
  project_page: null
  paper: https://arxiv.org/pdf/2501.13558.pdf
  code: null
  video: null
  tags:
  - Compression
  - LoD
  thumbnail: assets/thumbnails/sario2025gode.jpg
  publication_date: '2025-01-23T11:05:45+00:00'
  date_source: arxiv
- id: lan20253dgs2
  title: '3DGS$^2$: Near Second-order Converging 3D Gaussian Splatting'
  authors: Lei Lan, Tianjia Shao, Zixuan Lu, Yu Zhang, Chenfanfu Jiang, Yin Yang
  year: '2025'
  abstract: '3D Gaussian Splatting (3DGS) has emerged as a mainstream solution for
    novel view synthesis and 3D reconstruction. By explicitly encoding a 3D scene
    using a collection of Gaussian kernels, 3DGS achieves high-quality rendering with
    superior efficiency. As a learning-based approach, 3DGS training has been dealt
    with the standard stochastic gradient descent (SGD) method, which offers at most
    linear convergence. Consequently, training often requires tens of minutes, even
    with GPU acceleration. This paper introduces a (near) second-order convergent
    training algorithm for 3DGS, leveraging its unique properties. Our approach is
    inspired by two key observations. First, the attributes of a Gaussian kernel contribute
    independently to the image-space loss, which endorses isolated and local optimization
    algorithms. We exploit this by splitting the optimization at the level of individual
    kernel attributes, analytically constructing small-size Newton systems for each
    parameter group, and efficiently solving these systems on GPU threads. This achieves
    Newton-like convergence per training image without relying on the global Hessian.
    Second, kernels exhibit sparse and structured coupling across input images. This
    property allows us to effectively utilize spatial information to mitigate overshoot
    during stochastic training. Our method converges an order faster than standard
    GPU-based 3DGS training, requiring over $10\times$ fewer iterations while maintaining
    or surpassing the quality of the SGD-based 3DGS reconstructions.

    '
  project_page: null
  paper: https://arxiv.org/pdf/2501.13975.pdf
  code: null
  video: null
  tags:
  - Optimization
  thumbnail: assets/thumbnails/lan20253dgs2.jpg
  publication_date: '2025-01-22T22:28:11+00:00'
  date_source: arxiv
- id: shi2025sketch
  title: 'Sketch and Patch: Efficient 3D Gaussian Representation for Man-Made Scenes'
  authors: Yuang Shi, Simone Gasparini, Géraldine Morin, Chenggang Yang, Wei Tsang
    Ooi
  year: '2025'
  abstract: '3D Gaussian Splatting (3DGS) has emerged as a promising representation
    for photorealistic rendering of 3D scenes. However, its high storage requirements
    pose significant challenges for practical applications. We observe that Gaussians
    exhibit distinct roles and characteristics that are analogous to traditional artistic
    techniques -- Like how artists first sketch outlines before filling in broader
    areas with color, some Gaussians capture high-frequency features like edges and
    contours; While other Gaussians represent broader, smoother regions, that are
    analogous to broader brush strokes that add volume and depth to a painting. Based
    on this observation, we propose a novel hybrid representation that categorizes
    Gaussians into (i) Sketch Gaussians, which define scene boundaries, and (ii) Patch
    Gaussians, which cover smooth regions. Sketch Gaussians are efficiently encoded
    using parametric models, leveraging their geometric coherence, while Patch Gaussians
    undergo optimized pruning, retraining, and vector quantization to maintain volumetric
    consistency and storage efficiency. Our comprehensive evaluation across diverse
    indoor and outdoor scenes demonstrates that this structure-aware approach achieves
    up to 32.62% improvement in PSNR, 19.12% in SSIM, and 45.41% in LPIPS at equivalent
    model sizes, and correspondingly, for an indoor scene, our model maintains the
    visual quality with 2.3% of the original model size.

    '
  project_page: null
  paper: https://arxiv.org/pdf/2501.13045.pdf
  code: null
  video: null
  tags:
  - Densification
  thumbnail: assets/thumbnails/shi2025sketch.jpg
  publication_date: '2025-01-22T17:52:45+00:00'
  date_source: arxiv
- id: arunan2025darbsplatting
  title: 'DARB-Splatting: Generalizing Splatting with Decaying Anisotropic Radial
    Basis Functions'
  authors: Vishagar Arunan, Saeedha Nazar, Hashiru Pramuditha, Vinasirajan Viruthshaan,
    Sameera Ramasinghe, Simon Lucey, Ranga Rodrigo
  year: '2025'
  abstract: 'Splatting-based 3D reconstruction methods have gained popularity with
    the advent of 3D Gaussian Splatting, efficiently synthesizing high-quality novel
    views. These methods commonly resort to using exponential family functions, such
    as the Gaussian function, as reconstruction kernels due to their anisotropic nature,
    ease of projection, and differentiability in rasterization. However, the field
    remains restricted to variations within the exponential family, leaving generalized
    reconstruction kernels largely underexplored, partly due to the lack of easy integrability
    in 3D to 2D projections. In this light, we show that a class of decaying anisotropic
    radial basis functions (DARBFs), which are non-negative functions of the Mahalanobis
    distance, supports splatting by approximating the Gaussian function''s closed-form
    integration advantage. With this fresh perspective, we demonstrate up to 34% faster
    convergence during training and a 15% reduction in memory consumption across various
    DARB reconstruction kernels, while maintaining comparable PSNR, SSIM, and LPIPS
    results. We will make the code available.

    '
  project_page: https://randomnerds.github.io/darbs.github.io/
  paper: https://arxiv.org/pdf/2501.12369.pdf
  code: null
  video: null
  tags:
  - Project
  - Rendering
  thumbnail: assets/thumbnails/arunan2025darbsplatting.jpg
  publication_date: '2025-01-21T18:49:06+00:00'
  date_source: arxiv
- id: chen2025hac
  title: 'HAC++: Towards 100X Compression of 3D Gaussian Splatting'
  authors: Yihang Chen, Qianyi Wu, Weiyao Lin, Mehrtash Harandi, Jianfei Cai
  year: '2025'
  abstract: '3D Gaussian Splatting (3DGS) has emerged as a promising framework for
    novel view synthesis, boasting rapid rendering speed with high fidelity. However,
    the substantial Gaussians and their associated attributes necessitate effective
    compression techniques. Nevertheless, the sparse and unorganized nature of the
    point cloud of Gaussians (or anchors in our paper) presents challenges for compression.
    To achieve a compact size, we propose HAC++, which leverages the relationships
    between unorganized anchors and a structured hash grid, utilizing their mutual
    information for context modeling. Additionally, HAC++ captures intra-anchor contextual
    relationships to further enhance compression performance. To facilitate entropy
    coding, we utilize Gaussian distributions to precisely estimate the probability
    of each quantized attribute, where an adaptive quantization module is proposed
    to enable high-precision quantization of these attributes for improved fidelity
    restoration. Moreover, we incorporate an adaptive masking strategy to eliminate
    invalid Gaussians and anchors. Overall, HAC++ achieves a remarkable size reduction
    of over 100X compared to vanilla 3DGS when averaged on all datasets, while simultaneously
    improving fidelity. It also delivers more than 20X size reduction compared to
    Scaffold-GS. Our code is available at https://github.com/YihangChen-ee/HAC-plus.

    '
  project_page: https://yihangchen-ee.github.io/project_hac++/
  paper: https://arxiv.org/pdf/2501.12255.pdf
  code: https://github.com/YihangChen-ee/HAC-plus
  video: null
  tags:
  - Code
  - Compression
  - Project
  thumbnail: assets/thumbnails/chen2025hac.jpg
  publication_date: '2025-01-21T16:23:05+00:00'
  date_source: arxiv
- id: li2025cargs
  title: 'Car-GS: Addressing Reflective and Transparent Surface Challenges in 3D Car
    Reconstruction'
  authors: Congcong Li, Jin Wang, Xiaomeng Wang, Xingchen Zhou, Wei Wu, Yuzhi Zhang,
    Tongyi Cao
  year: '2025'
  abstract: '3D car modeling is crucial for applications in autonomous driving systems,
    virtual and augmented reality, and gaming. However, due to the distinctive properties
    of cars, such as highly reflective and transparent surface materials, existing
    methods often struggle to achieve accurate 3D car reconstruction. To address these
    limitations, we propose Car-GS, a novel approach designed to mitigate the effects
    of specular highlights and the coupling of RGB and geometry in 3D geometric and
    shading reconstruction (3DGS). Our method incorporates three key innovations:
    First, we introduce view-dependent Gaussian primitives to effectively model surface
    reflections. Second, we identify the limitations of using a shared opacity parameter
    for both image rendering and geometric attributes when modeling transparent objects.
    To overcome this, we assign a learnable geometry-specific opacity to each 2D Gaussian
    primitive, dedicated solely to rendering depth and normals. Third, we observe
    that reconstruction errors are most prominent when the camera view is nearly orthogonal
    to glass surfaces. To address this issue, we develop a quality-aware supervision
    module that adaptively leverages normal priors from a pre-trained large-scale
    normal model. Experimental results demonstrate that Car-GS achieves precise reconstruction
    of car surfaces and significantly outperforms prior methods. The project page
    is available at https://lcc815.github.io/Car-GS.

    '
  project_page: null
  paper: https://arxiv.org/pdf/2501.11020.pdf
  code: https://lcc815.github.io/Car-GS/
  video: null
  tags:
  - Code
  - Meshing
  - Rendering
  thumbnail: assets/thumbnails/li2025cargs.jpg
  publication_date: '2025-01-19T11:49:35+00:00'
  date_source: arxiv
- id: zheng2025gstar
  title: 'GSTAR: Gaussian Surface Tracking and Reconstruction'
  authors: Chengwei Zheng, Lixin Xue, Juan Zarate, Jie Song
  year: '2025'
  abstract: '3D Gaussian Splatting techniques have enabled efficient photo-realistic
    rendering of static scenes. Recent works have extended these approaches to support
    surface reconstruction and tracking. However, tracking dynamic surfaces with 3D
    Gaussians remains challenging due to complex topology changes, such as surfaces
    appearing, disappearing, or splitting. To address these challenges, we propose
    GSTAR, a novel method that achieves photo-realistic rendering, accurate surface
    reconstruction, and reliable 3D tracking for general dynamic scenes with changing
    topology. Given multi-view captures as input, GSTAR binds Gaussians to mesh faces
    to represent dynamic objects. For surfaces with consistent topology, GSTAR maintains
    the mesh topology and tracks the meshes using Gaussians. In regions where topology
    changes, GSTAR adaptively unbinds Gaussians from the mesh, enabling accurate registration
    and the generation of new surfaces based on these optimized Gaussians. Additionally,
    we introduce a surface-based scene flow method that provides robust initialization
    for tracking between frames. Experiments demonstrate that our method effectively
    tracks and reconstructs dynamic surfaces, enabling a range of applications. Our
    project page with the code release is available at https://eth-ait.github.io/GSTAR/.

    '
  project_page: https://chengwei-zheng.github.io/GSTAR/
  paper: https://arxiv.org/pdf/2501.10283.pdf
  code: null
  video: https://www.youtube.com/watch?v=Fwby4PrjFeM
  tags:
  - Avatar
  - Dynamic
  - Meshing
  - Project
  - Video
  thumbnail: assets/thumbnails/zheng2025gstar.jpg
  publication_date: '2025-01-17T16:26:24+00:00'
  date_source: arxiv
- id: ma2025cityloc
  title: 'CityLoc: 6 DoF Localization of Text Descriptions in Large-Scale Scenes with
    Gaussian Representation'
  authors: Qi Ma, Runyi Yang, Bin Ren, Ender Konukoglu, Luc Van Gool, Danda Pani Paudel
  year: '2025'
  abstract: 'Localizing text descriptions in large-scale 3D scenes is inherently an
    ambiguous task. This nonetheless arises while describing general concepts, e.g.
    all traffic lights in a city. To facilitate reasoning based on such concepts,
    text localization in the form of distribution is required. In this paper, we generate
    the distribution of the camera poses conditioned upon the textual description.
    To facilitate such generation, we propose a diffusion-based architecture that
    conditionally diffuses the noisy 6DoF camera poses to their plausible locations.
    The conditional signals are derived from the text descriptions, using the pre-trained
    text encoders. The connection between text descriptions and pose distribution
    is established through pretrained Vision-Language-Model, i.e. CLIP. Furthermore,
    we demonstrate that the candidate poses for the distribution can be further refined
    by rendering potential poses using 3D Gaussian splatting, guiding incorrectly
    posed samples towards locations that better align with the textual description,
    through visual reasoning. We demonstrate the effectiveness of our method by
    comparing it with both standard retrieval methods and learning-based approaches.
    Our proposed method consistently outperforms these baselines across all five large-scale
    datasets. Our source code and dataset will be made publicly available.

    '
  project_page: null
  paper: https://arxiv.org/pdf/2501.08982.pdf
  code: null
  video: null
  tags:
  - Language Embedding
  - Large-Scale
  thumbnail: assets/thumbnails/ma2025cityloc.jpg
  publication_date: '2025-01-15T17:59:32+00:00'
  date_source: arxiv
- id: hong2025gslivo
  title: 'GS-LIVO: Real-Time LiDAR, Inertial, and Visual Multi-sensor Fused Odometry
    with Gaussian Mapping'
  authors: Sheng Hong, Chunran Zheng, Yishu Shen, Changze Li, Fu Zhang, Tong Qin,
    Shaojie Shen
  year: '2025'
  abstract: 'In recent years, 3D Gaussian splatting (3D-GS) has emerged as a novel
    scene representation approach. However, existing vision-only 3D-GS methods often
    rely on hand-crafted heuristics for point-cloud densification and face challenges
    in handling occlusions and high GPU memory and computation consumption. LiDAR-Inertial-Visual
    (LIV) sensor configuration has demonstrated superior performance in localization
    and dense mapping by leveraging complementary sensing characteristics: rich texture
    information from cameras, precise geometric measurements from LiDAR, and high-frequency
    motion data from IMU. Inspired by this, we propose a novel real-time Gaussian-based
    simultaneous localization and mapping (SLAM) system. Our map system comprises
    a global Gaussian map and a sliding window of Gaussians, along with an IESKF-based
    odometry. The global Gaussian map consists of hash-indexed voxels organized in
    a recursive octree, effectively covering sparse spatial volumes while adapting
    to different levels of detail and scales. The Gaussian map is initialized through
    multi-sensor fusion and optimized with photometric gradients. Our system incrementally
    maintains a sliding window of Gaussians, significantly reducing GPU computation
    and memory consumption by only optimizing the map within the sliding window. Moreover,
    we implement a tightly coupled multi-sensor fusion odometry with an iterative
    error state Kalman filter (IESKF), leveraging real-time updating and rendering
    of the Gaussian map. Our system represents the first real-time Gaussian-based
    SLAM framework deployable on resource-constrained embedded systems, demonstrated
    on the NVIDIA Jetson Orin NX platform. The framework achieves real-time performance
    while maintaining robust multi-sensor fusion capabilities. All implementation
    algorithms, hardware designs, and CAD models will be publicly available.

    '
  project_page: null
  paper: https://arxiv.org/pdf/2501.08672.pdf
  code: null
  video: null
  tags:
  - Large-Scale
  - Lidar
  thumbnail: assets/thumbnails/hong2025gslivo.jpg
  publication_date: '2025-01-15T09:04:56+00:00'
  date_source: arxiv
- id: wu2025vingsmono
  title: 'VINGS-Mono: Visual-Inertial Gaussian Splatting Monocular SLAM in Large Scenes'
  authors: Ke Wu, Zicheng Zhang, Muer Tie, Ziqing Ai, Zhongxue Gan, Wenchao Ding
  year: '2025'
  abstract: 'VINGS-Mono is a monocular (inertial) Gaussian Splatting (GS) SLAM framework
    designed for large scenes. The framework comprises four main components: VIO Front
    End, 2D Gaussian Map, NVS Loop Closure, and Dynamic Eraser. In the VIO Front End,
    RGB frames are processed through dense bundle adjustment and uncertainty estimation
    to extract scene geometry and poses. Based on this output, the mapping module
    incrementally constructs and maintains a 2D Gaussian map. Key components of the
    2D Gaussian Map include a Sample-based Rasterizer, Score Manager, and Pose Refinement,
    which collectively improve mapping speed and localization accuracy. This enables
    the SLAM system to handle large-scale urban environments with up to 50 million
    Gaussian ellipsoids. To ensure global consistency in large-scale scenes, we design
    a Loop Closure module, which innovatively leverages the Novel View Synthesis (NVS)
    capabilities of Gaussian Splatting for loop closure detection and correction of
    the Gaussian map. Additionally, we propose a Dynamic Eraser to address the inevitable
    presence of dynamic objects in real-world outdoor scenes. Extensive evaluations
    in indoor and outdoor environments demonstrate that our approach achieves localization
    performance on par with Visual-Inertial Odometry while surpassing recent GS/NeRF
    SLAM methods. It also significantly outperforms all existing methods in terms
    of mapping and rendering quality. Furthermore, we developed a mobile app and verified
    that our framework can generate high-quality Gaussian maps in real time using
    only a smartphone camera and a low-frequency IMU sensor. To the best of our knowledge,
    VINGS-Mono is the first monocular Gaussian SLAM method capable of operating in
    outdoor environments and supporting kilometer-scale large scenes.

    '
  project_page: null
  paper: https://arxiv.org/pdf/2501.08286.pdf
  code: null
  video: null
  tags:
  - Large-Scale
  - Meshing
  - SLAM
  thumbnail: assets/thumbnails/wu2025vingsmono.jpg
  publication_date: '2025-01-14T18:01:15+00:00'
  date_source: arxiv
- id: rogge2025objectcentric
  title: 'Object-Centric 2D Gaussian Splatting: Background Removal and Occlusion-Aware
    Pruning for Compact Object Models'
  authors: Marcel Rogge, Didier Stricker
  year: '2025'
  abstract: 'Current Gaussian Splatting approaches are effective for reconstructing
    entire scenes but lack the option to target specific objects, making them computationally
    expensive and unsuitable for object-specific applications. We propose a novel
    approach that leverages object masks to enable targeted reconstruction, resulting
    in object-centric models. Additionally, we introduce an occlusion-aware pruning
    strategy to minimize the number of Gaussians without compromising quality. Our
    method reconstructs compact object models, yielding object-centric Gaussian and
    mesh representations that are up to 96% smaller and up to 71% faster to train
    compared to the baseline while retaining competitive quality. These representations
    are immediately usable for downstream applications such as appearance editing
    and physics simulation without additional processing.

    '
  project_page: null
  paper: https://arxiv.org/pdf/2501.08174.pdf
  code: null
  video: null
  tags:
  - Compression
  - Densification
  - Editing
  thumbnail: assets/thumbnails/rogge2025objectcentric.jpg
  publication_date: '2025-01-14T14:56:31+00:00'
  date_source: arxiv
- id: liu2025uncommon
  title: UnCommon Objects in 3D
  authors: Xingchen Liu, Piyush Tayal, Jianyuan Wang, Jesus Zarzar, Tom Monnier, Konstantinos
    Tertikas, Jiali Duan, Antoine Toisoul, Jason Y. Zhang, Natalia Neverova, Andrea
    Vedaldi, Roman Shapovalov, David Novotny
  year: '2025'
  abstract: 'We introduce Uncommon Objects in 3D (uCO3D), a new object-centric dataset
    for 3D deep learning and 3D generative AI. uCO3D is the largest publicly-available
    collection of high-resolution videos of objects with 3D annotations that ensures
    full-360$^{\circ}$ coverage. uCO3D is significantly more diverse than MVImgNet
    and CO3Dv2, covering more than 1,000 object categories. It is also of higher quality,
    due to extensive quality checks of both the collected videos and the 3D annotations.
    Similar to analogous datasets, uCO3D contains annotations for 3D camera poses,
    depth maps and sparse point clouds. In addition, each object is equipped with
    a caption and a 3D Gaussian Splat reconstruction. We train several large 3D models
    on MVImgNet, CO3Dv2, and uCO3D and obtain superior results using the latter, showing
    that uCO3D is better for learning applications.

    '
  project_page: https://uco3d.github.io/
  paper: https://arxiv.org/pdf/2501.07574.pdf
  code: https://github.com/facebookresearch/uco3d
  video: null
  tags:
  - Code
  - Project
  thumbnail: assets/thumbnails/liu2025uncommon.jpg
  publication_date: '2025-01-13T18:59:20+00:00'
  date_source: arxiv
- id: stuart20253dgstopc
  title: '3DGS-to-PC: Convert a 3D Gaussian Splatting Scene into a Dense Point Cloud
    or Mesh'
  authors: Lewis A G Stuart, Michael P Pound
  year: '2025'
  abstract: '3D Gaussian Splatting (3DGS) excels at producing highly detailed 3D reconstructions,
    but these scenes often require specialised renderers for effective visualisation.
    In contrast, point clouds are a widely used 3D representation and are compatible
    with most popular 3D processing software, yet converting 3DGS scenes into point
    clouds is a complex challenge. In this work we introduce 3DGS-to-PC, a flexible
    and highly customisable framework that is capable of transforming 3DGS scenes
    into dense, high-accuracy point clouds. We sample points probabilistically from
    each Gaussian as a 3D density function. We additionally threshold new points using
    the Mahalanobis distance to the Gaussian centre, preventing extreme outliers.
    The result is a point cloud that closely represents the shape encoded into the
    3D Gaussian scene. Individual Gaussians use spherical harmonics to adapt colours
    depending on view, and each point may contribute only subtle colour hints to the
    resulting rendered scene. To avoid spurious or incorrect colours that do not fit
    with the final point cloud, we recalculate Gaussian colours via a customised image
    rendering approach, assigning each Gaussian the colour of the pixel to which it
    contributes most across all views. 3DGS-to-PC also supports mesh generation through
    Poisson Surface Reconstruction, applied to points sampled from predicted surface
    Gaussians. This allows coloured meshes to be generated from 3DGS scenes without
    the need for re-training. This package is highly customisable and capable of
    simple integration into existing 3DGS pipelines. 3DGS-to-PC provides a powerful
    tool for converting 3DGS data into point cloud and surface-based formats.

    '
  project_page: null
  paper: https://arxiv.org/pdf/2501.07478.pdf
  code: https://github.com/Lewis-Stuart-11/3DGS-to-PC
  video: null
  tags:
  - Code
  - Point Cloud
  thumbnail: assets/thumbnails/stuart20253dgstopc.jpg
  publication_date: '2025-01-13T16:52:28+00:00'
  date_source: arxiv
- id: zhang2025evaluating
  title: 'Evaluating Human Perception of Novel View Synthesis: Subjective Quality
    Assessment of Gaussian Splatting and NeRF in Dynamic Scenes'
  authors: Yuhang Zhang, Joshua Maraval, Zhengyu Zhang, Nicolas Ramin, Shishun Tian,
    Lu Zhang
  year: '2025'
  abstract: 'Gaussian Splatting (GS) and Neural Radiance Fields (NeRF) are two groundbreaking
    technologies that have revolutionized the field of Novel View Synthesis (NVS),
    enabling immersive photorealistic rendering and user experiences by synthesizing
    multiple viewpoints from a set of images of sparse views. The potential applications
    of NVS, such as high-quality virtual and augmented reality, detailed 3D modeling,
    and realistic medical organ imaging, underscore the importance of quality assessment
    of NVS methods from the perspective of human perception. Although some previous
    studies have explored subjective quality assessments for NVS technology, they
    still face several challenges, especially in NVS methods selection, scenario coverage,
    and evaluation methodology. To address these challenges, we conducted two subjective
    experiments for the quality assessment of NVS technologies containing both GS-based
    and NeRF-based methods, focusing on dynamic and real-world scenes. This study
    covers 360°, front-facing, and single-viewpoint videos while providing a
    richer and greater number of real scenes. Meanwhile, it''s the first time to explore
    the impact of NVS methods in dynamic scenes with moving objects. The two types
    of subjective experiments help to fully comprehend the influences of different
    viewing paths from a human perception perspective and pave the way for future
    development of full-reference and no-reference quality metrics. In addition, we
    established a comprehensive benchmark of various state-of-the-art objective metrics
    on the proposed database, highlighting that existing methods still struggle to
    accurately capture subjective quality. The results give us some insights into
    the limitations of existing NVS methods and may promote the development of new
    NVS methods.

    '
  project_page: null
  paper: https://arxiv.org/pdf/2501.08072.pdf
  code: null
  video: null
  tags:
  - Dynamic
  thumbnail: assets/thumbnails/zhang2025evaluating.jpg
  publication_date: '2025-01-13T10:01:27+00:00'
  date_source: arxiv
- id: peng2025rmavatar
  title: 'RMAvatar: Photorealistic Human Avatar Reconstruction from Monocular Video
    Based on Rectified Mesh-embedded Gaussians'
  authors: Sen Peng, Weixing Xie, Zilong Wang, Xiaohu Guo, Zhonggui Chen, Baorong
    Yang, Xiao Dong
  year: '2025'
  abstract: 'We introduce RMAvatar, a novel human avatar representation with Gaussian
    splatting embedded on mesh to learn clothed avatar from a monocular video. We
    utilize the explicit mesh geometry to represent motion and shape of a virtual
    human and implicit appearance rendering with Gaussian Splatting. Our method consists
    of two main modules: Gaussian initialization module and Gaussian rectification
    module. We embed Gaussians into triangular faces and control their motion through
    the mesh, which ensures low-frequency motion and surface deformation of the avatar.
    Due to the limitations of LBS formula, the human skeleton is hard to control complex
    non-rigid transformations. We then design a pose-related Gaussian rectification
    module to learn fine-detailed non-rigid deformations, further improving the realism
    and expressiveness of the avatar. We conduct extensive experiments on public datasets,
    RMAvatar shows state-of-the-art performance on both rendering quality and quantitative
    evaluations. Please see our project page at https://rm-avatar.github.io.

    '
  project_page: https://rm-avatar.github.io/
  paper: https://arxiv.org/pdf/2501.07104.pdf
  code: https://github.com/RMAvatar/RMAvatar
  video: null
  tags:
  - Avatar
  - Code
  - Dynamic
  - Meshing
  - Monocular
  - Project
  thumbnail: assets/thumbnails/peng2025rmavatar.jpg
  publication_date: '2025-01-13T07:32:44+00:00'
  date_source: arxiv
- id: zielonka2025synthetic
  title: Synthetic Prior for Few-Shot Drivable Head Avatar Inversion
  authors: Wojciech Zielonka, Stephan J. Garbin, Alexandros Lattas, George Kopanas,
    Paulo Gotardo, Thabo Beeler, Justus Thies, Timo Bolkart
  year: '2025'
  abstract: 'We present SynShot, a novel method for the few-shot inversion of a drivable
    head avatar based on a synthetic prior. We tackle two major challenges. First,
    training a controllable 3D generative network requires a large number of diverse
    sequences, for which pairs of images and high-quality tracked meshes are not always
    available. Second, state-of-the-art monocular avatar models struggle to generalize
    to new views and expressions, lacking a strong prior and often overfitting to
    a specific viewpoint distribution. Inspired by machine learning models trained
    solely on synthetic data, we propose a method that learns a prior model from a
    large dataset of synthetic heads with diverse identities, expressions, and viewpoints.
    With few input images, SynShot fine-tunes the pretrained synthetic prior to bridge
    the domain gap, modeling a photorealistic head avatar that generalizes to novel
    expressions and viewpoints. We model the head avatar using 3D Gaussian splatting
    and a convolutional encoder-decoder that outputs Gaussian parameters in UV texture
    space. To account for the different modeling complexities over parts of the head
    (e.g., skin vs hair), we embed the prior with explicit control for upsampling
    the number of per-part primitives. Compared to state-of-the-art monocular methods
    that require thousands of real training images, SynShot significantly improves
    novel view and expression synthesis.

    '
  project_page: https://zielon.github.io/synshot/
  paper: https://arxiv.org/pdf/2501.06903.pdf
  code: null
  video: https://www.youtube.com/watch?v=4KQQatkaSgc
  tags:
  - Avatar
  - Dynamic
  - Project
  - Sparse
  - Video
  thumbnail: assets/thumbnails/zielonka2025synthetic.jpg
  publication_date: '2025-01-12T19:01:05+00:00'
  date_source: arxiv
- id: chen2025generalized
  title: Generalized and Efficient 2D Gaussian Splatting for Arbitrary-scale Super-Resolution
  authors: Du Chen, Liyi Chen, Zhengqiang Zhang, Lei Zhang
  year: '2025'
  abstract: 'Equipped with the continuous representation capability of Multi-Layer
    Perceptron (MLP), Implicit Neural Representation (INR) has been successfully employed
    for Arbitrary-scale Super-Resolution (ASR). However, the limited receptive field
    of the linear layers in MLP restricts the representation capability of INR, while
    it is computationally expensive to query the MLP numerous times to render each
    pixel. Recently, Gaussian Splatting (GS) has shown its advantages over INR in
    both visual quality and rendering speed in 3D tasks, which motivates us to explore
    whether GS can be employed for the ASR task. However, directly applying GS to
    ASR is exceptionally challenging because the original GS is an optimization-based
    method through overfitting each single scene, while in ASR we aim to learn a single
    model that can generalize to different images and scaling factors. We overcome
    these challenges by developing two novel techniques. Firstly, to generalize GS
    for ASR, we elaborately design an architecture to predict the corresponding image-conditioned
    Gaussians of the input low-resolution image in a feed-forward manner. Secondly,
    we implement an efficient differentiable 2D GPU/CUDA-based scale-aware rasterization
    to render super-resolved images by sampling discrete RGB values from the predicted
    contiguous Gaussians. Via end-to-end training, our optimized network, namely GSASR,
    can perform ASR for any image and unseen scaling factors. Extensive experiments
    validate the effectiveness of our proposed method. The project page can be found
    at https://mt-cly.github.io/GSASR.github.io/.

    '
  project_page: https://mt-cly.github.io/GSASR.github.io/
  paper: https://arxiv.org/pdf/2501.06838.pdf
  code: null
  video: null
  tags:
  - Project
  - Super Resolution
  thumbnail: assets/thumbnails/chen2025generalized.jpg
  publication_date: '2025-01-12T15:14:58+00:00'
  date_source: arxiv
- id: wang2025f3dgaus
  title: 'F3D-Gaus: Feed-forward 3D-aware Generation on ImageNet with Cycle-Consistent
    Gaussian Splatting'
  authors: Yuxin Wang, Qianyi Wu, Dan Xu
  year: '2025'
  abstract: 'This paper tackles the problem of generalizable 3D-aware generation from
    monocular datasets, e.g., ImageNet. The key challenge of this task is learning
    a robust 3D-aware representation without multi-view or dynamic data, while ensuring
    consistent texture and geometry across different viewpoints. Although some baseline
    methods are capable of 3D-aware generation, the quality of the generated images
    still lags behind state-of-the-art 2D generation approaches, which excel in producing
    high-quality, detailed images. To address this severe limitation, we propose a
    novel feed-forward pipeline based on pixel-aligned Gaussian Splatting, coined
    as F3D-Gaus, which can produce more realistic and reliable 3D renderings from
    monocular inputs. In addition, we introduce a self-supervised cycle-consistent
    constraint to enforce cross-view consistency in the learned 3D representation.
    This training strategy naturally allows aggregation of multiple aligned Gaussian
    primitives and significantly alleviates the interpolation limitations inherent
    in single-view pixel-aligned Gaussian Splatting. Furthermore, we incorporate video
    model priors to perform geometry-aware refinement, enhancing the generation of
    fine details in wide-viewpoint scenarios and improving the model''s capability
    to capture intricate 3D textures. Extensive experiments demonstrate that our approach
    not only achieves high-quality, multi-view consistent 3D-aware generation from
    monocular datasets, but also significantly improves training and inference efficiency.

    '
  project_page: https://arxiv.org/abs/2501.06714
  paper: https://arxiv.org/pdf/2501.06714.pdf
  code: https://github.com/W-Ted/F3D-Gaus
  video: null
  tags:
  - Code
  - Feed-Forward
  - Monocular
  - Project
  thumbnail: assets/thumbnails/wang2025f3dgaus.jpg
  publication_date: '2025-01-12T04:44:44+00:00'
  date_source: arxiv
- id: asim2025met3r
  title: 'MEt3R: Measuring Multi-View Consistency in Generated Images'
  authors: Mohammad Asim, Christopher Wewer, Thomas Wimmer, Bernt Schiele, Jan Eric
    Lenssen
  year: '2025'
  abstract: 'We introduce MEt3R, a metric for multi-view consistency in generated
    images. Large-scale generative models for multi-view image generation are rapidly
    advancing the field of 3D inference from sparse observations. However, due to
    the nature of generative modeling, traditional reconstruction metrics are not
    suitable to measure the quality of generated outputs and metrics that are independent
    of the sampling procedure are desperately needed. In this work, we specifically
    address the aspect of consistency between generated multi-view images, which can
    be evaluated independently of the specific scene. Our approach uses DUSt3R to
    obtain dense 3D reconstructions from image pairs in a feed-forward manner, which
    are used to warp image contents from one view into the other. Then, feature maps
    of these images are compared to obtain a similarity score that is invariant to
    view-dependent effects. Using MEt3R, we evaluate the consistency of a large set
    of previous methods for novel view and video generation, including our open, multi-view
    latent diffusion model.

    '
  project_page: https://geometric-rl.mpi-inf.mpg.de/met3r/
  paper: https://arxiv.org/pdf/2501.06336.pdf
  code: https://github.com/mohammadasim98/MEt3R
  video: https://geometric-rl.mpi-inf.mpg.de/met3r/static/videos/teaser.mp4
  tags:
  - 3ster-based
  - Code
  - Diffusion
  - Project
  - Video
  thumbnail: assets/thumbnails/asim2025met3r.jpg
  publication_date: '2025-01-10T20:43:33+00:00'
  date_source: arxiv
- id: shin2025localityaware
  title: Locality-aware Gaussian Compression for Fast and High-quality Rendering
  authors: Seungjoo Shin, Jaesik Park, Sunghyun Cho
  year: '2025'
  abstract: 'We present LocoGS, a locality-aware 3D Gaussian Splatting (3DGS) framework
    that exploits the spatial coherence of 3D Gaussians for compact modeling of volumetric
    scenes. To this end, we first analyze the local coherence of 3D Gaussian attributes,
    and propose a novel locality-aware 3D Gaussian representation that effectively
    encodes locally-coherent Gaussian attributes using a neural field representation
    with a minimal storage requirement. On top of the novel representation, LocoGS
    is carefully designed with additional components such as dense initialization,
    an adaptive spherical harmonics bandwidth scheme and different encoding schemes
    for different Gaussian attributes to maximize compression performance. Experimental
    results demonstrate that our approach outperforms the rendering quality of existing
    compact Gaussian representations for representative real-world 3D datasets while
    achieving from 54.6× to 96.6× compressed storage size and from 2.1×
    to 2.4× rendering speed than 3DGS. Even our approach also demonstrates
    an averaged 2.4× higher rendering speed than the state-of-the-art compression
    method with comparable compression performance.

    '
  project_page: null
  paper: https://arxiv.org/pdf/2501.05757.pdf
  code: null
  video: null
  tags:
  - Compression
  thumbnail: assets/thumbnails/shin2025localityaware.jpg
  publication_date: '2025-01-10T07:19:41+00:00'
  date_source: arxiv
- id: yan2025consistent
  title: Consistent Flow Distillation for Text-to-3D Generation
  authors: Runjie Yan, Yinbo Chen, Xiaolong Wang
  year: '2025'
  abstract: 'Score Distillation Sampling (SDS) has made significant strides in distilling
    image-generative models for 3D generation. However, its maximum-likelihood-seeking
    behavior often leads to degraded visual quality and diversity, limiting its effectiveness
    in 3D applications. In this work, we propose Consistent Flow Distillation (CFD),
    which addresses these limitations. We begin by leveraging the gradient of the
    diffusion ODE or SDE sampling process to guide the 3D generation. From the gradient-based
    sampling perspective, we find that the consistency of 2D image flows across different
    viewpoints is important for high-quality 3D generation. To achieve this, we introduce
    multi-view consistent Gaussian noise on the 3D object, which can be rendered from
    various viewpoints to compute the flow gradient. Our experiments demonstrate that
    CFD, through consistent flows, significantly outperforms previous methods in text-to-3D
    generation.

    '
  project_page: https://runjie-yan.github.io/cfd/
  paper: https://arxiv.org/pdf/2501.05445.pdf
  code: https://github.com/runjie-yan/ConsistentFlowDistillation
  video: null
  tags:
  - Code
  - Diffusion
  - Project
  thumbnail: assets/thumbnails/yan2025consistent.jpg
  publication_date: '2025-01-09T18:56:05+00:00'
  date_source: arxiv
- id: meng2025zero1tog
  title: 'Zero-1-to-G: Taming Pretrained 2D Diffusion Model for Direct 3D Generation'
  authors: Xuyi Meng, Chen Wang, Jiahui Lei, Kostas Daniilidis, Jiatao Gu, Lingjie
    Liu
  year: '2025'
  abstract: 'Recent advances in 2D image generation have achieved remarkable quality, largely
    driven by the capacity of diffusion models and the availability of large-scale
    datasets. However, direct 3D generation is still constrained by the scarcity and
    lower fidelity of 3D datasets. In this paper, we introduce Zero-1-to-G, a novel
    approach that addresses this problem by enabling direct single-view generation
    on Gaussian splats using pretrained 2D diffusion models. Our key insight is that
    Gaussian splats, a 3D representation, can be decomposed into multi-view images
    encoding different attributes. This reframes the challenging task of direct 3D
    generation within a 2D diffusion framework, allowing us to leverage the rich priors
    of pretrained 2D diffusion models. To incorporate 3D awareness, we introduce cross-view
    and cross-attribute attention layers, which capture complex correlations and enforce
    3D consistency across generated splats. This makes Zero-1-to-G the first direct
    image-to-3D generative model to effectively utilize pretrained 2D diffusion priors,
    enabling efficient training and improved generalization to unseen objects. Extensive
    experiments on both synthetic and in-the-wild datasets demonstrate superior performance
    in 3D object generation, offering a new approach to high-quality 3D generation.

    '
  project_page: https://mengxuyigit.github.io/projects/zero-1-to-G/
  paper: https://arxiv.org/pdf/2501.05427.pdf
  code: null
  video: null
  tags:
  - Diffusion
  - Project
  thumbnail: assets/thumbnails/meng2025zero1tog.jpg
  publication_date: '2025-01-09T18:37:35+00:00'
  date_source: arxiv
- id: gerogiannis2025arc2avatar
  title: 'Arc2Avatar: Generating Expressive 3D Avatars from a Single Image via ID
    Guidance'
  authors: Dimitrios Gerogiannis, Foivos Paraperas Papantoniou, Rolandos Alexandros
    Potamias, Alexandros Lattas, Stefanos Zafeiriou
  year: '2025'
  abstract: 'Inspired by the effectiveness of 3D Gaussian Splatting (3DGS) in reconstructing
    detailed 3D scenes within multi-view setups and the emergence of large 2D human
    foundation models, we introduce Arc2Avatar, the first SDS-based method utilizing
    a human face foundation model as guidance with just a single image as input. To
    achieve that, we extend such a model for diverse-view human head generation by
    fine-tuning on synthetic data and modifying its conditioning. Our avatars maintain
    a dense correspondence with a human face mesh template, allowing blendshape-based
    expression generation. This is achieved through a modified 3DGS approach, connectivity
    regularizers, and a strategic initialization tailored for our task. Additionally,
    we propose an optional efficient SDS-based correction step to refine the blendshape
    expressions, enhancing realism and diversity. Experiments demonstrate that Arc2Avatar
    achieves state-of-the-art realism and identity preservation, effectively addressing
    color issues by allowing the use of very low guidance, enabled by our strong identity
    prior and initialization strategy, without compromising detail.

    '
  project_page: null
  paper: https://arxiv.org/pdf/2501.05379.pdf
  code: null
  video: null
  tags:
  - Avatar
  - Diffusion
  thumbnail: assets/thumbnails/gerogiannis2025arc2avatar.jpg
  publication_date: '2025-01-09T17:04:33+00:00'
  date_source: arxiv
- id: tianci2025scaffoldslam
  title: 'Scaffold-SLAM: Structured 3D Gaussians for Simultaneous Localization and
    Photorealistic Mapping'
  authors: Wen Tianci, Liu Zhiang, Lu Biao, Fang Yongchun
  year: '2025'
  abstract: '3D Gaussian Splatting (3DGS) has recently revolutionized novel view synthesis
    in the Simultaneous Localization and Mapping (SLAM). However, existing SLAM methods
    utilizing 3DGS have failed to provide high-quality novel view rendering for monocular,
    stereo, and RGB-D cameras simultaneously. Notably, some methods perform well for
    RGB-D cameras but suffer significant degradation in rendering quality for monocular
    cameras. In this paper, we present Scaffold-SLAM, which delivers simultaneous
    localization and high-quality photorealistic mapping across monocular, stereo,
    and RGB-D cameras. We introduce two key innovations to achieve this state-of-the-art
    visual quality. First, we propose Appearance-from-Motion embedding, enabling 3D
    Gaussians to better model image appearance variations across different camera
    poses. Second, we introduce a frequency regularization pyramid to guide the distribution
    of Gaussians, allowing the model to effectively capture finer details in the scene.
    Extensive experiments on monocular, stereo, and RGB-D datasets demonstrate that
    Scaffold-SLAM significantly outperforms state-of-the-art methods in photorealistic
    mapping quality, e.g., PSNR is 16.76% higher in the TUM RGB-D datasets for monocular
    cameras.

    '
  project_page: null
  paper: https://arxiv.org/pdf/2501.05242.pdf
  code: null
  video: null
  tags:
  - SLAM
  thumbnail: assets/thumbnails/tianci2025scaffoldslam.jpg
  publication_date: '2025-01-09T13:50:26+00:00'
  date_source: arxiv
- id: bond2025gaussianvideo
  title: 'GaussianVideo: Efficient Video Representation via Hierarchical Gaussian
    Splatting'
  authors: Andrew Bond, Jui-Hsien Wang, Long Mai, Erkut Erdem, Aykut Erdem
  year: '2025'
  abstract: 'Efficient neural representations for dynamic video scenes are critical
    for applications ranging from video compression to interactive simulations. Yet,
    existing methods often face challenges related to high memory usage, lengthy training
    times, and temporal consistency. To address these issues, we introduce a novel
    neural video representation that combines 3D Gaussian splatting with continuous
    camera motion modeling. By leveraging Neural ODEs, our approach learns smooth
    camera trajectories while maintaining an explicit 3D scene representation through
    Gaussians. Additionally, we introduce a spatiotemporal hierarchical learning strategy,
    progressively refining spatial and temporal features to enhance reconstruction
    quality and accelerate convergence. This memory-efficient approach achieves high-quality
    rendering at impressive speeds. Experimental results show that our hierarchical
    learning, combined with robust camera motion modeling, captures complex dynamic
    scenes with strong temporal consistency, achieving state-of-the-art performance
    across diverse video datasets in both high- and low-motion scenarios.

    '
  project_page: https://cyberiada.github.io/GaussianVideo/
  paper: https://arxiv.org/pdf/2501.04782.pdf
  code: null
  video: null
  tags:
  - Gaussian Video
  - Project
  - Video
  thumbnail: assets/thumbnails/bond2025gaussianvideo.jpg
  publication_date: '2025-01-08T19:01:12+00:00'
  date_source: arxiv
- id: huang2025fatesgs
  title: 'FatesGS: Fast and Accurate Sparse-View Surface Reconstruction using Gaussian
    Splatting with Depth-Feature Consistency'
  authors: Han Huang, Yulun Wu, Chao Deng, Ge Gao, Ming Gu, Yu-Shen Liu
  year: '2025'
  abstract: 'Recently, Gaussian Splatting has sparked a new trend in the field of
    computer vision. Apart from novel view synthesis, it has also been extended to
    the area of multi-view reconstruction. The latest methods facilitate complete,
    detailed surface reconstruction while ensuring fast training speed. However, these
    methods still require dense input views, and their output quality significantly
    degrades with sparse views. We observed that the Gaussian primitives tend to overfit
    the few training views, leading to noisy floaters and incomplete reconstruction
    surfaces. In this paper, we present an innovative sparse-view reconstruction framework
    that leverages intra-view depth and multi-view feature consistency to achieve
    remarkably accurate surface reconstruction. Specifically, we utilize monocular
    depth ranking information to supervise the consistency of depth distribution within
    patches and employ a smoothness loss to enhance the continuity of the distribution.
    To achieve finer surface reconstruction, we optimize the absolute position of
    depth through multi-view projection features. Extensive experiments on DTU and
    BlendedMVS demonstrate that our method outperforms state-of-the-art methods with
    a speedup of 60x to 200x, achieving swift and fine-grained mesh reconstruction
    without the need for costly pre-training.

    '
  project_page: https://alvin528.github.io/FatesGS/
  paper: https://arxiv.org/pdf/2501.04628.pdf
  code: null
  video: null
  tags:
  - Meshing
  - Project
  - Sparse
  thumbnail: assets/thumbnails/huang2025fatesgs.jpg
  publication_date: '2025-01-08T17:19:35+00:00'
  date_source: arxiv
- id: kwak2025modecgs
  title: 'MoDec-GS: Global-to-Local Motion Decomposition and Temporal Interval Adjustment
    for Compact Dynamic 3D Gaussian Splatting'
  authors: Sangwoon Kwak, Joonsoo Kim, Jun Young Jeong, Won-Sik Cheong, Jihyong Oh,
    Munchurl Kim
  year: '2025'
  abstract: '3D Gaussian Splatting (3DGS) has made significant strides in scene representation
    and neural rendering, with intense efforts focused on adapting it for dynamic
    scenes. Despite delivering remarkable rendering quality and speed, existing methods
    struggle with storage demands and representing complex real-world motions. To
    tackle these issues, we propose MoDec-GS, a memory-efficient Gaussian splatting
    framework designed for reconstructing novel views in challenging scenarios with
    complex motions. We introduce Global-to-Local Motion Decomposition (GLMD) to effectively
    capture dynamic motions in a coarse-to-fine manner. This approach leverages Global
    Canonical Scaffolds (Global CS) and Local Canonical Scaffolds (Local CS), extending
    static Scaffold representation to dynamic video reconstruction. For Global CS,
    we propose Global Anchor Deformation (GAD) to efficiently represent global dynamics
    along complex motions, by directly deforming the implicit Scaffold attributes
    which are anchor position, offset, and local context features. Next, we finely
    adjust local motions via the Local Gaussian Deformation (LGD) of Local CS explicitly.
    Additionally, we introduce Temporal Interval Adjustment (TIA) to automatically
    control the temporal coverage of each Local CS during training, allowing MoDec-GS
    to find optimal interval assignments based on the specified number of temporal
    segments. Extensive evaluations demonstrate that MoDec-GS achieves an average 70%
    reduction in model size over state-of-the-art methods for dynamic 3D Gaussians from
    real-world dynamic videos while maintaining or even improving rendering quality.

    '
  project_page: null
  paper: https://arxiv.org/pdf/2501.03714.pdf
  code: null
  video: https://youtu.be/5L6gzc5-cw8?si=L6v6XLZFQrYK50iV
  tags:
  - Compression
  - Dynamic
  - Project
  - Video
  thumbnail: assets/thumbnails/kwak2025modecgs.jpg
  publication_date: '2025-01-07T11:43:13+00:00'
  date_source: arxiv
- id: yu2025dehazegs
  title: 'DehazeGS: Seeing Through Fog with 3D Gaussian Splatting'
  authors: Jinze Yu, Yiqun Wang, Zhengda Lu, Jianwei Guo, Yong Li, Hongxing Qin, Xiaopeng
    Zhang
  year: '2025'
  abstract: 'Current novel view synthesis tasks primarily rely on high-quality and
    clear images. However, in foggy scenes, scattering and attenuation can significantly
    degrade the reconstruction and rendering quality. Although NeRF-based dehazing
    reconstruction algorithms have been developed, their use of deep fully connected
    neural networks and per-ray sampling strategies leads to high computational costs.
    Moreover, NeRF''s implicit representation struggles to recover fine details from
    hazy scenes. In contrast, recent advancements in 3D Gaussian Splatting achieve
    high-quality 3D scene reconstruction by explicitly modeling point clouds into
    3D Gaussians. In this paper, we propose leveraging the explicit Gaussian representation
    to explain the foggy image formation process through a physically accurate forward
    rendering process. We introduce DehazeGS, a method capable of decomposing and
    rendering a fog-free background from participating media using only multi-view
    foggy images as input. We model the transmission within each Gaussian distribution
    to simulate the formation of fog. During this process, we jointly learn the atmospheric
    light and scattering coefficient while optimizing the Gaussian representation
    of the hazy scene. In the inference stage, we eliminate the effects of scattering
    and attenuation on the Gaussians and directly project them onto a 2D plane to
    obtain a clear view. Experiments on both synthetic and real-world foggy datasets
    demonstrate that DehazeGS achieves state-of-the-art performance in terms of both
    rendering quality and computational efficiency.

    '
  project_page: null
  paper: https://arxiv.org/pdf/2501.03659.pdf
  code: null
  video: null
  tags:
  - In the Wild
  - Rendering
  thumbnail: assets/thumbnails/yu2025dehazegs.jpg
  publication_date: '2025-01-07T09:47:46+00:00'
  date_source: arxiv
- id: lee2025compression
  title: Compression of 3D Gaussian Splatting with Optimized Feature Planes and Standard
    Video Codecs
  authors: Soonbin Lee, Fangwen Shu, Yago Sanchez, Thomas Schierl, Cornelius Hellge
  year: '2025'
  abstract: '3D Gaussian Splatting is a recognized method for 3D scene representation,
    known for its high rendering quality and speed. However, its substantial data
    requirements present challenges for practical applications. In this paper, we
    introduce an efficient compression technique that significantly reduces storage
    overhead by using compact representation. We propose a unified architecture that
    combines point cloud data and feature planes through a progressive tri-plane structure.
    Our method utilizes 2D feature planes, enabling continuous spatial representation.
    To further optimize these representations, we incorporate entropy modeling in
    the frequency domain, specifically designed for standard video codecs. We also
    propose channel-wise bit allocation to achieve a better trade-off between bitrate
    consumption and feature plane representation. Consequently, our model effectively
    leverages spatial correlations within the feature planes to enhance rate-distortion
    performance using standard, non-differentiable video codecs. Experimental results
    demonstrate that our method outperforms existing methods in data compactness while
    maintaining high rendering quality. Our project page is available at https://fraunhoferhhi.github.io/CodecGS

    '
  project_page: https://fraunhoferhhi.github.io/CodecGS
  paper: https://arxiv.org/pdf/2501.03399.pdf
  code: null
  video: null
  tags:
  - Compression
  - Project
  thumbnail: assets/thumbnails/lee2025compression.jpg
  publication_date: '2025-01-06T21:37:30+00:00'
  date_source: arxiv
- id: rajasegaran2025gaussian
  title: Gaussian Masked Autoencoders
  authors: Jathushan Rajasegaran, Xinlei Chen, Ruilong Li, Christoph Feichtenhofer,
    Jitendra Malik, Shiry Ginosar
  year: '2025'
  abstract: 'This paper explores Masked Autoencoders (MAE) with Gaussian Splatting.
    While reconstructive self-supervised learning frameworks such as MAE learns good
    semantic abstractions, it is not trained for explicit spatial awareness. Our approach,
    named Gaussian Masked Autoencoder, or GMAE, aims to learn semantic abstractions
    and spatial understanding jointly. Like MAE, it reconstructs the image end-to-end
    in the pixel space, but beyond MAE, it also introduces an intermediate, 3D Gaussian-based
    representation and renders images via splatting. We show that GMAE can enable
    various zero-shot learning capabilities of spatial understanding (e.g., figure-ground
    segmentation, image layering, edge detection, etc.) while preserving the high-level
    semantics of self-supervised representation quality from MAE. To our knowledge,
    we are the first to employ Gaussian primitives in an image representation learning
    framework beyond optimization-based single-scene reconstructions. We believe GMAE
    will inspire further research in this direction and contribute to developing next-generation
    techniques for modeling high-fidelity visual data. More details at https://brjathu.github.io/gmae

    '
  project_page: https://brjathu.github.io/gmae
  paper: https://arxiv.org/pdf/2501.03229.pdf
  code: https://github.com/darshanmakwana412/gaussian-mae
  video: null
  tags:
  - Code
  - Project
  - Transformer
  thumbnail: assets/thumbnails/rajasegaran2025gaussian.jpg
  publication_date: '2025-01-06T18:59:57+00:00'
  date_source: arxiv
- id: nguyen2025pointmapconditioned
  title: Pointmap-Conditioned Diffusion for Consistent Novel View Synthesis
  authors: Thang-Anh-Quan Nguyen, Nathan Piasco, Luis Roldão, Moussab Bennehar, Dzmitry
    Tsishkou, Laurent Caraffa, Jean-Philippe Tarel, Roland Brémond
  year: '2025'
  abstract: 'In this paper, we present PointmapDiffusion, a novel framework for single-image
    novel view synthesis (NVS) that utilizes pre-trained 2D diffusion models. Our
    method is the first to leverage pointmaps (i.e. rasterized 3D scene coordinates)
    as a conditioning signal, capturing geometric prior from the reference images
    to guide the diffusion process. By embedding reference attention blocks and a
    ControlNet for pointmap features, our model balances between generative capability
    and geometric consistency, enabling accurate view synthesis across varying viewpoints.
    Extensive experiments on diverse real-world datasets demonstrate that PointmapDiffusion
    achieves high-quality, multi-view consistent results with significantly fewer
    trainable parameters compared to other baselines for single-image NVS tasks.

    '
  project_page: null
  paper: https://arxiv.org/pdf/2501.02913.pdf
  code: null
  video: null
  tags:
  - Diffusion
  thumbnail: assets/thumbnails/nguyen2025pointmapconditioned.jpg
  publication_date: '2025-01-06T10:48:31+00:00'
  date_source: arxiv
- id: bian2025gsdit
  title: 'GS-DiT: Advancing Video Generation with Pseudo 4D Gaussian Fields through
    Efficient Dense 3D Point Tracking'
  authors: Weikang Bian, Zhaoyang Huang, Xiaoyu Shi, Yijin Li, Fu-Yun Wang, Hongsheng
    Li
  year: '2025'
  abstract: '4D video control is essential in video generation as it enables the use
    of sophisticated lens techniques, such as multi-camera shooting and dolly zoom,
    which are currently unsupported by existing methods. Training a video Diffusion
    Transformer (DiT) directly to control 4D content requires expensive multi-view
    videos. Inspired by Monocular Dynamic novel View Synthesis (MDVS) that optimizes
    a 4D representation and renders videos according to different 4D elements, such
    as camera pose and object motion editing, we bring pseudo 4D Gaussian fields to
    video generation. Specifically, we propose a novel framework that constructs a
    pseudo 4D Gaussian field with dense 3D point tracking and renders the Gaussian
    field for all video frames. Then we finetune a pretrained DiT to generate videos
    following the guidance of the rendered video, dubbed as GS-DiT. To boost the training
    of the GS-DiT, we also propose an efficient Dense 3D Point Tracking (D3D-PT) method
    for the pseudo 4D Gaussian field construction. Our D3D-PT outperforms SpatialTracker,
    the state-of-the-art sparse 3D point tracking method, in accuracy and accelerates
    the inference speed by two orders of magnitude. During the inference stage, GS-DiT
    can generate videos with the same dynamic content while adhering to different
    camera parameters, addressing a significant limitation of current video generation
    models. GS-DiT demonstrates strong generalization capabilities and extends the
    4D controllability of Gaussian splatting to video generation beyond just camera
    poses. It supports advanced cinematic effects through the manipulation of the
    Gaussian field and camera intrinsics, making it a powerful tool for creative video
    production. Demos are available at https://wkbian.github.io/Projects/GS-DiT/.

    '
  project_page: https://wkbian.github.io/Projects/GS-DiT/
  paper: https://arxiv.org/pdf/2501.02690.pdf
  code: null
  video: null
  tags:
  - Project
  - Year 2025
  thumbnail: assets/thumbnails/bian2025gsdit.jpg
  publication_date: '2025-01-05T23:55:33+00:00'
  date_source: arxiv
- id: cong2025videolifter
  title: 'VideoLifter: Lifting Videos to 3D with Fast Hierarchical Stereo Alignment'
  authors: Wenyan Cong, Kevin Wang, Jiahui Lei, Colton Stearns, Yuanhao Cai, Dilin
    Wang, Rakesh Ranjan, Matt Feiszli, Leonidas Guibas, Zhangyang Wang, Weiyao Wang,
    Zhiwen Fan
  year: '2025'
  abstract: 'Efficiently reconstructing accurate 3D models from monocular video is
    a key challenge in computer vision, critical for advancing applications in virtual
    reality, robotics, and scene understanding. Existing approaches typically require
    pre-computed camera parameters and frame-by-frame reconstruction pipelines, which
    are prone to error accumulation and entail significant computational overhead.
    To address these limitations, we introduce VideoLifter, a novel framework that
    leverages geometric priors from a learnable model to incrementally optimize a
    globally sparse to dense 3D representation directly from video sequences. VideoLifter
    segments the video sequence into local windows, where it matches and registers
    frames, constructs consistent fragments, and aligns them hierarchically to produce
    a unified 3D model. By tracking and propagating sparse point correspondences across
    frames and fragments, VideoLifter incrementally refines camera poses and 3D structure,
    minimizing reprojection error for improved accuracy and robustness. This approach
    significantly accelerates the reconstruction process, reducing training time by
    over 82% while surpassing current state-of-the-art methods in visual fidelity
    and computational efficiency.

    '
  project_page: https://videolifter.github.io/
  paper: https://arxiv.org/pdf/2501.01949.pdf
  code: https://github.com/VITA-Group/VideoLifter
  video: null
  tags:
  - Acceleration
  - Code
  - Diffusion
  - Project
  thumbnail: assets/thumbnails/cong2025videolifter.jpg
  publication_date: '2025-01-03T18:52:36+00:00'
  date_source: arxiv
- id: huang2025enerverse
  title: 'EnerVerse: Envisioning Embodied Future Space for Robotics Manipulation'
  authors: Siyuan Huang, Liliang Chen, Pengfei Zhou, Shengcong Chen, Zhengkai Jiang,
    Yue Hu, Peng Gao, Hongsheng Li, Maoqing Yao, Guanghui Ren
  year: '2025'
  abstract: 'We introduce EnerVerse, a comprehensive framework for embodied future
    space generation specifically designed for robotic manipulation tasks. EnerVerse
    seamlessly integrates convolutional and bidirectional attention mechanisms for
    inner-chunk space modeling, ensuring low-level consistency and continuity. Recognizing
    the inherent redundancy in video data, we propose a sparse memory context combined
    with a chunkwise unidirectional generative paradigm to enable the generation of
    infinitely long sequences. To further augment robotic capabilities, we introduce
    the Free Anchor View (FAV) space, which provides flexible perspectives to enhance
    observation and analysis. The FAV space mitigates motion modeling ambiguity, removes
    physical constraints in confined environments, and significantly improves the
    robot''s generalization and adaptability across various tasks and settings. To
    address the prohibitive costs and labor intensity of acquiring multi-camera observations,
    we present a data engine pipeline that integrates a generative model with 4D Gaussian
    Splatting (4DGS). This pipeline leverages the generative model''s robust generalization
    capabilities and the spatial constraints provided by 4DGS, enabling an iterative
    enhancement of data quality and diversity, thus creating a data flywheel effect
    that effectively narrows the sim-to-real gap. Finally, our experiments demonstrate
    that the embodied future space generation prior substantially enhances policy
    predictive capabilities, resulting in improved overall performance, particularly
    in long-range robotic manipulation tasks.

    '
  project_page: https://sites.google.com/view/enerverse
  paper: https://arxiv.org/pdf/2501.01895.pdf
  code: null
  video: null
  tags:
  - Dynamic
  - Project
  - Robotics
  thumbnail: assets/thumbnails/huang2025enerverse.jpg
  publication_date: '2025-01-03T17:00:33+00:00'
- id: longhini2024clothsplatting
  title: 'Cloth-Splatting: 3D Cloth State Estimation from RGB Supervision'
  authors: Alberta Longhini, Marcel Büsching, Bardienus Pieter Duisterhof, Jens Lundell,
    Jeffrey Ichnowski, Mårten Björkman, Danica Kragic
  year: '2024'
  abstract: We introduce Cloth-Splatting, a method for estimating 3D states
    of cloth from RGB images through a prediction-update framework. Cloth-Splatting
    leverages an action-conditioned dynamics model for predicting future states and
    uses 3D Gaussian Splatting to update the predicted states. Our key insight is
    that coupling a 3D mesh-based representation with Gaussian Splatting allows us
    to define a differentiable map between the cloth's state space and the image space.
    This enables the use of gradient-based optimization techniques to refine inaccurate
    state estimates using only RGB supervision. Our experiments demonstrate that Cloth-Splatting
    not only improves state estimation accuracy over current baselines but also reduces
    convergence time by ~85%.
  project_page: https://kth-rpl.github.io/cloth-splatting/
  paper: https://arxiv.org/pdf/2501.01715.pdf
  code: https://github.com/KTH-RPL/cloth-splatting
  video: null
  tags:
  - Code
  - Meshing
  - Project
  - Rendering
  thumbnail: assets/thumbnails/longhini2024clothsplatting.jpg
  publication_date: '2025-01-03T09:17:30+00:00'
  date_source: arxiv
- id: zhang2025crossviewgs
  title: 'CrossView-GS: Cross-view Gaussian Splatting For Large-scale Scene Reconstruction'
  authors: Chenhao Zhang, Yuanping Cao, Lei Zhang
  year: '2025'
  abstract: 3D Gaussian Splatting (3DGS) has emerged as a prominent method for scene
    representation and reconstruction, leveraging densely distributed Gaussian primitives
    to enable real-time rendering of high-resolution images. While existing 3DGS methods
    perform well in scenes with minor view variation, large view changes in cross-view
    scenes pose optimization challenges for these methods. To address these issues,
    we propose a novel cross-view Gaussian Splatting method for large-scale scene
    reconstruction, based on dual-branch fusion. Our method independently reconstructs
    models from aerial and ground views as two independent branches to establish the
    baselines of Gaussian distribution, providing reliable priors for cross-view reconstruction
    during both initialization and densification. Specifically, a gradient-aware regularization
    strategy is introduced to mitigate smoothing issues caused by significant view
    disparities. Additionally, a unique Gaussian supplementation strategy is utilized
    to incorporate complementary information of dual-branch into the cross-view model.
    Extensive experiments on benchmark datasets demonstrate that our method achieves
    superior performance in novel view synthesis compared to state-of-the-art methods.
  project_page: null
  paper: https://arxiv.org/pdf/2501.01695.pdf
  code: null
  video: null
  tags:
  - Large-Scale
  - Optimization
  thumbnail: assets/thumbnails/zhang2025crossviewgs.jpg
  publication_date: '2025-01-03T08:24:59+00:00'
- id: wang2025pgsag
  title: 'PG-SAG: Parallel Gaussian Splatting for Fine-Grained Large-Scale Urban Buildings
    Reconstruction via Semantic-Aware Grouping'
  authors: Tengfei Wang, Xin Wang, Yongmao Hou, Yiwei Xu, Wendi Zhang, Zongqian Zhan
  year: '2025'
  abstract: 3D Gaussian Splatting (3DGS) has emerged as a transformative method in
    the field of real-time novel view synthesis. Based on 3DGS, recent advancements cope
    with large-scale scenes via spatial-based partition strategy to reduce video memory
    and optimization time costs. In this work, we introduce a parallel Gaussian splatting
    method, termed PG-SAG, which fully exploits semantic cues for both partitioning
    and Gaussian kernel optimization, enabling fine-grained building surface reconstruction
    of large-scale urban areas without downsampling the original image resolution.
    First, the Cross-modal model - Language Segment Anything is leveraged to segment
    building masks. Then, the segmented building regions are grouped into sub-regions
    according to the visibility check across registered images. The Gaussian kernels
    for these sub-regions are optimized in parallel with masked pixels. In addition,
    the normal loss is re-formulated for the detected edges of masks to alleviate
    the ambiguities in normal vectors on edges. Finally, to improve the optimization
    of 3D Gaussians, we introduce a gradient-constrained balance-load loss that accounts
    for the complexity of the corresponding scenes, effectively minimizing the thread
    waiting time in the pixel-parallel rendering stage as well as the reconstruction
    loss. Extensive experiments on various urban datasets demonstrate
    the superior performance of our PG-SAG on building surface reconstruction,
    compared to several state-of-the-art 3DGS-based methods.
  project_page: null
  paper: https://arxiv.org/pdf/2501.01677.pdf
  code: null
  video: null
  tags:
  - Large-Scale
  - Meshing
  - Optimization
  thumbnail: assets/thumbnails/wang2025pgsag.jpg
  publication_date: '2025-01-03T07:40:16+00:00'
- id: gao2025easysplat
  title: 'EasySplat: View-Adaptive Learning makes 3D Gaussian Splatting Easy'
  authors: Ao Gao, Luosong Guo, Tao Chen, Zhao Wang, Ying Tai, Jian Yang, Zhenyu Zhang
  year: '2025'
  abstract: 3D Gaussian Splatting (3DGS) techniques have achieved satisfactory 3D
    scene representation. Despite their impressive performance, they confront challenges
    due to the limitation of structure-from-motion (SfM) methods on acquiring accurate
    scene initialization, or the inefficiency of densification strategy. In this paper,
    we introduce a novel framework EasySplat to achieve high-quality 3DGS modeling.
    Instead of using SfM for scene initialization, we employ a novel method to release
    the power of large-scale pointmap approaches. Specifically, we propose an efficient
    grouping strategy based on view similarity, and use robust pointmap priors to
    obtain high-quality point clouds and camera poses for 3D scene initialization.
    After obtaining a reliable scene structure, we propose a novel densification approach
    that adaptively splits Gaussian primitives based on the average shape of neighboring
    Gaussian ellipsoids, utilizing a KNN scheme. In this way, the proposed method tackles
    the limitation on initialization and optimization, leading to an efficient and
    accurate 3DGS modeling. Extensive experiments demonstrate that EasySplat outperforms
    the current state-of-the-art (SOTA) in handling novel view synthesis.
  project_page: null
  paper: https://arxiv.org/pdf/2501.01003.pdf
  code: null
  video: null
  tags:
  - 3ster-based
  - Acceleration
  - Densification
  - Rendering
  thumbnail: assets/thumbnails/gao2025easysplat.jpg
  publication_date: '2025-01-02T01:56:58+00:00'
- id: yang2024storm
  title: 'STORM: Spatio-Temporal Reconstruction Model for Large-Scale Outdoor Scenes'
  authors: Jiawei Yang, Jiahui Huang, Yuxiao Chen, Yan Wang, Boyi Li, Yurong You,
    Apoorva Sharma, Maximilian Igl, Peter Karkus, Danfei Xu, Boris Ivanovic, Yue Wang,
    Marco Pavone
  year: '2024'
  abstract: We present STORM, a spatio-temporal reconstruction model designed for
    reconstructing dynamic outdoor scenes from sparse observations. Existing dynamic
    reconstruction methods often rely on per-scene optimization, dense observations
    across space and time, and strong motion supervision, resulting in lengthy optimization
    times, limited generalization to novel views or scenes, and degenerated quality
    caused by noisy pseudo-labels for dynamics. To address these challenges, STORM
    leverages a data-driven Transformer architecture that directly infers dynamic
    3D scene representations--parameterized by 3D Gaussians and their velocities--in
    a single forward pass. Our key design is to aggregate 3D Gaussians from all frames
    using self-supervised scene flows, transforming them to the target timestep to
    enable complete (i.e., "amodal") reconstructions from arbitrary viewpoints at
    any moment in time. As an emergent property, STORM automatically captures dynamic
    instances and generates high-quality masks using only reconstruction losses. Extensive
    experiments on public datasets show that STORM achieves precise dynamic scene
    reconstruction, surpassing state-of-the-art per-scene optimization methods (+4.3
    to 6.6 PSNR) and existing feed-forward approaches (+2.1 to 4.7 PSNR) in dynamic
    regions. STORM reconstructs large-scale outdoor scenes in 200ms, supports real-time
    rendering, and outperforms competitors in scene flow estimation, improving 3D
    EPE by 0.422m and Acc5 by 28.02%. Beyond reconstruction, we showcase four additional
    applications of our model, illustrating the potential of self-supervised learning
    for broader dynamic scene understanding.
  project_page: null
  paper: https://arxiv.org/pdf/2501.00602.pdf
  code: null
  video: https://jiawei-yang.github.io/STORM/
  tags:
  - Autonomous Driving
  - Dynamic
  - Large-Scale
  - Video
  thumbnail: assets/thumbnails/yang2024storm.jpg
  publication_date: '2024-12-31T18:59:58+00:00'
- id: mao2024dreamdrive
  title: 'DreamDrive: Generative 4D Scene Modeling from Street View Images'
  authors: Jiageng Mao, Boyi Li, Boris Ivanovic, Yuxiao Chen, Yan Wang, Yurong You,
    Chaowei Xiao, Danfei Xu, Marco Pavone, Yue Wang
  year: '2024'
  abstract: Synthesizing photo-realistic visual observations from an ego vehicle's
    driving trajectory is a critical step towards scalable training of self-driving
    models. Reconstruction-based methods create 3D scenes from driving logs and synthesize
    geometry-consistent driving videos through neural rendering, but their dependence
    on costly object annotations limits their ability to generalize to in-the-wild
    driving scenarios. On the other hand, generative models can synthesize action-conditioned
    driving videos in a more generalizable way but often struggle with maintaining
    3D visual consistency. In this paper, we present DreamDrive, a 4D spatial-temporal
    scene generation approach that combines the merits of generation and reconstruction,
    to synthesize generalizable 4D driving scenes and dynamic driving videos with
    3D consistency. Specifically, we leverage the generative power of video diffusion
    models to synthesize a sequence of visual references and further elevate them
    to 4D with a novel hybrid Gaussian representation. Given a driving trajectory,
    we then render 3D-consistent driving videos via Gaussian splatting. The use of
    generative priors allows our method to produce high-quality 4D scenes from in-the-wild
    driving data, while neural rendering ensures 3D-consistent video generation from
    the 4D scenes. Extensive experiments on nuScenes and street view images demonstrate
    that DreamDrive can generate controllable and generalizable 4D driving scenes,
    synthesize novel views of driving videos with high fidelity and 3D consistency,
    decompose static and dynamic elements in a self-supervised manner, and enhance
    perception and planning tasks for autonomous driving.
  project_page: null
  paper: https://arxiv.org/pdf/2501.00601.pdf
  code: null
  video: null
  tags:
  - Autonomous Driving
  - Dynamic
  - Feed-Forward
  thumbnail: assets/thumbnails/mao2024dreamdrive.jpg
  publication_date: '2024-12-31T18:59:57+00:00'
- id: wang2024sgsplatting
  title: 'SG-Splatting: Accelerating 3D Gaussian Splatting with Spherical Gaussians'
  authors: Yiwen Wang, Siyuan Chen, Ran Yi
  year: '2024'
  abstract: '3D Gaussian Splatting is emerging as a state-of-the-art technique in
    novel view synthesis, recognized for its impressive balance between visual quality,
    speed, and rendering efficiency. However, reliance on third-degree spherical harmonics
    for color representation introduces significant storage demands and computational
    overhead, resulting in a large memory footprint and slower rendering speed. We
    introduce SG-Splatting with Spherical Gaussians based color representation, a
    novel approach to enhance rendering speed and quality in novel view synthesis.
    Our method first represents view-dependent color using Spherical Gaussians, instead
    of third-degree spherical harmonics, which largely reduces the number of parameters
    used for color representation, and significantly accelerates the rendering process.
    We then develop an efficient strategy for organizing multiple Spherical Gaussians,
    optimizing their arrangement to achieve a balanced and accurate scene representation.
    To further improve rendering quality, we propose a mixed representation that combines
    Spherical Gaussians with low-degree spherical harmonics, capturing both high-
    and low-frequency color information effectively. SG-Splatting also has plug-and-play
    capability, allowing it to be easily integrated into existing systems. This approach
    improves computational efficiency and overall visual fidelity, making it a practical
    solution for real-time applications.

    '
  project_page: null
  paper: https://arxiv.org/pdf/2501.00342.pdf
  code: null
  video: null
  tags:
  - Acceleration
  thumbnail: assets/thumbnails/wang2024sgsplatting.jpg
  publication_date: '2024-12-31T08:31:52+00:00'
- id: cha2024perse
  title: 'PERSE: Personalized 3D Generative Avatars from A Single Portrait'
  authors: Hyunsoo Cha, Inhee Lee, Hanbyul Joo
  year: '2024'
  abstract: We present PERSE, a method for building an animatable personalized generative
    avatar from a reference portrait. Our avatar model enables facial attribute editing
    in a continuous and disentangled latent space to control each facial attribute,
    while preserving the individual's identity. To achieve this, our method begins
    by synthesizing large-scale synthetic 2D video datasets, where each video contains
    consistent changes in the facial expression and viewpoint, combined with a variation
    in a specific facial attribute from the original input. We propose a novel pipeline
    to produce high-quality, photorealistic 2D videos with facial attribute editing.
    Leveraging this synthetic attribute dataset, we present a personalized avatar
    creation method based on the 3D Gaussian Splatting, learning a continuous and
    disentangled latent space for intuitive facial attribute manipulation. To enforce
    smooth transitions in this latent space, we introduce a latent space regularization
    technique by using interpolated 2D faces as supervision. Compared to previous
    approaches, we demonstrate that PERSE generates high-quality avatars with interpolated
    attributes while preserving the identity of the reference person.
  project_page: https://hyunsoocha.github.io/perse/
  paper: https://arxiv.org/pdf/2412.21206v1.pdf
  code: null
  video: https://youtu.be/zX881Zx03o4
  tags:
  - Avatar
  - GAN
  - Project
  - Video
  thumbnail: assets/thumbnails/cha2024perse.jpg
  publication_date: '2024-12-30T18:59:58+00:00'
- id: yang20244d
  title: '4D Gaussian Splatting: Modeling Dynamic Scenes with Native 4D Primitives'
  authors: Zeyu Yang, Zijie Pan, Xiatian Zhu, Li Zhang, Yu-Gang Jiang, Philip H. S.
    Torr
  year: '2024'
  abstract: Dynamic 3D scene representation and novel view synthesis from captured
    videos are crucial for enabling immersive experiences required by AR/VR and metaverse
    applications. However, this task is challenging due to the complexity of unconstrained
    real-world scenes and their temporal dynamics. In this paper, we frame dynamic
    scenes as a spatio-temporal 4D volume learning problem, offering a native explicit
    reformulation with minimal assumptions about motion, which serves as a versatile
    dynamic scene learning framework. Specifically, we represent a target dynamic
    scene using a collection of 4D Gaussian primitives with explicit geometry and
    appearance features, dubbed as 4D Gaussian splatting (4DGS). This approach can
    capture relevant information in space and time by fitting the underlying spatio-temporal
    volume. Modeling the spacetime as a whole with 4D Gaussians parameterized by anisotropic
    ellipses that can rotate arbitrarily in space and time, our model can naturally
    learn view-dependent and time-evolved appearance with 4D spherindrical harmonics.
    Notably, our 4DGS model is the first solution that supports real-time rendering
    of high-resolution, photorealistic novel views for complex dynamic scenes. To
    enhance efficiency, we derive several compact variants that effectively reduce
    memory footprint and mitigate the risk of overfitting. Extensive experiments validate
    the superiority of 4DGS in terms of visual quality and efficiency across a range
    of dynamic scene-related tasks (e.g., novel view synthesis, 4D generation, scene
    understanding) and scenarios (e.g., single object, indoor scenes, driving environments,
    synthetic and real data).
  project_page: null
  paper: https://arxiv.org/pdf/2412.20720v1.pdf
  code: null
  video: null
  tags:
  - Compression
  - Dynamic
  - Large-Scale
  thumbnail: assets/thumbnails/yang20244d.jpg
  publication_date: '2024-12-30T05:30:26+00:00'
- id: liu2024maskgaussian
  title: 'MaskGaussian: Adaptive 3D Gaussian Representation from Probabilistic Masks'
  authors: Yifei Liu, Zhihang Zhong, Yifan Zhan, Sheng Xu, Xiao Sun
  year: '2024'
  abstract: 'While 3D Gaussian Splatting (3DGS) has demonstrated remarkable performance
    in novel view synthesis and real-time rendering, the high memory consumption due
    to the use of millions of Gaussians limits its practicality. To mitigate this
    issue, improvements have been made by pruning unnecessary Gaussians, either through
    a hand-crafted criterion or by using learned masks. However, these methods deterministically
    remove Gaussians based on a snapshot of the pruning moment, leading to sub-optimized
    reconstruction performance from a long-term perspective. To address this issue,
    we introduce MaskGaussian, which models Gaussians as probabilistic entities rather
    than permanently removing them, and utilize them according to their probability
    of existence. To achieve this, we propose a masked-rasterization technique that
    enables unused yet probabilistically existing Gaussians to receive gradients,
    allowing for dynamic assessment of their contribution to the evolving scene and
    adjustment of their probability of existence. Hence, the importance of Gaussians
    iteratively changes and the pruned Gaussians are selected diversely. Extensive
    experiments demonstrate the superiority of the proposed method in achieving better
    rendering quality with fewer Gaussians than previous pruning methods, pruning
    over 60% of Gaussians on average with only a 0.02 PSNR decline. Our code can be
    found at: https://github.com/kaikai23/MaskGaussian

    '
  project_page: null
  paper: https://arxiv.org/pdf/2412.20522.pdf
  code: https://github.com/kaikai23/MaskGaussian
  video: null
  tags:
  - Code
  - Compression
  - Densification
  thumbnail: assets/thumbnails/liu2024maskgaussian.jpg
  publication_date: '2024-12-29T17:12:16+00:00'
  date_source: arxiv
- id: zeller2024gsplatloc
  title: 'GSplatLoc: Ultra-Precise Camera Localization via 3D Gaussian Splatting'
  authors: Atticus J. Zeller
  year: '2024'
  abstract: 'We present GSplatLoc, a camera localization method that leverages the
    differentiable rendering capabilities of 3D Gaussian splatting for ultra-precise
    pose estimation. By formulating pose estimation as a gradient-based optimization
    problem that minimizes discrepancies between rendered depth maps from a pre-existing
    3D Gaussian scene and observed depth images, GSplatLoc achieves translational
    errors within 0.01 cm and near-zero rotational errors on the Replica dataset -
    significantly outperforming existing methods. Evaluations on the Replica and TUM
    RGB-D datasets demonstrate the method''s robustness in challenging indoor environments
    with complex camera motions. GSplatLoc sets a new benchmark for localization in
    dense mapping, with important implications for applications requiring accurate
    real-time localization, such as robotics and augmented reality.

    '
  project_page: null
  paper: https://arxiv.org/pdf/2412.20056.pdf
  code: https://github.com/AtticusZeller/GsplatLoc
  video: null
  tags:
  - Code
  - In the Wild
  - Point Cloud
  - Poses
  - Robotics
  - SLAM
  thumbnail: assets/thumbnails/zeller2024gsplatloc.jpg
  publication_date: '2024-12-28T07:14:14+00:00'
  date_source: arxiv
- id: xu2024das3r
  title: 'DAS3R: Dynamics-Aware Gaussian Splatting for Static Scene Reconstruction'
  authors: Kai Xu, Tze Ho Elden Tse, Jizong Peng, Angela Yao
  year: '2024'
  abstract: 'We propose a novel framework for scene decomposition and static background
    reconstruction from everyday videos. By integrating the trained motion masks and
    modeling the static scene as Gaussian splats with dynamics-aware optimization,
    our method achieves more accurate background reconstruction results than previous
    works. Our proposed method is termed DAS3R, an abbreviation for Dynamics-Aware
    Gaussian Splatting for Static Scene Reconstruction. Compared to existing methods,
    DAS3R is more robust in complex motion scenarios, capable of handling videos where
    dynamic objects occupy a significant portion of the scene, and does not require
    camera pose inputs or point cloud data from SLAM-based methods. We compared DAS3R
    against recent distractor-free approaches on the DAVIS and Sintel datasets; DAS3R
    demonstrates enhanced performance and robustness with a margin of more than 2
    dB in PSNR. The project''s webpage can be accessed via https://kai422.github.io/DAS3R/

    '
  project_page: https://kai422.github.io/DAS3R/
  paper: https://arxiv.org/pdf/2412.19584.pdf
  code: https://github.com/kai422/das3r
  video: https://kai422.github.io/DAS3R/assets/davis.gif
  tags:
  - Code
  - Project
  - Video
  thumbnail: assets/thumbnails/xu2024das3r.jpg
  publication_date: '2024-12-27T10:59:46+00:00'
  date_source: arxiv
- id: cai2024dust
  title: 'Dust to Tower: Coarse-to-Fine Photo-Realistic Scene Reconstruction from
    Sparse Uncalibrated Images'
  authors: Xudong Cai, Yongcai Wang, Zhaoxin Fan, Deng Haoran, Shuo Wang, Wanting
    Li, Deying Li, Lun Luo, Minhang Wang, Jintao Xu
  year: '2024'
  abstract: Photo-realistic scene reconstruction from sparse-view, uncalibrated images
    is highly required in practice. Although some successes have been made, existing
    methods are either Sparse-View but require accurate camera parameters (i.e., intrinsic
    and extrinsic), or SfM-free but need densely captured images. To combine the advantages
    of both methods while addressing their respective weaknesses, we propose Dust
    to Tower (D2T), an accurate and efficient coarse-to-fine framework to optimize
    3DGS and image poses simultaneously from sparse and uncalibrated images. Our key
    idea is to first construct a coarse model efficiently and subsequently refine
    it using warped and inpainted images at novel viewpoints. To do this, we first
    introduce a Coarse Construction Module (CCM) which exploits a fast Multi-View
    Stereo model to initialize a 3D Gaussian Splatting (3DGS) and recover initial
    camera poses. To refine the 3D model at novel viewpoints, we propose a Confidence
    Aware Depth Alignment (CADA) module to refine the coarse depth maps by aligning
    their confident parts with estimated depths by a Mono-depth model. Then, a Warped
    Image-Guided Inpainting (WIGI) module is proposed to warp the training images
    to novel viewpoints by the refined depth maps, and inpainting is applied to fill
    the "holes" in the warped images caused by view-direction changes, providing
    high-quality supervision to further optimize the 3D model and the camera poses.
    Extensive experiments and ablation studies demonstrate the validity of D2T and
    its design choices, achieving state-of-the-art performance in both tasks of novel
    view synthesis and pose estimation while keeping high efficiency. Codes will be
    publicly available.
  project_page: null
  paper: https://arxiv.org/pdf/2412.19518.pdf
  code: null
  video: null
  tags:
  - Inpainting
  - Poses
  - Sparse
  thumbnail: assets/thumbnails/cai2024dust.jpg
  publication_date: '2024-12-27T08:19:34+00:00'
- id: yao2024reflective
  title: Reflective Gaussian Splatting
  authors: Yuxuan Yao, Zixuan Zeng, Chun Gu, Xiatian Zhu, Li Zhang
  year: '2024'
  abstract: 'Novel view synthesis has experienced significant advancements owing to
    increasingly capable NeRF- and 3DGS-based methods. However, reflective object
    reconstruction remains challenging, lacking a proper solution to achieve real-time,
    high-quality rendering while accommodating inter-reflection. To fill this gap,
    we introduce a Reflective Gaussian splatting (Ref-Gaussian) framework
    characterized with two components: (I) Physically based deferred rendering
    that empowers the rendering equation with pixel-level material properties via
    formulating split-sum approximation; (II) Gaussian-grounded inter-reflection
    that realizes the desired inter-reflection function within a Gaussian splatting
    paradigm for the first time. To enhance geometry modeling, we further introduce
    material-aware normal propagation and an initial per-Gaussian shading stage, along
    with 2D Gaussian primitives. Extensive experiments on standard datasets demonstrate
    that Ref-Gaussian surpasses existing approaches in terms of quantitative metrics,
    visual quality, and compute efficiency. Further, we show that our method serves
    as a unified solution for both reflective and non-reflective scenes, going beyond
    the previous alternatives focusing on only reflective scenes. Also, we illustrate
    that Ref-Gaussian supports more applications such as relighting and editing.

    '
  project_page: https://fudan-zvg.github.io/ref-gaussian/
  paper: https://arxiv.org/pdf/2412.19282.pdf
  code: null
  video: null
  tags:
  - Meshing
  - Project
  - Ray Tracing
  - Relight
  thumbnail: assets/thumbnails/yao2024reflective.jpg
  publication_date: '2024-12-26T16:58:35+00:00'
- id: qian2024weathergs
  title: 'WeatherGS: 3D Scene Reconstruction in Adverse Weather Conditions via Gaussian
    Splatting'
  authors: Chenghao Qian, Yuhu Guo, Wenjing Li, Gustav Markkula
  year: '2024'
  abstract: 3D Gaussian Splatting (3DGS) has gained significant attention for 3D scene
    reconstruction, but still suffers from complex outdoor environments, especially
    under adverse weather. This is because 3DGS treats the artifacts caused by adverse
    weather as part of the scene and will directly reconstruct them, largely reducing
    the clarity of the reconstructed scene. To address this challenge, we propose
    WeatherGS, a 3DGS-based framework for reconstructing clear scenes from multi-view
    images under different weather conditions. Specifically, we explicitly categorize
    the multi-weather artifacts into the dense particles and lens occlusions that
    have very different characteristics, where the former are caused by snowflakes and
    raindrops in the air, and the latter are caused by precipitation on the camera
    lens. In light of this, we propose a dense-to-sparse preprocess strategy, which
    sequentially removes the dense particles by an Atmospheric Effect Filter (AEF)
    and then extracts the relatively sparse occlusion masks with a Lens Effect Detector
    (LED). Finally, we train a set of 3D Gaussians by the processed images and generated
    masks for excluding occluded areas, and accurately recover the underlying clear
    scene by Gaussian splatting. We conduct a diverse and challenging benchmark to
    facilitate the evaluation of 3D reconstruction under complex weather scenarios.
    Extensive experiments on this benchmark demonstrate that our WeatherGS consistently
    produces high-quality, clean scenes across various weather scenarios, outperforming
    existing state-of-the-art methods.
  project_page: null
  paper: https://arxiv.org/pdf/2412.18862.pdf
  code: https://github.com/Jumponthemoon/WeatherGS
  video: null
  tags:
  - Code
  - In the Wild
  thumbnail: assets/thumbnails/qian2024weathergs.jpg
  publication_date: '2024-12-25T10:16:57+00:00'
- id: lyu2024facelift
  title: 'FaceLift: Single Image to 3D Head with View Generation and GS-LRM'
  authors: Weijie Lyu, Yi Zhou, Ming-Hsuan Yang, Zhixin Shu
  year: '2024'
  abstract: We present FaceLift, a feed-forward approach for rapid, high-quality,
    360-degree head reconstruction from a single image. Our pipeline begins by employing
    a multi-view latent diffusion model that generates consistent side and back views
    of the head from a single facial input. These generated views then serve as input
    to a GS-LRM reconstructor, which produces a comprehensive 3D representation using
    Gaussian splats. To train our system, we develop a dataset of multi-view renderings
    using synthetic 3D human head assets. The diffusion-based multi-view generator
    is trained exclusively on synthetic head images, while the GS-LRM reconstructor
    undergoes initial training on Objaverse followed by fine-tuning on synthetic head
    data. FaceLift excels at preserving identity and maintaining view consistency
    across views. Despite being trained solely on synthetic data, FaceLift demonstrates
    remarkable generalization to real-world images. Through extensive qualitative
    and quantitative evaluations, we show that FaceLift outperforms state-of-the-art
    methods in 3D head reconstruction, highlighting its practical applicability and
    robust performance on real-world images. In addition to single image reconstruction,
    FaceLift supports video inputs for 4D novel view synthesis and seamlessly integrates
    with 2D reanimation techniques to enable 3D facial animation.
  project_page: https://www.wlyu.me/FaceLift/
  paper: https://arxiv.org/pdf/2412.17812.pdf
  code: null
  video: https://huggingface.co/wlyu/FaceLift/resolve/main/videos/website_video.mp4
  tags:
  - Avatar
  - Feed-Forward
  - Project
  - Video
  thumbnail: assets/thumbnails/lyu2024facelift.jpg
  publication_date: '2024-12-23T18:59:49+00:00'
- id: shao2024gausim
  title: 'GauSim: Registering Elastic Objects into Digital World by Gaussian Simulator'
  authors: Yidi Shao, Mu Huang, Chen Change Loy, Bo Dai
  year: '2024'
  abstract: In this work, we introduce GauSim, a novel neural network-based simulator
    designed to capture the dynamic behaviors of real-world elastic objects represented
    through Gaussian kernels. Unlike traditional methods that treat kernels as particles
    within particle-based simulations, we leverage continuum mechanics, modeling each
    kernel as a continuous piece of matter to account for realistic deformations without
    idealized assumptions. To improve computational efficiency and fidelity, we employ
    a hierarchical structure that organizes kernels into Center of Mass Systems (CMS)
    with explicit formulations, enabling a coarse-to-fine simulation approach. This
    structure significantly reduces computational overhead while preserving detailed
    dynamics. In addition, GauSim incorporates explicit physics constraints, such
    as mass and momentum conservation, ensuring interpretable results and robust,
    physically plausible simulations. To validate our approach, we present a new dataset,
    READY, containing multi-view videos of real-world elastic deformations. Experimental
    results demonstrate that GauSim achieves superior performance compared to existing
    physics-driven baselines, offering a practical and accurate solution for simulating
    complex dynamic behaviors. Code and model will be released.
  project_page: https://www.mmlab-ntu.com/project/gausim/index.html
  paper: https://arxiv.org/pdf/2412.17804.pdf
  code: null
  video: null
  tags:
  - Dynamic
  - Physics
  - Project
  thumbnail: assets/thumbnails/shao2024gausim.jpg
  publication_date: '2024-12-23T18:58:17+00:00'
- id: jin2024activegs
  title: 'ActiveGS: Active Scene Reconstruction using Gaussian Splatting'
  authors: Liren Jin, Xingguang Zhong, Yue Pan, Jens Behley, Cyrill Stachniss, Marija
    Popović
  year: '2024'
  abstract: 'Robotics applications often rely on scene reconstructions to enable downstream
    tasks. In this work, we tackle the challenge of actively building an accurate
    map of an unknown scene using an on-board RGB-D camera. We propose a hybrid map
    representation that combines a Gaussian splatting map with a coarse voxel map,
    leveraging the strengths of both representations: the high-fidelity scene reconstruction
    capabilities of Gaussian splatting and the spatial modelling strengths of the
    voxel map. The core of our framework is an effective confidence modelling technique
    for the Gaussian splatting map to identify under-reconstructed areas, while utilising
    spatial information from the voxel map to target unexplored areas and assist in
    collision-free path planning. By actively collecting scene information in under-reconstructed
    and unexplored areas for map updates, our approach achieves superior Gaussian
    splatting reconstruction results compared to state-of-the-art approaches. Additionally,
    we demonstrate the applicability of our active scene reconstruction framework
    in the real world using an unmanned aerial vehicle.

    '
  project_page: null
  paper: https://arxiv.org/pdf/2412.17769.pdf
  code: null
  video: null
  tags:
  - Meshing
  - Robotics
  - SLAM
  thumbnail: assets/thumbnails/jin2024activegs.jpg
  publication_date: '2024-12-23T18:29:03+00:00'
  date_source: arxiv
- id: gao2024cosurfgscollaborative
  title: 'CoSurfGS: Collaborative 3D Surface Gaussian Splatting with Distributed Learning
    for Large Scene Reconstruction'
  authors: Yuanyuan Gao, Yalun Dai, Hao Li, Weicai Ye, Junyi Chen, Danpeng Chen, Dingwen
    Zhang, Tong He, Guofeng Zhang, Junwei Han
  year: '2024'
  abstract: 3D Gaussian Splatting (3DGS) has demonstrated impressive performance in
    scene reconstruction. However, most existing GS-based surface reconstruction methods
    focus on 3D objects or limited scenes. Directly applying these methods to large-scale
    scene reconstruction will pose challenges such as high memory costs, excessive
    time consumption, and lack of geometric detail, which makes it difficult to implement
    in practical applications. To address these issues, we propose a multi-agent collaborative
    fast 3DGS surface reconstruction framework based on distributed learning for large-scale
    surface reconstruction. Specifically, we develop local model compression (LMC)
    and model aggregation schemes (MAS) to achieve high-quality surface representation
    of large scenes while reducing GPU memory consumption. Extensive experiments on
    Urban3d, MegaNeRF, and BlendedMVS demonstrate that our proposed method can achieve
    fast and scalable high-fidelity surface reconstruction and photorealistic rendering.
  project_page: https://gyy456.github.io/CoSurfGS/
  paper: https://arxiv.org/pdf/2412.17612.pdf
  code: null
  video: null
  tags:
  - Distributed
  - Large-Scale
  - Meshing
  - Project
  thumbnail: assets/thumbnails/gao2024cosurfgscollaborative.jpg
  publication_date: '2024-12-23T14:31:15+00:00'
- id: gui2024balanced
  title: 'Balanced 3DGS: Gaussian-wise Parallelism Rendering with Fine-Grained Tiling'
  authors: Hao Gui, Lin Hu, Rui Chen, Mingxiao Huang, Yuxin Yin, Jin Yang, Yong Wu
  year: '2024'
  abstract: '3D Gaussian Splatting (3DGS) is increasingly attracting attention in
    both academia and industry owing to its superior visual quality and rendering
    speed. However, training a 3DGS model remains a time-intensive task, especially
    in load imbalance scenarios where workload diversity among pixels and Gaussian
    spheres causes poor renderCUDA kernel performance. We introduce Balanced 3DGS,
    a Gaussian-wise parallelism rendering with fine-grained tiling approach in 3DGS
    training process, perfectly solving load-imbalance issues. First, we innovatively
    introduce the inter-block dynamic workload distribution technique to map workloads
    to Streaming Multiprocessor (SM) resources within a single GPU dynamically, which
    constitutes the foundation of load balancing. Second, we are the first to propose
    the Gaussian-wise parallel rendering technique to significantly reduce workload
    divergence inside a warp, which serves as a critical component in addressing load
    imbalance. Based on the above two methods, we further creatively put forward the
    fine-grained combined load balancing technique to uniformly distribute workload
    across all SMs, which boosts the forward renderCUDA kernel performance by up to
    7.52x. Besides, we present a self-adaptive render kernel selection strategy during
    the 3DGS training process based on different load-balance situations, which effectively
    improves training efficiency.

    '
  project_page: null
  paper: https://arxiv.org/pdf/2412.17378.pdf
  code: null
  video: null
  tags:
  - Acceleration
  thumbnail: assets/thumbnails/gui2024balanced.jpg
  publication_date: '2024-12-23T08:26:30+00:00'
- id: jambon2024interactive
  title: Interactive Scene Authoring with Specialized Generative Primitives
  authors: Clément Jambon, Changwoon Choi, Dongsu Zhang, Olga Sorkine-Hornung, Young
    Min Kim
  year: '2024'
  abstract: 'Generating high-quality 3D digital assets often requires expert knowledge
    of complex design tools. We introduce Specialized Generative Primitives, a generative
    framework that allows non-expert users to author high-quality 3D scenes in a seamless,
    lightweight, and controllable manner. Each primitive is an efficient generative
    model that captures the distribution of a single exemplar from the real world.
    With our framework, users capture a video of an environment, which we turn into
    a high-quality and explicit appearance model thanks to 3D Gaussian Splatting.
    Users then select regions of interest guided by semantically-aware features. To
    create a generative primitive, we adapt Generative Cellular Automata to single-exemplar
    training and controllable generation. We decouple the generative task from the
    appearance model by operating on sparse voxels and we recover a high-quality output
    with a subsequent sparse patch consistency step. Each primitive can be trained
    within 10 minutes and used to author new scenes interactively in a fully compositional
    manner. We showcase interactive sessions where various primitives are extracted
    from real-world scenes and controlled to create 3D assets and scenes in a few
    minutes. We also demonstrate additional capabilities of our primitives: handling
    various 3D representations to control generation, transferring appearances, and
    editing geometries.

    '
  project_page: null
  paper: https://arxiv.org/pdf/2412.16253.pdf
  code: null
  video: null
  tags:
  - Editing
  - World Generation
  thumbnail: assets/thumbnails/jambon2024interactive.jpg
  publication_date: '2024-12-20T04:39:50+00:00'
- id: shen2024solidgs
  title: 'SolidGS: Consolidating Gaussian Surfel Splatting for Sparse-View Surface
    Reconstruction'
  authors: Zhuowen Shen, Yuan Liu, Zhang Chen, Zhong Li, Jiepeng Wang, Yongqing Liang,
    Zhengming Yu, Jingdong Zhang, Yi Xu, Scott Schaefer, Xin Li, Wenping Wang
  year: '2024'
  abstract: 'Gaussian splatting has achieved impressive improvements for both novel-view
    synthesis and surface reconstruction from multi-view images. However, current
    methods still struggle to reconstruct high-quality surfaces from only sparse view
    input images using Gaussian splatting. In this paper, we propose a novel method
    called SolidGS to address this problem. We observed that the reconstructed geometry
    can be severely inconsistent across multi-views, due to the property of Gaussian
    function in geometry rendering. This motivates us to consolidate all Gaussians
    by adopting a more solid kernel function, which effectively improves the surface
    reconstruction quality. With the additional help of geometrical regularization
    and monocular normal estimation, our method achieves superior performance on the
    sparse view surface reconstruction than all the Gaussian splatting methods and
    neural field methods on the widely used DTU, Tanks-and-Temples, and LLFF datasets.

    '
  project_page: https://mickshen7558.github.io/projects/SolidGS/
  paper: https://arxiv.org/pdf/2412.15400.pdf
  code: null
  video: null
  tags:
  - Meshing
  - Project
  - Sparse
  thumbnail: assets/thumbnails/shen2024solidgs.jpg
  publication_date: '2024-12-19T21:04:43+00:00'
  date_source: arxiv
- id: xie2024envgs
  title: 'EnvGS: Modeling View-Dependent Appearance with Environment Gaussian'
  authors: Tao Xie, Xi Chen, Zhen Xu, Yiman Xie, Yudong Jin, Yujun Shen, Sida Peng,
    Hujun Bao, Xiaowei Zhou
  year: '2024'
  abstract: 'Reconstructing complex reflections in real-world scenes from 2D images
    is essential for achieving photorealistic novel view synthesis. Existing methods
    that utilize environment maps to model reflections from distant lighting often
    struggle with high-frequency reflection details and fail to account for near-field
    reflections. In this work, we introduce EnvGS, a novel approach that employs a
    set of Gaussian primitives as an explicit 3D representation for capturing reflections
    of environments. These environment Gaussian primitives are incorporated with base
    Gaussian primitives to model the appearance of the whole scene. To efficiently
    render these environment Gaussian primitives, we developed a ray-tracing-based
    renderer that leverages the GPU''s RT core for fast rendering. This allows us
    to jointly optimize our model for high-quality reconstruction while maintaining
    real-time rendering speeds. Results from multiple real-world and synthetic datasets
    demonstrate that our method produces significantly more detailed reflections,
    achieving the best rendering quality in real-time novel view synthesis.

    '
  project_page: https://zju3dv.github.io/envgs/
  paper: https://arxiv.org/pdf/2412.15215.pdf
  code: null
  video: https://raw.githubusercontent.com/xbillowy/assets/refs/heads/main/envgs.github.io.assets/teaser.mp4
  tags:
  - Project
  - Ray Tracing
  - Rendering
  - Video
  thumbnail: assets/thumbnails/xie2024envgs.jpg
  publication_date: '2024-12-19T18:59:57+00:00'
  date_source: arxiv
- id: saito2024squeezeme
  title: 'SqueezeMe: Efficient Gaussian Avatars for VR'
  authors: Shunsuke Saito, Stanislav Pidhorskyi, Igor Santesteban, Forrest Iandola,
    Divam Gupta, Anuj Pahuja, Nemanja Bartolovic, Frank Yu, Emanuel Garbin, Tomas
    Simon
  year: '2024'
  abstract: "Gaussian Splatting has enabled real-time 3D human avatars with unprecedented\
    \ levels of visual quality. While previous methods require a desktop GPU for real-time\
    \ inference of a single avatar, we aim to squeeze multiple Gaussian avatars onto\
    \ a portable virtual reality headset with real-time drivable inference. We begin\
    \ by training a previous work, Animatable Gaussians, on a high quality dataset\
    \ captured with 512 cameras. The Gaussians are animated by controlling a base set\
    \ of Gaussians with linear blend skinning (LBS) motion and then further adjusting\
    \ the Gaussians with a neural network decoder to correct their appearance. When\
    \ deploying the model on a Meta Quest 3 VR headset, we find two major computational\
    \ bottlenecks: the decoder and the rendering. To accelerate the decoder, we train\
    \ the Gaussians in UV-space instead of pixel-space, and we distill the decoder\
    \ to a single neural network layer. Further, we discover that neighborhoods of\
    \ Gaussians can share a single corrective from the decoder, which provides an\
    \ additional speedup. To accelerate the rendering, we develop a custom pipeline\
    \ in Vulkan that runs on the mobile GPU. Putting it all together, we run 3 Gaussian\
    \ avatars concurrently at 72 FPS on a VR headset. \n"
  project_page: https://forresti.github.io/squeezeme
  paper: https://arxiv.org/pdf/2412.15171.pdf
  code: null
  video: null
  tags:
  - Avatar
  - Dynamic
  - Project
  thumbnail: assets/thumbnails/saito2024squeezeme.jpg
  publication_date: '2024-12-19T18:46:55+00:00'
  date_source: arxiv
- id: peng2024gags
  title: 'GAGS: Granularity-Aware Feature Distillation for Language Gaussian Splatting'
  authors: Yuning Peng, Haiping Wang, Yuan Liu, Chenglu Wen, Zhen Dong, Bisheng Yang
  year: '2024'
  abstract: '3D open-vocabulary scene understanding, which accurately perceives complex
    semantic properties of objects in space, has gained significant attention in recent
    years. In this paper, we propose GAGS, a framework that distills 2D CLIP features
    into 3D Gaussian splatting, enabling open-vocabulary queries for renderings on
    arbitrary viewpoints. The main challenge of distilling 2D features for 3D fields
    lies in the multiview inconsistency of extracted 2D features, which provides unstable
    supervision for the 3D feature field. GAGS addresses this challenge with two novel
    strategies. First, GAGS associates the prompt point density of SAM with the camera
    distances, which significantly improves the multiview consistency of segmentation
    results. Second, GAGS further decodes a granularity factor to guide the distillation
    process and this granularity factor can be learned in an unsupervised manner to
    only select the multiview consistent 2D features in the distillation process.
    Experimental results on two datasets demonstrate significant performance and stability
    improvements of GAGS in visual grounding and semantic segmentation, with an inference
    speed 2× faster than baseline methods. The code and additional results
    are available at https://pz0826.github.io/GAGS-Webpage/ .

    '
  project_page: https://pz0826.github.io/GAGS-Webpage/
  paper: https://arxiv.org/pdf/2412.13654.pdf
  code: https://github.com/WHU-USI3DV/GAGS
  video: null
  tags:
  - Code
  - Language Embedding
  - Project
  - Segmentation
  thumbnail: assets/thumbnails/peng2024gags.jpg
  publication_date: '2024-12-18T09:33:20+00:00'
  date_source: arxiv
- id: lu2024turbogs
  title: 'Turbo-GS: Accelerating 3D Gaussian Fitting for High-Quality Radiance Fields'
  authors: Tao Lu, Ankit Dhiman, R Srinath, Emre Arslan, Angela Xing, Yuanbo Xiangli,
    R Venkatesh Babu, Srinath Sridhar
  year: '2024'
  abstract: 'Novel-view synthesis is an important problem in computer vision with
    applications in 3D reconstruction, mixed reality, and robotics. Recent methods
    like 3D Gaussian Splatting (3DGS) have become the preferred method for this task,
    providing high-quality novel views in real time. However, the training time of
    a 3DGS model is slow, often taking 30 minutes for a scene with 200 views. In contrast,
    our goal is to reduce the optimization time by training for fewer steps while
    maintaining high rendering quality. Specifically, we combine the guidance from
    both the position error and the appearance error to achieve a more effective densification.
    To balance the rate between adding new Gaussians and fitting old Gaussians, we
    develop a convergence-aware budget control mechanism. Moreover, to make the densification
    process more reliable, we selectively add new Gaussians from mostly visited regions.
    With these designs, we reduce the Gaussian optimization steps to one-third of
    the previous approach while achieving a comparable or even better novel view rendering
    quality. To further facilitate the rapid fitting of 4K resolution images, we introduce
    a dilation-based rendering technique. Our method, Turbo-GS, speeds up optimization
    for typical scenes and scales well to high-resolution (4K) scenarios on standard
    datasets. Through extensive experiments, we show that our method is significantly
    faster in optimization than other methods while retaining quality. Project page:
    https://ivl.cs.brown.edu/research/turbo-gs.

    '
  project_page: https://ivl.cs.brown.edu/research/turbo-gs
  paper: https://arxiv.org/pdf/2412.13547v1.pdf
  code: null
  video: null
  tags:
  - Acceleration
  - Densification
  - Project
  thumbnail: assets/thumbnails/lu2024turbogs.jpg
  publication_date: '2024-12-18T06:46:40+00:00'
  date_source: arxiv
- id: jiang2024gausstr
  title: 'GaussTR: Foundation Model-Aligned Gaussian Transformer for Self-Supervised
    3D Spatial Understanding'
  authors: Haoyi Jiang, Liu Liu, Tianheng Cheng, Xinjie Wang, Tianwei Lin, Zhizhong
    Su, Wenyu Liu, Xinggang Wang
  year: '2024'
  abstract: '3D Semantic Occupancy Prediction is fundamental for spatial understanding
    as it provides a comprehensive semantic cognition of surrounding environments.
    However, prevalent approaches primarily rely on extensive labeled data and computationally
    intensive voxel-based modeling, restricting the scalability and generalizability
    of 3D representation learning. In this paper, we introduce GaussTR, a novel Gaussian
    Transformer that leverages alignment with foundation models to advance self-supervised
    3D spatial understanding. GaussTR adopts a Transformer architecture to predict
    sparse sets of 3D Gaussians that represent scenes in a feed-forward manner. Through
    aligning rendered Gaussian features with diverse knowledge from pre-trained foundation
    models, GaussTR facilitates the learning of versatile 3D representations and enables
    open-vocabulary occupancy prediction without explicit annotations. Empirical evaluations
    on the Occ3D-nuScenes dataset showcase GaussTR''s state-of-the-art zero-shot performance,
    achieving 11.70 mIoU while reducing training duration by approximately 50%. These
    experimental results highlight the significant potential of GaussTR for scalable
    and holistic 3D spatial understanding, with promising implications for autonomous
    driving and embodied agents. Code is available at https://github.com/hustvl/GaussTR.

    '
  project_page: https://hustvl.github.io/GaussTR/
  paper: https://arxiv.org/pdf/2412.13193.pdf
  code: https://github.com/hustvl/GaussTR
  video: null
  tags:
  - Code
  - Project
  thumbnail: assets/thumbnails/jiang2024gausstr.jpg
  publication_date: '2024-12-17T18:59:46+00:00'
  date_source: arxiv
- id: sun2024realtime
  title: Real-time Free-view Human Rendering from Sparse-view RGB Videos using Double
    Unprojected Textures
  authors: Guoxing Sun, Rishabh Dabral, Heming Zhu, Pascal Fua, Christian Theobalt,
    Marc Habermann
  year: '2024'
  abstract: 'Real-time free-view human rendering from sparse-view RGB inputs is a
    challenging task due to the sensor scarcity and the tight time budget. To ensure
    efficiency, recent methods leverage 2D CNNs operating in texture space to learn
    rendering primitives. However, they either jointly learn geometry and appearance,
    or completely ignore sparse image information for geometry estimation, significantly
    harming visual quality and robustness to unseen body poses. To address these issues,
    we present Double Unprojected Textures, which at the core disentangles coarse
    geometric deformation estimation from appearance synthesis, enabling robust and
    photorealistic 4K rendering in real-time. Specifically, we first introduce a novel
    image-conditioned template deformation network, which estimates the coarse deformation
    of the human template from a first unprojected texture. This updated geometry
    is then used to apply a second and more accurate texture unprojection. The resulting
    texture map has fewer artifacts and better alignment with input views, which benefits
    our learning of finer-level geometry and appearance represented by Gaussian splats.
    We validate the effectiveness and efficiency of the proposed method in quantitative
    and qualitative experiments, which significantly surpasses other state-of-the-art
    methods.

    '
  project_page: https://vcai.mpi-inf.mpg.de/projects/DUT/
  paper: https://arxiv.org/pdf/2412.13183v1.pdf
  code: null
  video: https://vcai.mpi-inf.mpg.de/projects/DUT/videos/main_video.mp4
  tags:
  - Avatar
  - Project
  - Sparse
  - Texturing
  - Video
  thumbnail: assets/thumbnails/sun2024realtime.jpg
  publication_date: '2024-12-17T18:57:38+00:00'
  date_source: arxiv
- id: weiss2024gaussian
  title: 'Gaussian Billboards: Expressive 2D Gaussian Splatting with Textures'
  authors: Sebastian Weiss, Derek Bradley
  year: '2024'
  abstract: 'Gaussian Splatting has recently emerged as the go-to representation for
    reconstructing and rendering 3D scenes. The transition from 3D to 2D Gaussian
    primitives has further improved multi-view consistency and surface reconstruction
    accuracy. In this work we highlight the similarity between 2D Gaussian Splatting
    (2DGS) and billboards from traditional computer graphics. Both use flat semi-transparent
    2D geometry that is positioned, oriented and scaled in 3D space. However, 2DGS
    uses a solid color per splat and an opacity modulated by a Gaussian distribution,
    whereas billboards are more expressive, modulating the color with a uv-parameterized
    texture. We propose to unify these concepts by presenting Gaussian Billboards,
    a modification of 2DGS to add spatially-varying color achieved using per-splat
    texture interpolation. The result is a mixture of the two representations, which
    benefits from both the robust scene optimization power of 2DGS and the expressiveness
    of texture mapping. We show that our method can improve the sharpness and quality
    of the scene representation in a wide range of qualitative and quantitative evaluations
    compared to the original 2DGS implementation.

    '
  project_page: null
  paper: https://arxiv.org/pdf/2412.12734v1.pdf
  code: null
  video: null
  tags:
  - 2DGS
  - Texturing
  thumbnail: assets/thumbnails/weiss2024gaussian.jpg
  publication_date: '2024-12-17T09:57:04+00:00'
  date_source: arxiv
- id: wu20243dgut
  title: '3DGUT: Enabling Distorted Cameras and Secondary Rays in Gaussian Splatting'
  authors: Qi Wu, Janick Martinez Esturo, Ashkan Mirzaei, Nicolas Moenne-Loccoz, Zan
    Gojcic
  year: '2024'
  abstract: '3D Gaussian Splatting (3DGS) has shown great potential for efficient
    reconstruction and high-fidelity real-time rendering of complex scenes on consumer
    hardware. However, due to its rasterization-based formulation, 3DGS is constrained
    to ideal pinhole cameras and lacks support for secondary lighting effects. Recent
    methods address these limitations by tracing volumetric particles instead; however,
    this comes at the cost of significantly slower rendering speeds. In this work,
    we propose 3D Gaussian Unscented Transform (3DGUT), replacing the EWA splatting
    formulation in 3DGS with the Unscented Transform that approximates the particles
    through sigma points, which can be projected exactly under any nonlinear projection
    function. This modification enables trivial support of distorted cameras with
    time-dependent effects such as rolling shutter, while retaining the efficiency
    of rasterization. Additionally, we align our rendering formulation with that of
    tracing-based methods, enabling secondary ray tracing required to represent phenomena
    such as reflections and refraction within the same 3D representation.

    '
  project_page: https://research.nvidia.com/labs/toronto-ai/3DGUT/
  paper: https://arxiv.org/pdf/2412.12507.pdf
  code: null
  video: https://research.nvidia.com/labs/toronto-ai/3DGUT/res/3DGUT_ready_compressed.mp4
  tags:
  - Perspective-correct
  - Project
  - Video
  thumbnail: assets/thumbnails/wu20243dgut.jpg
  publication_date: '2024-12-17T03:21:25+00:00'
  date_source: arxiv
- id: murai2024mast3rslam
  title: 'MASt3R-SLAM: Real-Time Dense SLAM with 3D Reconstruction Priors'
  authors: Riku Murai, Eric Dexheimer, Andrew J. Davison
  year: '2024'
  abstract: 'We present a real-time monocular dense SLAM system designed bottom-up
    from MASt3R, a two-view 3D reconstruction and matching prior. Equipped with this
    strong prior, our system is robust on in-the-wild video sequences despite making
    no assumption on a fixed or parametric camera model beyond a unique camera centre.
    We introduce efficient methods for pointmap matching, camera tracking and local
    fusion, graph construction and loop closure, and second-order global optimisation.
    With known calibration, a simple modification to the system achieves state-of-the-art
    performance across various benchmarks. Altogether, we propose a plug-and-play
    monocular SLAM system capable of producing globally-consistent poses and dense
    geometry while operating at 15 FPS.

    '
  project_page: null
  paper: https://arxiv.org/pdf/2412.12392.pdf
  code: null
  video: https://www.youtube.com/watch?v=wozt71NBFTQ
  tags:
  - 3ster-based
  - SLAM
  - Video
  thumbnail: assets/thumbnails/murai2024mast3rslam.jpg
  publication_date: '2024-12-16T23:00:05+00:00'
  date_source: arxiv
- id: zhang2024pansplat
  title: 'PanSplat: 4K Panorama Synthesis with Feed-Forward Gaussian Splatting'
  authors: Cheng Zhang, Haofei Xu, Qianyi Wu, Camilo Cruz Gambardella, Dinh Phung,
    Jianfei Cai
  year: '2024'
  abstract: 'With the advent of portable 360° cameras, panorama has gained significant
    attention in applications like virtual reality (VR), virtual tours, robotics,
    and autonomous driving. As a result, wide-baseline panorama view synthesis has
    emerged as a vital task, where high resolution, fast inference, and memory efficiency
    are essential. Nevertheless, existing methods are typically constrained to lower
    resolutions (512 × 1024) due to demanding memory and computational requirements.
    In this paper, we present PanSplat, a generalizable, feed-forward approach that
    efficiently supports resolution up to 4K (2048 × 4096). Our approach features
    a tailored spherical 3D Gaussian pyramid with a Fibonacci lattice arrangement,
    enhancing image quality while reducing information redundancy. To accommodate
    the demands of high resolution, we propose a pipeline that integrates a hierarchical
    spherical cost volume and Gaussian heads with local operations, enabling two-step
    deferred backpropagation for memory-efficient training on a single A100 GPU. Experiments
    demonstrate that PanSplat achieves state-of-the-art results with superior efficiency
    and image quality across both synthetic and real-world datasets. Code will be
    available at https://github.com/chengzhag/PanSplat.

    '
  project_page: https://chengzhag.github.io/publication/pansplat/
  paper: https://arxiv.org/pdf/2412.12096v1.pdf
  code: https://github.com/chengzhag/PanSplat
  video: https://youtu.be/R3qIzL77ZSc
  tags:
  - 360 degree
  - Code
  - Feed-Forward
  - Project
  - Video
  - World Generation
  thumbnail: assets/thumbnails/zhang2024pansplat.jpg
  publication_date: '2024-12-16T18:59:45+00:00'
  date_source: arxiv
- id: taubner2024cap4d
  title: 'CAP4D: Creating Animatable 4D Portrait Avatars with Morphable Multi-View
    Diffusion Models'
  authors: Felix Taubner, Ruihang Zhang, Mathieu Tuli, David B. Lindell
  year: '2024'
  abstract: 'Reconstructing photorealistic and dynamic portrait avatars from images
    is essential to many applications including advertising, visual effects, and virtual
    reality. Depending on the application, avatar reconstruction involves different
    capture setups and constraints; for example, visual effects studios use camera
    arrays to capture hundreds of reference images, while content creators may seek
    to animate a single portrait image downloaded from the internet. As such, there
    is a large and heterogeneous ecosystem of methods for avatar reconstruction. Techniques
    based on multi-view stereo or neural rendering achieve the highest quality results,
    but require hundreds of reference images. Recent generative models produce convincing
    avatars from a single reference image, but their visual fidelity still lags behind multi-view
    techniques. Here, we present CAP4D: an approach that uses a morphable multi-view
    diffusion model to reconstruct photoreal 4D (dynamic 3D) portrait avatars from
    any number of reference images (i.e., one to 100) and animate and render them
    in real time. Our approach demonstrates state-of-the-art performance for single-,
    few-, and multi-image 4D portrait avatar reconstruction, and takes steps to bridge
    the gap in visual fidelity between single-image and multi-view reconstruction
    techniques.'
  project_page: https://felixtaubner.github.io/cap4d/
  paper: https://arxiv.org/pdf/2412.12093
  code: null
  video: null
  tags:
  - Avatar
  - Project
  thumbnail: assets/thumbnails/taubner2024cap4d.jpg
  publication_date: '2024-12-16T18:58:51+00:00'
- id: liang2024wonderland
  title: 'Wonderland: Navigating 3D Scenes from a Single Image'
  authors: Hanwen Liang, Junli Cao, Vidit Goel, Guocheng Qian, Sergei Korolev, Demetri
    Terzopoulos, Konstantinos N. Plataniotis, Sergey Tulyakov, Jian Ren
  year: '2024'
  abstract: 'This paper addresses a challenging question: How can we efficiently create
    high-quality, wide-scope 3D scenes from a single arbitrary image? Existing methods
    face several constraints, such as requiring multi-view data, time-consuming per-scene
    optimization, low visual quality in backgrounds, and distorted reconstructions
    in unseen areas. We propose a novel pipeline to overcome these limitations. Specifically,
    we introduce a large-scale reconstruction model that uses latents from a video
    diffusion model to predict 3D Gaussian Splattings for the scenes in a feed-forward
    manner. The video diffusion model is designed to create videos precisely following
    specified camera trajectories, allowing it to generate compressed video latents
    that contain multi-view information while maintaining 3D consistency. We train
    the 3D reconstruction model to operate on the video latent space with a progressive
    training strategy, enabling the efficient generation of high-quality, wide-scope,
    and generic 3D scenes. Extensive evaluations across various datasets demonstrate
    that our model significantly outperforms existing methods for single-view 3D scene
    generation, particularly with out-of-domain images. For the first time, we demonstrate
    that a 3D reconstruction model can be effectively built upon the latent space
    of a diffusion model to realize efficient 3D scene generation.

    '
  project_page: https://snap-research.github.io/wonderland/
  paper: https://arxiv.org/pdf/2412.12091v1.pdf
  code: null
  video: null
  tags:
  - Feed-Forward
  - Project
  - Sparse
  - World Generation
  thumbnail: assets/thumbnails/liang2024wonderland.jpg
  publication_date: '2024-12-16T18:58:17+00:00'
  date_source: arxiv
- id: huang2024deformable
  title: Deformable Radial Kernel Splatting
  authors: Yi-Hua Huang, Ming-Xian Lin, Yang-Tian Sun, Ziyi Yang, Xiaoyang Lyu, Yan-Pei
    Cao, Xiaojuan Qi
  year: '2024'
  abstract: 'Recently, Gaussian splatting has emerged as a robust technique for representing
    3D scenes, enabling real-time rasterization and high-fidelity rendering. However,
    Gaussians'' inherent radial symmetry and smoothness constraints limit their ability
    to represent complex shapes, often requiring thousands of primitives to approximate
    detailed geometry. We introduce Deformable Radial Kernel (DRK), which extends
    Gaussian splatting into a more general and flexible framework. Through learnable
    radial bases with adjustable angles and scales, DRK efficiently models diverse
    shape primitives while enabling precise control over edge sharpness and boundary
    curvature. Given DRK''s planar nature, we further develop accurate ray-primitive
    intersection computation for depth sorting and introduce efficient kernel culling
    strategies for improved rasterization efficiency. Extensive experiments demonstrate
    that DRK outperforms existing methods in both representation efficiency and rendering
    quality, achieving state-of-the-art performance while dramatically reducing primitive
    count.

    '
  project_page: https://yihua7.github.io/DRK-web/
  paper: https://arxiv.org/pdf/2412.11752v1.pdf
  code: null
  video: null
  tags:
  - Optimization
  - Project
  - Rendering
  thumbnail: assets/thumbnails/huang2024deformable.jpg
  publication_date: '2024-12-16T13:11:02+00:00'
  date_source: arxiv
- id: xu2024gaussianproperty
  title: 'GaussianProperty: Integrating Physical Properties to 3D Gaussians with LMMs'
  authors: Xinli Xu, Wenhang Ge, Dicong Qiu, ZhiFei Chen, Dongyu Yan, Zhuoyun Liu,
    Haoyu Zhao, Hanfeng Zhao, Shunsi Zhang, Junwei Liang, Ying-Cong Chen
  year: '2024'
  abstract: 'Estimating physical properties for visual data is a crucial task in computer
    vision, graphics, and robotics, underpinning applications such as augmented reality,
    physical simulation, and robotic grasping. However, this area remains under-explored
    due to the inherent ambiguities in physical property estimation. To address these
    challenges, we introduce GaussianProperty, a training-free framework that assigns
    physical properties of materials to 3D Gaussians. Specifically, we integrate the
    segmentation capability of SAM with the recognition capability of GPT-4V(ision)
    to formulate a global-local physical property reasoning module for 2D images.
    Then we project the physical properties from multi-view 2D images to 3D Gaussians
    using a voting strategy. We demonstrate that 3D Gaussians with physical property
    annotations enable applications in physics-based dynamic simulation and robotic
    grasping. For physics-based dynamic simulation, we leverage the Material Point
    Method (MPM) for realistic dynamic simulation. For robot grasping, we develop
    a grasping force prediction strategy that estimates a safe force range required
    for object grasping based on the estimated physical properties. Extensive experiments
    on material segmentation, physics-based dynamic simulation, and robotic grasping
    validate the effectiveness of our proposed method, highlighting its crucial role
    in understanding physical properties from visual data. Online demo, code, more
    cases and annotated datasets are available at https://Gaussian-Property.github.io.

    '
  project_page: https://gaussian-property.github.io/
  paper: https://arxiv.org/pdf/2412.11258.pdf
  code: https://github.com/xxlbigbrother/Gaussian-Property
  video: null
  tags:
  - Code
  - Language Embedding
  - Project
  - Robotics
  thumbnail: assets/thumbnails/xu2024gaussianproperty.jpg
  publication_date: '2024-12-15T17:44:10+00:00'
  date_source: arxiv
- id: liang2024supergseg
  title: 'SuperGSeg: Open-Vocabulary 3D Segmentation with Structured Super-Gaussians'
  authors: Siyun Liang, Sen Wang, Kunyi Li, Michael Niemeyer, Stefano Gasperini, Nassir
    Navab, Federico Tombari
  year: '2024'
  abstract: '3D Gaussian Splatting has recently gained traction for its efficient
    training and real-time rendering. While the vanilla Gaussian Splatting representation
    is mainly designed for view synthesis, more recent works investigated how to extend
    it with scene understanding and language features. However, existing methods lack
    a detailed comprehension of scenes, limiting their ability to segment and interpret
    complex structures. To this end, we introduce SuperGSeg, a novel approach that
    fosters cohesive, context-aware scene representation by disentangling segmentation
    and language field distillation. SuperGSeg first employs neural Gaussians to learn
    instance and hierarchical segmentation features from multi-view images with the
    aid of off-the-shelf 2D masks. These features are then leveraged to create a sparse
    set of what we call Super-Gaussians. Super-Gaussians facilitate the distillation
    of 2D language features into 3D space. Through Super-Gaussians, our method enables
    high-dimensional language feature rendering without extreme increases in GPU memory.
    Extensive experiments demonstrate that SuperGSeg outperforms prior works on both
    open-vocabulary object localization and semantic segmentation tasks.

    '
  project_page: https://supergseg.github.io/
  paper: https://arxiv.org/pdf/2412.10231.pdf
  code: null
  video: null
  tags:
  - Language Embedding
  - Project
  - Segmentation
  thumbnail: assets/thumbnails/liang2024supergseg.jpg
  publication_date: '2024-12-13T16:01:19+00:00'
  date_source: arxiv
- id: tang2024gaf
  title: 'GAF: Gaussian Avatar Reconstruction from Monocular Videos via Multi-view
    Diffusion'
  authors: Jiapeng Tang, Davide Davoli, Tobias Kirschstein, Liam Schoneveld, Matthias
    Nießner
  year: '2024'
  abstract: We propose a novel approach for reconstructing animatable 3D Gaussian
    avatars from monocular videos captured by commodity devices like smartphones.
    Photorealistic 3D head avatar reconstruction from such recordings is challenging
    due to limited observations, which leaves unobserved regions under-constrained
    and can lead to artifacts in novel views. To address this problem, we introduce
    a multi-view head diffusion model, leveraging its priors to fill in missing regions
    and ensure view consistency in Gaussian splatting renderings. To enable precise
    viewpoint control, we use normal maps rendered from FLAME-based head reconstruction,
    which provides pixel-aligned inductive biases. We also condition the diffusion
    model on VAE features extracted from the input image to preserve details of facial
    identity and appearance. For Gaussian avatar reconstruction, we distill multi-view
    diffusion priors by using iteratively denoised images as pseudo-ground truths,
    effectively mitigating over-saturation issues. To further improve photorealism,
    we apply latent upsampling to refine the denoised latent before decoding it into
    an image. We evaluate our method on the NeRSemble dataset, showing that GAF outperforms
    the previous state-of-the-art methods in novel view synthesis and novel expression
    animation. Furthermore, we demonstrate higher-fidelity avatar reconstructions
    from monocular videos captured on commodity devices.
  project_page: https://tangjiapeng.github.io/projects/GAF/
  paper: https://arxiv.org/pdf/2412.10209
  code: null
  video: https://www.youtube.com/embed/QuIYTljvhygE
  tags:
  - Avatar
  - Project
  - Video
  thumbnail: assets/thumbnails/tang2024gaf.jpg
  publication_date: '2024-12-13T15:31:22+00:00'
- id: park2024splinegs
  title: 'SplineGS: Robust Motion-Adaptive Spline for Real-Time Dynamic 3D Gaussians
    from Monocular Video'
  authors: Jongmin Park, Minh-Quan Viet Bui, Juan Luis Gonzalez Bello, Jaeho Moon,
    Jihyong Oh, Munchurl Kim
  year: '2024'
  abstract: 'Synthesizing novel views from in-the-wild monocular videos is challenging
    due to scene dynamics and the lack of multi-view cues. To address this, we propose
    SplineGS, a COLMAP-free dynamic 3D Gaussian Splatting (3DGS) framework for high-quality
    reconstruction and fast rendering from monocular videos. At its core is a novel
    Motion-Adaptive Spline (MAS) method, which represents continuous dynamic 3D Gaussian
    trajectories using cubic Hermite splines with a

SYMBOL INDEX (106 symbols across 18 files)

FILE: src/arxiv_integration.py
  class ArxivIntegration (line 8) | class ArxivIntegration:
    method __init__ (line 9) | def __init__(self):
    method extract_arxiv_id (line 12) | def extract_arxiv_id(self, url_or_id: str) -> str:
    method get_paper (line 29) | def get_paper(self, url_or_id: str) -> Optional[Dict[str, Any]]:
    method append_to_yaml (line 61) | def append_to_yaml(self, entry: Dict[str, Any], filename: str = "aweso...
    method format_yaml_entry (line 81) | def format_yaml_entry(entry: Dict[str, Any]) -> str:
  function clean_and_quote (line 107) | def clean_and_quote(text: str) -> str:
  function format_optional_field (line 117) | def format_optional_field(value) -> str:

FILE: src/components/dialogs.py
  class ArxivAddDialog (line 12) | class ArxivAddDialog(QDialog):
    method __init__ (line 13) | def __init__(self, parent=None):
    method setup_ui (line 21) | def setup_ui(self):
    method generate_thumbnail (line 43) | def generate_thumbnail(self, entry):
    method add_paper (line 67) | def add_paper(self):

FILE: src/components/thumbnail.py
  class ThumbnailGenerator (line 10) | class ThumbnailGenerator:
    method __init__ (line 11) | def __init__(self, output_dir: str = "assets/thumbnails"):
    method download_pdf (line 18) | def download_pdf(self, url: str) -> bytes:
    method create_thumbnail (line 29) | def create_thumbnail(self, pdf_content: bytes, paper_id: str) -> bool:

FILE: src/components/widgets.py
  class TagButton (line 3) | class TagButton(QPushButton):
    method __init__ (line 4) | def __init__(self, text, active=False):
  class URLWidget (line 25) | class URLWidget(QWidget):
    method __init__ (line 26) | def __init__(self, label_text):
    method set_text (line 40) | def set_text(self, value):

FILE: src/fix_date.py
  class YAMLUpdater (line 9) | class YAMLUpdater:
    method __init__ (line 10) | def __init__(self):
    method extract_year_from_id (line 14) | def extract_year_from_id(self, paper_id: str) -> Optional[int]:
    method extract_arxiv_id (line 21) | def extract_arxiv_id(self, url: str) -> Optional[str]:
    method get_fallback_date (line 32) | def get_fallback_date(self, entry: Dict[str, Any]) -> Optional[str]:
    method process_paper (line 51) | def process_paper(self, entry: Dict[str, Any]) -> Tuple[Dict[str, Any]...
    method safe_sort_key (line 90) | def safe_sort_key(self, x: Dict[str, Any]) -> tuple:
    method update_yaml_with_dates (line 112) | def update_yaml_with_dates(self, filename: str = "awesome_3dgs_papers....
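
The YAML entries earlier in this dump appear in descending `publication_date` order, and the index above lists a `safe_sort_key` method whose body is not part of this excerpt. The following is only a sketch of how such an ordering could be produced, assuming ISO 8601 timestamps like the ones shown in the entries. The name `sort_key_sketch` and the `undated-example` record are hypothetical and are not taken from the repository.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Tuple


def sort_key_sketch(entry: Dict[str, Any]) -> Tuple[int, datetime]:
    """Hypothetical sort key: entries without a usable date rank lowest."""
    raw = entry.get("publication_date")
    try:
        # Timestamps such as '2024-12-16T18:58:51+00:00' parse directly.
        return (1, datetime.fromisoformat(raw))
    except (TypeError, ValueError):
        # Missing or malformed date: use a fixed placeholder value.
        return (0, datetime.min.replace(tzinfo=timezone.utc))


# Three dates copied from entries in this dump, plus one made-up undated record.
entries = [
    {"id": "tang2024gaf", "publication_date": "2024-12-13T15:31:22+00:00"},
    {"id": "liang2024wonderland", "publication_date": "2024-12-16T18:58:17+00:00"},
    {"id": "undated-example", "publication_date": None},
    {"id": "taubner2024cap4d", "publication_date": "2024-12-16T18:58:51+00:00"},
]

# Newest first; undated records end up at the bottom.
ordered = sorted(entries, key=sort_key_sketch, reverse=True)
ids = [e["id"] for e in ordered]
```

Pairing a flag with the timestamp keeps undated records out of any direct datetime comparison with dated ones, so the placeholder value never affects the order of real dates.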

FILE: src/generate.py
  function generate_html (line 9) | def generate_html(entries: List[Dict[str, Any]], output_file: str) -> None:
  function main (line 38) | def main():

FILE: src/helper.py
  function generate_year_options (line 10) | def generate_year_options(entries: List[Dict[str, Any]]) -> str:
  function generate_tag_filters (line 15) | def generate_tag_filters(entries: List[Dict[str, Any]]) -> str:
  function generate_paper_cards (line 21) | def generate_paper_cards(entries: List[Dict[str, Any]]) -> str:
  function format_publication_date (line 45) | def format_publication_date(date_str: str, date_source: str) -> str:

FILE: src/paper_generator.py
  class PaperCardGenerator (line 7) | class PaperCardGenerator:
    method __init__ (line 10) | def __init__(self, templates_dir: Path):
    method _generate_link (line 14) | def _generate_link(self, url: str, icon: str, text: str, emoji: str = ...
    method _generate_links (line 21) | def _generate_links(self, paper: Paper) -> str:
    method _generate_tags (line 44) | def _generate_tags(self, paper: Paper) -> str:
    method _generate_abstract (line 49) | def _generate_abstract(self, paper: Paper) -> str:
    method generate_card (line 58) | def generate_card(self, paper: Paper) -> str:
    method generate_cards (line 75) | def generate_cards(self, papers: List[Paper]) -> str:

FILE: src/paper_schema.py
  class Paper (line 6) | class Paper:
    method from_dict (line 23) | def from_dict(cls, data: dict) -> 'Paper':
    method to_dict (line 74) | def to_dict(self) -> dict:

FILE: src/static/js/filters.js
  function filterPapers (line 1) | function filterPapers() {
  function clearSearch (line 41) | function clearSearch() {
  function initializeFilters (line 46) | function initializeFilters() {

FILE: src/static/js/navigation.js
  function scrollToTop (line 2) | function scrollToTop() {
  function scrollToBottom (line 9) | function scrollToBottom() {
  function updateScrollProgress (line 17) | function updateScrollProgress() {
  function updateFilterStatus (line 25) | function updateFilterStatus() {
  function createFilterTag (line 77) | function createFilterTag(type, title, info) {
  function clearAllFilters (line 94) | function clearAllFilters() {

FILE: src/static/js/selection.js
  function toggleSelectedOnly (line 1) | function toggleSelectedOnly() {
  function toggleSelectionMode (line 24) | function toggleSelectionMode() {
  function clearSelection (line 54) | function clearSelection() {
  function togglePaperSelection (line 80) | function togglePaperSelection(paperId, checkbox) {
  function removeFromSelection (line 119) | function removeFromSelection(paperId) {
  function updateSelectionCount (line 143) | function updateSelectionCount() {
  function handleCheckboxClick (line 148) | function handleCheckboxClick(ev, paperId, checkbox) {
  function scrollToPaper (line 153) | function scrollToPaper(paperId) {

FILE: src/static/js/sharing.js
  function showShareModal (line 1) | function showShareModal() {
  function hideShareModal (line 17) | function hideShareModal() {
  function copyShareLink (line 21) | async function copyShareLink() {
  function copyBitcoinAddress (line 36) | function copyBitcoinAddress() {
  function applyURLParams (line 48) | function applyURLParams() {

FILE: src/static/js/utils.js
  function debounce (line 1) | function debounce(fn, delay) {
  function updateURL (line 9) | function updateURL() {
  function updatePaperNumbers (line 37) | function updatePaperNumbers() {

FILE: src/template_engine.py
  class TemplateEngine (line 5) | class TemplateEngine:
    method __init__ (line 6) | def __init__(self, template_path: Path):
    method render (line 10) | def render(self, context: Dict[str, Any]) -> str:

FILE: src/utils.py
  function read_files (line 4) | def read_files(base_dir: Path, file_paths: List[str]) -> List[str]:
  function write_output (line 12) | def write_output(output_file: str, content: str) -> None:

FILE: src/validate_yaml.py
  function validate_url (line 39) | def validate_url(url, required=False):
  function get_changed_entries (line 66) | def get_changed_entries():
  function validate_entries (line 101) | def validate_entries(entries):
  function main (line 153) | def main():

FILE: src/yaml_editor.py
  class YAMLEditor (line 17) | class YAMLEditor(QMainWindow):
    method __init__ (line 18) | def __init__(self):
    method safe_sort_key (line 53) | def safe_sort_key(self, x: Dict[str, Any]) -> tuple:
    method load_yaml (line 84) | def load_yaml(self):
    method setup_status_bar (line 102) | def setup_status_bar(self):
    method show_save_feedback (line 107) | def show_save_feedback(self, success=True):
    method clear_save_indicator (line 116) | def clear_save_indicator(self):
    method setup_ui (line 120) | def setup_ui(self):
    method auto_save (line 253) | def auto_save(self):
    method handle_url_change (line 303) | def handle_url_change(self):
    method get_entry_state (line 307) | def get_entry_state(self, entry):
    method update_tags (line 314) | def update_tags(self):
    method update_automatic_tags (line 320) | def update_automatic_tags(self):
    method clear_search_results (line 341) | def clear_search_results(self):
    method show_current_entry (line 346) | def show_current_entry(self):
    method search_entry (line 378) | def search_entry(self):
    method open_url (line 414) | def open_url(self, field):
    method go_to_page (line 419) | def go_to_page(self):
    method prev_entry (line 430) | def prev_entry(self):
    method next_entry (line 436) | def next_entry(self):
    method delete_current_entry (line 442) | def delete_current_entry(self):
    method add_arxiv_button (line 479) | def add_arxiv_button(self):
    method refresh_ui (line 484) | def refresh_ui(self):
    method show_arxiv_dialog (line 505) | def show_arxiv_dialog(self):
  function main (line 551) | def main():
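
The index lists a `Paper` class in `src/paper_schema.py` with `from_dict` and `to_dict` methods, but its fields are not shown in this excerpt. As a rough sketch only, the keys visible in the YAML entries above (`id`, `title`, `authors`, `year`, `abstract`, `project_page`, `paper`, `code`, `video`, `tags`, `thumbnail`, `publication_date`, `date_source`) could be modelled like this. `PaperSketch` is a hypothetical name, and which fields are optional is an assumption based on the `null` values and missing `date_source` keys seen in the data.

```python
from dataclasses import dataclass, field, fields
from typing import Any, Dict, List, Optional


@dataclass
class PaperSketch:
    """Hypothetical stand-in; not the repository's actual Paper class."""
    id: str
    title: str
    authors: str
    year: str
    abstract: str
    paper: str
    project_page: Optional[str] = None
    code: Optional[str] = None       # null in most entries shown above
    video: Optional[str] = None
    tags: List[str] = field(default_factory=list)
    thumbnail: Optional[str] = None
    publication_date: Optional[str] = None
    date_source: Optional[str] = None  # absent from some entries

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "PaperSketch":
        # Ignore unknown keys so extra fields do not raise TypeError.
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in data.items() if k in known})

    def to_dict(self) -> Dict[str, Any]:
        return {f.name: getattr(self, f.name) for f in fields(self)}


# Values copied from the liang2024supergseg entry; abstract shortened here.
entry = {
    "id": "liang2024supergseg",
    "title": "SuperGSeg: Open-Vocabulary 3D Segmentation with Structured Super-Gaussians",
    "authors": "Siyun Liang, Sen Wang, Kunyi Li, Michael Niemeyer, "
               "Stefano Gasperini, Nassir Navab, Federico Tombari",
    "year": "2024",
    "abstract": "(abstract omitted in this sketch)",
    "project_page": "https://supergseg.github.io/",
    "paper": "https://arxiv.org/pdf/2412.10231.pdf",
    "code": None,
    "video": None,
    "tags": ["Language Embedding", "Project", "Segmentation"],
    "thumbnail": "assets/thumbnails/liang2024supergseg.jpg",
    "publication_date": "2024-12-13T16:01:19+00:00",
    "date_source": "arxiv",
}

paper = PaperSketch.from_dict(entry)
round_trip = paper.to_dict()
```

With a file loaded by a YAML parser, each list item could be passed to `from_dict` in the same way; a missing required key would surface as a `TypeError` at construction time.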

Condensed preview — 40 files, each showing path, character count, and a content snippet. Download the .json file or copy for the full structured content (2,639K chars).

[
  {
    "path": ".gitattributes",
    "chars": 43,
    "preview": "assets filter=lfs diff=lfs merge=lfs -text\n"
  },
  {
    "path": ".github/CODEOWNERS",
    "chars": 43,
    "preview": "# Require review for all changes\n* @MrNeRF\n"
  },
  {
    "path": ".github/FUNDING.yml",
    "chars": 0,
    "preview": ""
  },
  {
    "path": ".github/workflows/generate-html.yml",
    "chars": 1984,
    "preview": "name: Generate HTML\non:\n  pull_request:\n    types: [closed]\n  push:\n    branches: [main]\n    paths:\n      - 'awesome_3dg"
  },
  {
    "path": ".github/workflows/validate-pr.yml",
    "chars": 805,
    "preview": "name: Validate PR Changes\n\non:\n  pull_request:\n    branches: [ main ]\n    paths:\n      - 'awesome_3dgs_papers.yaml'\n\njob"
  },
  {
    "path": ".gitignore",
    "chars": 26,
    "preview": ".venv\n__pycache__/\n*.pyc\n\n"
  },
  {
    "path": "CONTRIBUTING.md",
    "chars": 1775,
    "preview": "# Contributing Guide\n\nThank you for your interest in contributing to the Awesome 3D Gaussian Splatting repository! This "
  },
  {
    "path": "LICENSE",
    "chars": 1064,
    "preview": "MIT License\n\nCopyright (c) 2023 janusch\n\nPermission is hereby granted, free of charge, to any person obtaining a copy\nof"
  },
  {
    "path": "README.md",
    "chars": 16910,
    "preview": "# Awesome 3D Gaussian Splatting\n\n<div align=\"center\">\n  A curated collection of resources focused on 3D Gaussian Splatti"
  },
  {
    "path": "awesome_3dgs_papers.yaml",
    "chars": 921032,
    "preview": "- id: ren2025fastgs\n  title: 'FastGS: Training 3D Gaussian Splatting in 100 Seconds'\n  authors: Shiwei Ren, Tianci Wen, "
  },
  {
    "path": "editor.py",
    "chars": 72,
    "preview": "from src.yaml_editor import main\n\nif __name__ == '__main__':\n    main()\n"
  },
  {
    "path": "index.html",
    "chars": 1504672,
    "preview": "<!DOCTYPE HTML>\n<html lang=\"en\">\n<head>\n    <meta charset=\"UTF-8\">\n    <meta name=\"viewport\" content=\"width=device-width"
  },
  {
    "path": "requirements.txt",
    "chars": 386,
    "preview": "arxiv==2.1.3\ncertifi==2024.12.14\ncffi==1.17.1\ncharset-normalizer==3.4.1\ncryptography==44.0.0\nDeprecated==1.2.15\nfeedpars"
  },
  {
    "path": "src/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "src/arxiv_integration.py",
    "chars": 4717,
    "preview": "import arxiv\nimport yaml\nimport re\nfrom urllib.parse import urlparse\nfrom typing import Optional, Dict, Any\n\n\nclass Arxi"
  },
  {
    "path": "src/components/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "src/components/dialogs.py",
    "chars": 4571,
    "preview": "import arxiv\nfrom PyQt6.QtWidgets import (QApplication, QMainWindow, QWidget, QVBoxLayout, \n                           Q"
  },
  {
    "path": "src/components/thumbnail.py",
    "chars": 2426,
    "preview": "from pathlib import Path\nimport requests\nfrom pdf2image import convert_from_bytes\nfrom PIL import Image\nimport logging\n\n"
  },
  {
    "path": "src/components/widgets.py",
    "chars": 1304,
    "preview": "from PyQt6.QtWidgets import QPushButton, QWidget, QHBoxLayout, QLabel, QLineEdit\n\nclass TagButton(QPushButton):\n    def "
  },
  {
    "path": "src/fix_date.py",
    "chars": 6114,
    "preview": "import yaml\nimport arxiv\nimport time\nfrom datetime import datetime\nfrom typing import Dict, Any, Optional, List, Tuple\ni"
  },
  {
    "path": "src/generate.py",
    "chars": 1897,
    "preview": "import yaml\nimport sys\nfrom pathlib import Path\nfrom typing import List, Dict, Any\nfrom helper import generate_year_opti"
  },
  {
    "path": "src/helper.py",
    "chars": 2386,
    "preview": "import datetime\nfrom typing import List, Dict, Any\nfrom paper_schema import Paper\nfrom pathlib import Path\nfrom paper_ge"
  },
  {
    "path": "src/paper_generator.py",
    "chars": 3718,
    "preview": "from pathlib import Path\nimport json\nfrom typing import List\nfrom paper_schema import Paper\nfrom template_engine import "
  },
  {
    "path": "src/paper_schema.py",
    "chars": 3619,
    "preview": "from dataclasses import dataclass\nfrom typing import List, Optional\nfrom datetime import datetime\n\n@dataclass\nclass Pape"
  },
  {
    "path": "src/static/css/base.css",
    "chars": 9237,
    "preview": "/* Root Variables */\n:root {\n    --primary-color: #1772d0;\n    --hover-color: #f09228;\n    --bg-color: #ffffff;\n    --ca"
  },
  {
    "path": "src/static/css/components.css",
    "chars": 10183,
    "preview": ".selected-only-mode-toggle {\n    position: fixed;\n    bottom: 6rem;\n    right: 2rem;\n    background: var(--primary-color"
  },
  {
    "path": "src/static/css/responsive.css",
    "chars": 1138,
    "preview": "@media (max-width: 1024px) {\n    .container {\n        padding: 0 1rem;\n    }\n\n    .selection-preview {\n        display: "
  },
  {
    "path": "src/static/js/filters.js",
    "chars": 2774,
    "preview": "function filterPapers() {\n    // Show/hide non-paper elements regardless of filter state\n    document.querySelectorAll('"
  },
  {
    "path": "src/static/js/main.js",
    "chars": 2623,
    "preview": "document.addEventListener('DOMContentLoaded', function() {\n    // Initialize variables\n    window.paperCards = document."
  },
  {
    "path": "src/static/js/navigation.js",
    "chars": 4383,
    "preview": "// Navigation controls\nfunction scrollToTop() {\n    window.scrollTo({\n        top: 0,\n        behavior: 'smooth'\n    });"
  },
  {
    "path": "src/static/js/selection.js",
    "chars": 5891,
    "preview": "function toggleSelectedOnly() {\n    state.onlyShowSelected = !state.onlyShowSelected;\n    const button = document.queryS"
  },
  {
    "path": "src/static/js/sharing.js",
    "chars": 4025,
    "preview": "function showShareModal() {\n    if (state.selectedPapers.size === 0) {\n        alert('Please select at least one paper t"
  },
  {
    "path": "src/static/js/state.js",
    "chars": 161,
    "preview": "const state = {\n    selectedPapers: new Set(),\n    isSelectionMode: false,\n    includeTags: new Set(),\n    excludeTags: "
  },
  {
    "path": "src/static/js/utils.js",
    "chars": 1316,
    "preview": "function debounce(fn, delay) {\n    let timeout;\n    return (...args) => {\n        if (timeout) clearTimeout(timeout);\n  "
  },
  {
    "path": "src/template_engine.py",
    "chars": 448,
    "preview": "from string import Template as StringTemplate\nfrom typing import Dict, Any\nfrom pathlib import Path\n\nclass TemplateEngin"
  },
  {
    "path": "src/templates/index.html",
    "chars": 10054,
    "preview": "<!DOCTYPE HTML>\n<html lang=\"en\">\n<head>\n    <meta charset=\"UTF-8\">\n    <meta name=\"viewport\" content=\"width=device-width"
  },
  {
    "path": "src/templates/paper_card.html",
    "chars": 743,
    "preview": "<div class=\"paper-row\" data-id=\"$id\" data-title=\"$title\" data-authors=\"$authors\" data-year=\"$year\" data-tags='$tags_json"
  },
  {
    "path": "src/utils.py",
    "chars": 542,
    "preview": "from pathlib import Path\nfrom typing import List\n\ndef read_files(base_dir: Path, file_paths: List[str]) -> List[str]:\n  "
  },
  {
    "path": "src/validate_yaml.py",
    "chars": 6230,
    "preview": "import yaml\nimport sys\nimport os\nimport requests\nimport time\nfrom urllib3.util.retry import Retry\nfrom requests.adapters"
  },
  {
    "path": "src/yaml_editor.py",
    "chars": 22360,
    "preview": "import sys\nfrom src.fix_date import YAMLUpdater\nimport yaml\nimport webbrowser\nfrom PyQt6.QtWidgets import (QApplication,"
  }
]
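
Each record in the condensed preview above carries three keys: `path`, `chars`, and `preview`. As an illustration of how such a list might be consumed, the sketch below totals character counts per top-level path component using only the standard library. The helper name `chars_by_top_level` is hypothetical, and the sample uses four records from the listing with their preview text left empty for brevity.

```python
import json
from collections import defaultdict
from typing import Any, Dict, List

# Four records copied from the preview above (preview strings omitted).
SAMPLE = """[
  {"path": ".gitattributes", "chars": 43, "preview": ""},
  {"path": "src/utils.py", "chars": 542, "preview": ""},
  {"path": "src/template_engine.py", "chars": 448, "preview": ""},
  {"path": "src/static/js/state.js", "chars": 161, "preview": ""}
]"""


def chars_by_top_level(records: List[Dict[str, Any]]) -> Dict[str, int]:
    """Sum the 'chars' field grouped by the first path component."""
    totals: Dict[str, int] = defaultdict(int)
    for rec in records:
        top = rec["path"].split("/", 1)[0]
        totals[top] += rec["chars"]
    return dict(totals)


totals = chars_by_top_level(json.loads(SAMPLE))
```

Run over the full preview, the same grouping would show that nearly all of the 2.4 MB sits in the two top-level files `awesome_3dgs_papers.yaml` and `index.html`.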
